2 days ago
🇺🇸 United States – Remote
💵 $163.7k - $246.1k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
• Lead the adoption and implementation of SRE practices across the organization, promoting a culture of reliability and continuous improvement.
• Develop and implement automation tools and frameworks to enhance system reliability and operational efficiency.
• Design and maintain comprehensive monitoring and alerting systems to ensure the health and performance of applications and infrastructure.
• Lead the response to high-severity incidents, conduct root cause analysis, and implement corrective actions to prevent recurrence.
• Analyze system performance and reliability data to identify areas for improvement and implement optimization strategies.
• Work closely with development, operations, and product teams to integrate SRE practices and drive reliability improvements.
• Mentor and train junior engineers in SRE best practices, and foster a culture of knowledge sharing and continuous learning.
• Conduct capacity planning and demand forecasting to ensure systems can handle future growth and traffic spikes.
• Maintain detailed documentation of SRE processes, tools, and best practices to ensure knowledge continuity and operational excellence.
• Experience with observability tools such as Datadog, Prometheus, Dynatrace, Grafana, ELK Stack, or similar.
• Proficiency in programming languages such as Python, Go, or Java.
• Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and of container and orchestration technologies (e.g., Docker, Kubernetes).
• In-depth knowledge of AWS services including VPC, Lambda, IAM, ELB, EC2, ECS, CloudWatch, API Gateway, S3, SQS, SNS, WAF, and Route53.
• Experience with infrastructure as code tools such as Terraform, Ansible, or similar.
• Excellent troubleshooting and problem-solving skills.
• Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams.
• Experience leading and mentoring engineering teams is highly desirable.
• Knowledge of security best practices and experience implementing security controls and measures.
• Experience with chaos engineering and resilience testing.
• Familiarity with AI/ML applications in operational processes.
• Knowledge of security best practices and compliance requirements.
2 days ago
11 - 50
Senior DevOps Engineer for telecom services company working fully remote.
2 days ago
51 - 200
Join ThreatConnect's DevOps team to innovate on security operations solutions.
2 days ago
1001 - 5000
Join WorkWave as a DevOps Engineer focusing on cloud infrastructure solutions.
🇺🇸 United States – Remote
💵 $95k - $140k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
3 days ago
11 - 50
Support genomic surveillance and outbreak investigations as Bioinformatics DevOps Architect.
3 days ago
501 - 1000
Build and run distributed systems for Fetch’s rewards platform.
🇺🇸 United States – Remote
💰 Debt Financing on 2022-04
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor