Senior Principal Site Reliability Engineer, SRE

October 29

🇺🇸 United States – Remote

💵 $163.7k - $246.1k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Description

• Lead the adoption and implementation of SRE practices across the organization, promoting a culture of reliability and continuous improvement.
• Develop and implement automation tools and frameworks to enhance system reliability and operational efficiency.
• Design and maintain comprehensive monitoring and alerting systems to ensure the health and performance of applications and infrastructure.
• Lead the response to high-severity incidents, conduct root cause analysis, and implement corrective actions to prevent recurrence.
• Analyze system performance and reliability data to identify areas for improvement and implement optimization strategies.
• Work closely with development, operations, and product teams to ensure seamless integration of SRE practices and to drive reliability improvements.
• Mentor and train junior engineers in SRE best practices, and develop a culture of knowledge sharing and continuous learning.
• Conduct capacity planning and demand forecasting to ensure systems can handle future growth and spikes.
• Maintain detailed documentation of SRE processes, tools, and best practices to ensure knowledge continuity and operational excellence.

Requirements

• Experience with observability tools such as Datadog, Prometheus, Dynatrace, Grafana, ELK Stack, or similar.
• Proficiency in programming languages such as Python, Go, or Java.
• Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Docker, Kubernetes).
• In-depth knowledge of AWS services including VPC, Lambda, IAM, ELB, EC2, ECS, CloudWatch, API Gateway, S3, SQS, SNS, WAF, and Route 53.
• Experience with infrastructure as code tools such as Terraform, Ansible, or similar.
• Excellent troubleshooting and problem-solving skills.
• Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams.
• Experience leading and mentoring engineering teams is highly desirable.
• Knowledge of security best practices and experience implementing security controls and measures.
• Experience with chaos engineering and resilience testing.
• Familiarity with AI/ML applications in operational processes.
• Knowledge of security best practices and compliance requirements.

Similar Jobs

October 26

Hone Health

11 - 50

Shape technology strategy at a pioneering telehealth startup.

🇺🇸 United States – Remote

💵 $120k - $140k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

October 26

Senior Site Reliability Engineer at Invoca, focused on platform reliability and observability.

🇺🇸 United States – Remote

💵 $127k - $159.9k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor
