Senior Principal Site Reliability Engineer, SRE

2 days ago

🇺🇸 United States – Remote

💵 $163.7k - $246.1k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)


Description

• Lead the adoption and implementation of SRE practices across the organization, promoting a culture of reliability and continuous improvement.
• Develop and implement automation tools and frameworks to enhance system reliability and operational efficiency.
• Design and maintain comprehensive monitoring and alerting systems to ensure the health and performance of applications and infrastructure.
• Lead the response to high-severity incidents, conduct root cause analysis, and implement corrective actions to prevent recurrence.
• Analyze system performance and reliability data to identify areas for improvement and implement optimization strategies.
• Work closely with development, operations, and product teams to ensure seamless integration of SRE practices and to drive reliability improvements.
• Mentor and train junior engineers in SRE best practices, and foster a culture of knowledge sharing and continuous learning.
• Conduct capacity planning and demand forecasting to ensure systems can handle future growth and spikes in demand.
• Maintain detailed documentation of SRE processes, tools, and best practices to ensure knowledge continuity and operational excellence.

Requirements

• Experience with observability tools such as Datadog, Prometheus, Dynatrace, Grafana, the ELK Stack, or similar.
• Proficiency in programming languages such as Python, Go, or Java.
• Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and containerization and orchestration technologies (e.g., Docker, Kubernetes).
• In-depth knowledge of AWS services, including VPC, Lambda, IAM, ELB, EC2, ECS, CloudWatch, API Gateway, S3, SQS, SNS, WAF, and Route 53.
• Experience with infrastructure as code tools such as Terraform, Ansible, or similar.
• Excellent troubleshooting and problem-solving skills.
• Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams.
• Experience leading and mentoring engineering teams is highly desirable.
• Knowledge of security best practices and compliance requirements, with experience implementing security controls and measures.
• Experience with chaos engineering and resilience testing.
• Familiarity with AI/ML applications in operational processes.


Similar Jobs

2 days ago

Senior DevOps Engineer for telecom services company working fully remote.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

2 days ago

WorkWave

1001 - 5000 employees

Join WorkWave as a DevOps Engineer focusing on cloud infrastructure solutions.

🇺🇸 United States – Remote

💵 $95k - $140k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

3 days ago

Support genomic surveillance and outbreak investigations as a Bioinformatics DevOps Architect.

3 days ago

Fetch Rewards

501 - 1000 employees

Build and run distributed systems for Fetch’s rewards platform.

🇺🇸 United States – Remote

💰 Debt Financing on 2022-04

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor
