Head of Site Reliability Engineering

September 8


Description

• Leading the design, deployment, and management of infrastructure, ensuring high availability, reliability, and scalability
• Building, mentoring, and leading a globally distributed SRE team across multiple time zones (APAC, LATAM, etc.) with a follow-the-sun on-call support model
• Developing and managing SLAs for availability, performance, and uptime while driving operational excellence and automation
• Creating and implementing strategies for continuous delivery, monitoring, and incident response to ensure minimal downtime and rapid recovery
• Partnering with engineering teams to design scalable and fault-tolerant architecture and processes
• Overseeing security best practices, including vulnerability management, monitoring, and compliance with industry standards
• Developing tools and processes for automation of infrastructure, monitoring, alerting, and incident management
• Managing budgets, vendors, and third-party tools related to infrastructure, ensuring cost-effectiveness and efficiency
• Ensuring comprehensive documentation and training for all infrastructure, deployment, and operational processes

Requirements

• 10+ years of experience in Site Reliability Engineering (SRE) or infrastructure engineering, with at least 5 years in leadership roles
• Proven experience in designing, deploying, and managing large-scale distributed systems, preferably in a cloud environment (AWS, GCP, Azure)
• Strong expertise in automation tools (Terraform, Ansible, etc.) and scripting languages (Python, Bash, etc.)
• Strong experience with containerization and orchestration technologies such as Docker and Kubernetes
• Deep understanding of network infrastructure, load balancing, firewalls, VPNs, and security best practices
• Proven track record of meeting or exceeding SLAs for system uptime and performance
• Experience building and leading teams across multiple regions and time zones
• Familiarity with managing infrastructure in a highly regulated or security-sensitive environment
• Strong understanding of CI/CD pipelines and incident management platforms (PagerDuty, Opsgenie)
• Strong understanding of the LGTM stack
• Excellent leadership, communication, and project management skills


Similar Jobs

September 6

Coinbase

1001 - 5000

Staff Site Reliability Engineer responsible for reliability at Coinbase.

🇺🇸 United States – Remote

💵 $211.7k - $249k / year

💰 $21.4M Post-IPO Equity on 2022-11

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

August 23

Gemini

501 - 1000

Lead engineering teams towards modern DevOps practices through automation and tooling.

🇺🇸 United States – Remote

💵 $198k - $247k / year

💰 Venture Round on 2022-02

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

June 18

Ensure reliability and performance of blockchain infrastructure at Movement Labs.

🇺🇸 United States – Remote

💰 Seed Round on 2018-07

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)
