Head of Site Reliability Engineering

September 8

Swirlds Inc

Swirlds is a software platform for building fully distributed applications that harness the power of the cloud without servers.

2 - 10

Description

• Leading the design, deployment, and management of infrastructure, ensuring high availability, reliability, and scalability
• Building, mentoring, and leading a globally distributed SRE team across multiple time zones (APAC, LATAM, etc.) with a follow-the-sun on-call support model
• Developing and managing SLAs for availability, performance, and uptime while driving operational excellence and automation
• Creating and implementing strategies for continuous delivery, monitoring, and incident response to ensure minimal downtime and rapid recovery
• Partnering with engineering teams to design scalable and fault-tolerant architecture and processes
• Overseeing security best practices, including vulnerability management, monitoring, and compliance with industry standards
• Developing tools and processes for automation of infrastructure, monitoring, alerting, and incident management
• Managing budgets, vendors, and third-party tools related to infrastructure, ensuring cost-effectiveness and efficiency
• Ensuring comprehensive documentation and training for all infrastructure, deployment, and operational processes

Requirements

• 10+ years of experience in Site Reliability Engineering (SRE) or infrastructure engineering, with at least 5 years in leadership roles
• Proven experience in designing, deploying, and managing large-scale distributed systems, preferably in a cloud environment (AWS, GCP, Azure)
• Strong expertise in automation tools (Terraform, Ansible, etc.) and scripting languages (Python, Bash, etc.)
• Strong experience with containerization and orchestration technologies such as Docker and Kubernetes
• Deep understanding of network infrastructure, load balancing, firewalls, VPNs, and security best practices
• Proven track record of meeting or exceeding SLAs for system uptime and performance
• Experience building and leading teams across multiple regions and time zones
• Familiarity with managing infrastructure in a highly regulated or security-sensitive environment
• Strong understanding of CI/CD pipelines and incident management platforms (PagerDuty, Opsgenie)
• Strong understanding of the LGTM stack (Loki, Grafana, Tempo, Mimir)
• Excellent leadership, communication, and project management skills

Similar Jobs

September 6

Coinbase

1001 - 5000

Staff Site Reliability Engineer responsible for reliability at Coinbase.

🇺🇸 United States – Remote

💵 $211.7k - $249k / year

💰 $21.4M Post-IPO Equity on 2022-11

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability (SRE)

🗽 H1B Visa Sponsor

September 5

Datavant

201 - 500

Staff Engineer for healthcare data logistics platform at Datavant.

🇺🇸 United States – Remote

💵 $220k - $260k / year

💰 $40M Series B on 2020-10

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability (SRE)

🗽 H1B Visa Sponsor

August 24

EarnIn

201 - 500

Enhance service reliability and performance for earned wage access applications.

🇺🇸 United States – Remote

💵 $206.6k - $252.6k / year

💰 $125M Series C on 2018-12

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability (SRE)

🗽 H1B Visa Sponsor

August 23

Gemini

501 - 1000

Lead engineering teams towards modern DevOps practices through automation and tooling solutions.

🇺🇸 United States – Remote

💵 $172k - $215k / year

💰 Venture Round on 2022-02

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability (SRE)

🗽 H1B Visa Sponsor
