Swirlds Inc

2 - 10 employees

🌐 Web 3

₿ Crypto

Head of Site Reliability Engineering

September 8

🇺🇸 United States – Remote

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Ansible

AWS

Azure

Bash

Cloud

Distributed Systems

Docker

Firewalls

Google Cloud Platform

Kubernetes

Open Source

Python

Terraform

Web3

Description

• Leading the design, deployment, and management of infrastructure, ensuring high availability, reliability, and scalability
• Building, mentoring, and leading a globally distributed SRE team across multiple time zones (APAC, LATAM, etc.) with a follow-the-sun on-call support model
• Developing and managing SLAs for availability, performance, and uptime while driving operational excellence and automation
• Creating and implementing strategies for continuous delivery, monitoring, and incident response to ensure minimal downtime and rapid recovery
• Partnering with engineering teams to design scalable and fault-tolerant architecture and processes
• Overseeing security best practices, including vulnerability management, monitoring, and compliance with industry standards
• Developing tools and processes for automation of infrastructure, monitoring, alerting, and incident management
• Managing budgets, vendors, and third-party tools related to infrastructure, ensuring cost-effectiveness and efficiency
• Ensuring comprehensive documentation and training for all infrastructure, deployment, and operational processes

Requirements

• 10+ years of experience in Site Reliability Engineering (SRE) or infrastructure engineering, with at least 5 years in leadership roles
• Proven experience in designing, deploying, and managing large-scale distributed systems, preferably in a cloud environment (AWS, GCP, Azure)
• Strong expertise in automation tools (Terraform, Ansible, etc.) and scripting languages (Python, Bash, etc.)
• Strong experience with containerization and orchestration technologies such as Docker and Kubernetes
• Deep understanding of network infrastructure, load balancing, firewalls, VPNs, and security best practices
• Proven track record of meeting or exceeding SLAs for system uptime and performance
• Experience building and leading teams across multiple regions and time zones
• Familiarity with managing infrastructure in a highly regulated or security-sensitive environment
• Strong understanding of CI/CD pipelines and incident management platforms (PagerDuty, Opsgenie)
• Strong understanding of the LGTM stack (Loki, Grafana, Tempo, Mimir)
• Excellent leadership, communication, and project management skills
