October 15
🇺🇸 United States – Remote
💵 $120k - $135k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
• Collaborate with cross-functional teams to craft and implement a modern observability stack and refine our incident-handling processes.
• Design and contribute to state-of-the-art cloud provider solutions for high-performance computing, AI training, and inference workloads, focusing on Observability and MLOps.
• Enhance the resilience and stability of our systems, as part of the platform team, through thoughtful software improvements, architecture, and automation.
• Contribute to solutions for challenges ranging from low-level hardware issues to high-level distributed-application scaling problems, and everything in between.
• Champion DevOps and SRE principles through automation, thought leadership, and close collaboration within our engineering team.
• Enhance customer experience by improving case handling: strive for proactive responses, rich insights, and automated resolutions.
• Develop robust documentation to streamline the handling of recurring reliability issues, paving the way for junior SREs to take the helm confidently.
• Identify and implement scalable solutions to address technical challenges within our stack, setting new benchmarks for innovation.
• 3+ years of experience in a hands-on SRE role delivering distributed architectures.
• 2+ years working with and maintaining Kubernetes clusters for highly available and regulated environments.
• 2+ years of hands-on experience with a modern Grafana stack, including Mimir, Loki, and Tempo.
• Comfortable working with complex CI/CD pipelines (GitLab/Jenkins), configuration management (Puppet/Salt), and IaC solutions such as Terraform.
• Experience working with observability pipelines or OpenTelemetry is a plus.
• A background in performance optimization for web stacks, including components such as PHP-FPM, Nginx, and MySQL.
• Strong programming skills in Python, Golang, or PHP, and a readiness to pick up new technologies.
• A 100% remote work environment + a company-wide virtual get-together
• 401(k) plan that matches 100% up to 4% with immediate vesting
• Professional Development Reimbursement of $2,500 each year
• 11 Holidays + Paid Time Off Accrual + Rollover Plan + take off your birthday!
• Commitment matters to Vultr! Increased PTO at 3-year anniversary + 1-month sabbatical at 5-year anniversary + Anniversary Bonus each year
• $500 first-year remote office setup + $400 each year following for new equipment
• Monthly internet reimbursement up to $75
• $50 per month for a gym membership
October 10
51 - 200
Lead technology team at Mercy For Animals to achieve its mission.
🇺🇸 United States – Remote
💵 $111k - $115k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
September 24
51 - 200
Lead Telecom & SMS systems for a text message marketing company.
🇺🇸 United States – Remote
💵 $155k - $188k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
September 20
11 - 50
Implementing and managing cloud infrastructure, focusing on Azure services.
September 20
51 - 200
DevOps engineer to enhance Lilt's AI platform and infrastructure.
🇺🇸 United States – Remote
💰 $55M Series C on 2022-04
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
September 20
10,000+
Lead engineering for GEICO's transformation into a tech organization.
🇺🇸 United States – Remote
💵 $105k - $230k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor