3 days ago
Ansible
AWS
Azure
Bash
Cloud
Docker
Google Cloud Platform
Kubernetes
Prometheus
Python
RPA
Terraform
WordPress
Go
• As a Site Reliability Engineer, you’ll play a crucial role in designing, implementing, and maintaining the reliability and efficiency of our platforms.
• Collaborate with cross-functional teams to identify performance bottlenecks, troubleshoot complex issues, and optimize system performance to meet defined service level objectives.
• Design and implement monitoring, alerting, and incident response strategies to proactively identify and mitigate potential issues.
• Drive automation initiatives to streamline deployment, configuration management, and infrastructure provisioning processes.
• Develop and maintain comprehensive documentation for system configurations, processes, and procedures.
• Participate in on-call rotations and respond to incidents, working diligently to resolve issues and prevent recurrence.
• An active U.S. Government-issued Secret security clearance (or higher).
• Minimum of 3 years of professional experience in a Site Reliability Engineering role or similar capacity.
• Strong experience with cloud technologies (e.g., AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible).
• Proficiency in programming and scripting languages (e.g., Python, Go, Bash) and RPA (e.g., Blue Prism, UiPath) to automate tasks and develop tools.
• Deep understanding of containerization and orchestration technologies (e.g., Kubernetes, Docker).
• Expertise in implementing and managing monitoring and logging solutions (e.g., Zabbix, Nagios, Prometheus, ELK stack).
• Proven track record of designing, building, and maintaining highly available and scalable systems.
• Expert proficiency in developing automated functional, regression, and performance tests and in developing automated testing standards for development teams.
• Experience facilitating change and configuration management processes to drive reliability.
• Strong problem-solving skills, with the ability to diagnose complex issues and implement effective solutions.
• Excellent communication skills, with the ability to collaborate effectively across diverse teams.
• generous benefits package
• professional growth opportunities
• valuable time to recharge
3 days ago
Join the team supporting Atmosphere, an open-source cloud product, with strong OpenStack skills.
3 days ago
Join ScorePlay as a DevOps Engineer, ensuring reliability and scalability of their media platform.
🇺🇸 United States – Remote
💰 $5M Seed Round on 2023-07
⏰ Full Time
🟢 Junior
🟡 Mid-level
⛑ DevOps & Site Reliability Engineer (SRE)
🚫👨🎓 No degree required
3 days ago
Seeking a Junior to Mid-level DevOps Engineer for automation in a remote environment at NetFoundry.
🇺🇸 United States – Remote
⏰ Full Time
🟢 Junior
🟡 Mid-level
⛑ DevOps & Site Reliability Engineer (SRE)
🚫👨🎓 No degree required
3 days ago
Join GlobalBet as a remote DevOps Engineer, contributing to cutting-edge virtual sports technology.
3 days ago
As a Site Reliability Engineer, you’ll optimize system performance at LoanPro, a fintech innovator.
🇺🇸 United States – Remote
💰 Series A on 2021-07
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)