Senior Site Reliability Engineer

May 30

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

👨🏻‍🔧 Site Reliability Engineer (SRE)


Hypori

Never Trust, Always Verify - Hypori Halo Zero Trust BYOD

Mobile Security • Mobile Application Management • Enterprise Mobility Management • BYOD • VMI

51 - 200 employees

Description

• Propose, design, develop, and ship platform software to increase product reliability and efficiency
• Guide reliability practice throughout the SSDLC via activities including architecture reviews, code reviews, capacity/scaling planning, and test automation
• Maintain service and platform health through monitoring and follow-the-sun incident response
• Run infrastructure with Terraform, CI/CD, K8s, and other appropriate cloud tools
• Improve reliability by leading incident investigations and postmortems, documenting the findings, and using code and automation to create repeatable actions that prevent problem recurrence
• Continuously improve operational processes (release, deployment, patches, etc.) to make them as reliable as possible
• Support engineering efforts to implement new cloud-based projects
• Automate deployment and maintenance tasks using infrastructure as code and DevOps principles
• Communicate technical designs and issues along with proposed solutions
• Mentor junior SRE engineers

Requirements

• BS degree in Computer Science or a related field, with at least 10 years of related work experience
• 6+ years of operating within a cloud infrastructure environment in AWS and Azure, utilizing Infrastructure as Code principles
• 6+ years of Engineering, SRE, and DevOps experience in an agile environment
• 6+ years supporting a 24x7 mission-critical SaaS environment
• Experience in Python, Java, Go, or another language
• Experience in using observability tools such as Datadog, Grafana, or New Relic
• Experience in integrating monitoring and alerting systems using Webhooks and APIs
• Expert in Terraform, Puppet, and Git
• Expert in Linux system operations, debugging, networking, software development, and cloud concepts
• Experience with release automation, system administration, and configuration management
• Strong experience with containerization technology and Kubernetes
• Expertise in security, monitoring, and performance aspects of cloud-native applications
• Experience in SRE principles such as SLIs, SLOs, resilience, scaling, and performance
• Professional experience with GitOps, Jenkins, or other workflow tools
• Ability to debug, optimize code, and automate routine tasks
• Experience in designing and building failover and recovery automation
• Excellent verbal, written, and interpersonal communication skills
• Outstanding problem-solving and decision-making skills
• Must be a self-starter with drive, a high level of initiative, and self-direction; a problem solver able to develop solutions to complex issues
• Must be adept at working in a matrix position where results must be achieved across various departments without line authority, and comfortable working with all levels of the organization

Benefits

• Medical, dental, and vision insurance
• Parental leave
• Life and disability packages
• 401(k) plan with employer-matching contributions that vest starting from your first day of employment
