Site Reliability Engineering Manager

September 26

Nordhealth

PHP • agile software development • CakePHP • Django • Veterinary Software

201 - 500

Description

• Lead, mentor, and support the SRE team members.
• Oversee the monitoring, alerting, and troubleshooting of system issues.
• Ensure high availability and reliability of production systems and services.
• Coordinate response to system incidents and outages.
• Perform post-incident reviews and ensure effective incident resolution and follow-up actions.
• Manage and optimize the infrastructure, ensuring it meets current and future requirements.
• Identify opportunities for automation to improve system reliability and operational efficiency.
• Work closely with development, operations, and product teams to integrate reliability into the software development lifecycle.
• Communicate effectively with stakeholders about system performance, incidents, and project status.
• Define and track key performance indicators (KPIs) to measure system reliability and team performance.
• Ensure systems adhere to security policies and compliance requirements.

Requirements

• Ideally, you have already gained some experience working in a fast-growing, global SaaS company.
• Proficiency in AWS, Azure, or Google Cloud, and infrastructure as code (IaC) tools like Terraform.
• Experience with monitoring tools like Prometheus or Grafana for real-time monitoring and alerting.
• Experience in managing and responding to system incidents and outages.
• Proven experience leading and managing an SRE or DevOps team.
• Ability to prioritize tasks and manage multiple projects simultaneously.
• Experience in planning and executing projects, including resource management and timeline adherence.
• Experience working closely with cross-functional teams, including development, operations, and product teams.
• Focus on automating processes to improve efficiency and reduce manual intervention.
• Ability to use data and metrics to drive decisions and improvements.
• Understanding of security best practices and compliance requirements.
• Experience in performance tuning and capacity planning.

Benefits

• The chance to work in a meaningful industry and in a fast-growing, global company on a path to changing digital healthcare.
• Competitive compensation and benefits.
• Learning and professional growth opportunities.
• The tools you need and enjoy using.
• Frequent company events and talented colleagues from around the world.
