Director, Site Reliability Engineering

March 17

Benchmark

Benchmark is a global product realization services company that specializes in providing comprehensive solutions in advanced computing, commercial aerospace, defense, medical technologies, and semiconductor capital equipment. The company offers a range of services from design engineering and precision machining to full-system electronic assembly and lifecycle management, ensuring reliable support for innovative products in demanding markets. With a collaborative approach that leverages cross-functional teams, Benchmark aims to be a trusted partner in delivering customized solutions tailored to complex challenges.

Advanced Technology • Design Engineering • Manufacturing • Order Fulfillment • Design

10,000+ employees

Founded 1979

🚀 Aerospace

⚕️ Healthcare Insurance

📋 Description

• We are seeking a Director of Site Reliability Engineering (SRE) to lead our SRE team in ensuring the availability, performance, and scalability of our critical systems.
• This role is responsible for defining and driving reliability strategies, operational excellence, and incident response processes at scale.
• You will collaborate closely with engineering, DevOps, and product teams to establish best practices and implement processes that enhance system resilience and service performance.
• Define and execute the vision for site reliability, balancing innovation with operational stability.
• Lead, mentor, and grow a high-performing SRE team, fostering a culture of ownership and continuous improvement.
• Partner with Engineering, DevOps, and Product teams to embed reliability best practices into the development lifecycle.
• Establish and refine SLIs, SLOs, and error budgets to measure and improve service reliability.
• Develop and drive incident management processes, including real-time incident response, on-call coordination, and postmortem analysis to prevent recurring issues.
• Implement and standardize operational readiness reviews and escalation procedures to ensure teams are equipped to handle incidents effectively.
• Drive initiatives to reduce operational toil, leveraging automation where applicable to enhance team efficiency.
• Collaborate with engineering teams to define performance testing and capacity planning strategies to proactively mitigate reliability risks.
• Champion the adoption of observability, logging, and monitoring best practices, ensuring visibility into system health and performance.

🎯 Requirements

• 8+ years of experience in Site Reliability Engineering, DevOps, or related fields, with at least 3 years in a leadership role.
• Proven track record of driving operational excellence in large-scale, distributed systems.
• Expertise in defining and implementing SLIs, SLOs, error budgets, and incident management processes.
• Strong knowledge of observability tools such as Prometheus, Grafana, Datadog, New Relic, or similar.
• Experience leading on-call rotations, postmortems, and operational readiness programs.
• Excellent leadership, communication, and stakeholder management skills.
• Deep experience with AWS cloud environments, including operational best practices for high availability and reliability.
• AWS certifications such as AWS Certified DevOps Engineer – Professional, AWS Certified Solutions Architect – Professional, or AWS Certified Advanced Networking – Specialty.
• Experience with AWS monitoring and logging tools (CloudWatch, X-Ray, AWS Config, GuardDuty).
• Experience scaling SRE practices in high-growth or regulated environments.
• Hands-on background in software engineering with Python, Bash, or similar languages.

