Senior Site Reliability Engineer - Reliability Enablement

September 17

Xero

Accounting • SaaS • Banking • Invoicing • Design

1001 - 5000

💰 $300M Post-IPO Debt on 2018-09

Description

• Investigating operational surprises and supporting teams in post-incident activities
• Conducting in-depth incident analysis and maximizing post-incident learning across the organization
• Completing short-term reliability consultancy and enablement engagements, such as SLO reviews and facilitating pre-mortems
• Improving on-call health, uplifting observability and addressing any operational hotspots
• Identifying, planning and leading implementation of reliability uplift work and initiatives
• Supporting delivery of strategic features and initiatives with reliability and distributed systems expertise
• Observing and improving rituals and practices relating to production operations, incident response and incident learning

Requirements

• Solid experience in logging, monitoring and observability of a highly distributed system
• Experience leading incident management, response and troubleshooting efforts, including critical, complex and high-severity incidents
• Experience with post-incident reviews, incident analysis and learning from incidents
• Experience working in a tech or product company with comparable scale and complexity
• Systems thinking: reasoning about how systems and components interact and how they respond to failure
• Proficiency in one or more object-oriented programming languages (C#, JavaScript, Java, Python, etc.) or experience with infrastructure-as-code (e.g. Terraform, CloudFormation)
• Experience working with cloud providers such as AWS, Azure or GCP
• Experience designing, developing and operating distributed systems and large-scale software systems
• Strong experience delivering technical initiatives in an operational, site reliability or platform engineering capacity
• The ability to solve engineering challenges outside of your own team, including using influence rather than authority to enact change
• Demonstrated experience in reliability concepts like capacity management, autoscaling, deployment and release safety, software strategies for reliability, fault tolerance and graceful failure
• Experience implementing customer-focused Service Level Objectives (SLOs)
• Experience using software engineering to solve operational and reliability challenges
• Understanding of human factors, safety science and resilience engineering
• Experience working in environments with advanced security and networks

Similar Jobs

September 16

McAfee

1001 - 5000

Senior Architect for AWS cloud solutions at McAfee, enhancing security and reliability.

🇺🇸 United States – Remote

💵 $123.7k - $203.2k / year

💰 Private Equity Round on 1991-09

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

September 16

GEICO

10,000+

Senior Manager at GEICO leads Site Reliability Engineering for high-performance platforms.

🇺🇸 United States – Remote

💵 $110k - $261.5k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

September 16

GEICO

10,000+

Senior Manager for Site Reliability Engineering at GEICO, enhancing insurance technology.

🇺🇸 United States – Remote

💵 $110k - $261.5k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

September 15

Dotdash Meredith

1001 - 5000

Lead DevOps Engineer at Dotdash Meredith improving CI/CD processes and infrastructure.

🇺🇸 United States – Remote

💵 $165k - $190k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)
