Senior Software Engineer, Site Reliability

October 25


Gretel.ai

Generative AI • Synthetic Data • Machine Learning • Privacy • AI

11 - 50

Description

• Ensure the safety, security, and reliability of our cloud infrastructure
• Build and maintain Gretel's observability stack
• Scale systems sustainably with automation
• Manage and lead incident response, recovery, and postmortems
• Partner with software engineers to troubleshoot production issues
• Build tools and frameworks for Gretel engineers
• Ship complex ML/AI models with applied science and engineering teams

Requirements

• Experience with at least one cloud platform (we use AWS heavily)
• Experience with Docker and Kubernetes
• Ability to write software and tools in Python or Go
• Experience with monitoring, alerting, and operations
• Experience operating highly available distributed systems in the cloud
• Experience identifying, diagnosing, and responding to operational outages
• Experience with infrastructure as code (Terraform, CloudFormation, etc.)
• Experience with build systems such as Bazel
• Experience shipping applications with complex dependencies (PyTorch, TensorFlow)
• Software engineering skills beyond script writing (TDD, design patterns, etc.)
• Experience with DevOps or CI/CD pipelines


Similar Jobs

October 22

Deputy

201 - 500

Support Reliability Engineers bridge between Implementation Engineers post-implementation at Deputy.

October 21

EverCommerce

1001 - 5000

Senior Database Reliability Engineer at EverCommerce managing databases for SaaS solutions.

🇺🇸 United States – Remote

💵 $100k - $120k / year

💰 Private Equity Round on 2019-07

⏰ Full Time

🟠 Senior

🗽 H1B Visa Sponsor
