October 25
• Ensure the safety, security, and reliability of our cloud infrastructure
• Build and maintain Gretel's observability stack
• Scale systems sustainably with automation
• Manage and lead incident response, recovery, and postmortems
• Partner with software engineers to troubleshoot production issues
• Build tools and frameworks for Gretel engineers
• Ship complex ML/AI models with applied science and engineering teams
• Experience with at least one cloud platform (we use AWS heavily)
• Experience with Docker and Kubernetes
• Ability to write software and tools in Python or Go
• Experience with monitoring, alerting, and operations
• Experience operating highly available distributed systems in the cloud
• Experience identifying, diagnosing, and responding to operational outages
• Experience with infrastructure as code (Terraform, CloudFormation, etc.)
• Experience with build systems such as Bazel
• Experience shipping applications with complex dependencies (PyTorch, TensorFlow)
• Software engineering skills beyond script writing (TDD, design patterns, etc.)
• Experience with DevOps or CI/CD pipelines
October 17
Develop and refine software for Home Depot's customer and associate needs.
October 17
Database Reliability Engineer II for SaaS applications at Aya Healthcare.
October 17
Seeking a Senior DevSecOps Engineer to enhance security at Second Front Systems.
October 17
Lead security initiatives at SumerSports, enhancing software development processes.
October 14
Senior Software Engineer at DICK’S Sporting Goods focused on reliability and performance.