Senior Software Engineer - Reliability and Operational Excellence

October 15

NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Description

• Ensure GPU cloud services run with maximum reliability and uptime.
• Enable developers to make changes to the existing system.
• Build tooling, reporting, automation, and ML for operational excellence.
• Integrate tooling with internal and customer workflows, along with cloud service providers, to streamline the incident management process.
• Evangelize sustainable, blameless incident prevention and incident response.
• Consult with peer teams on operations best practices.

Requirements

• BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
• 5+ years of experience.
• A track record showing a good balance between initiating your own projects, convincing others to collaborate with you, and collaborating well on projects initiated by others.
• Experience with infrastructure automation and distributed systems design, developing tools for running large-scale private or public cloud systems in production.
• Experience in one or more of the following: Python, Go, TypeScript, C/C++, Java.
• In-depth knowledge of one or more of Linux, Networking, Storage, and Containers.

Benefits

• Equity
• Benefits
