Senior Software Engineer - Reliability and Operational Excellence

October 15


NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+

Description

• Ensure GPU cloud services run with maximum reliability and uptime.
• Enable developers to make changes to the existing system.
• Build tooling, reporting, automation, and ML for operational excellence.
• Integrate tooling with internal and customer workflows, along with cloud service providers, to streamline the incident management process.
• Evangelize sustainable, blameless incident prevention and incident response.
• Consult with peer teams on operations best practices.

Requirements

• BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
• 5+ years of experience.
• A track record showing a good balance between initiating your own projects, convincing others to collaborate with you, and collaborating well on projects initiated by others.
• Experience with infrastructure automation and distributed systems design, developing tools for running large-scale private or public cloud systems in production.
• Experience in one or more of the following: Python, Go, TypeScript, C/C++, Java.
• In-depth knowledge of one or more of Linux, Networking, Storage, and Containers.

Benefits

• Equity
• Benefits


