Senior Site Reliability Engineer, Omniverse Cloud Platform

4 days ago


NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Description

• Own, innovate, and build programs, new software, and analytics that drive improvements to the availability, scalability, latency, and efficiency of Omniverse products and services
• Handle upgrades and automated rollbacks across all clusters
• Maintain Service Level Agreements (SLAs) based on measurable benchmarks, working hand in hand with developers of new services to define SLIs and design stable, secure services
• Help guide the Change Advisory Board and RCCA processes
• Work with product area leads from technologies across NVIDIA to guide product engineering in building fast, reliable, and durable production systems
• Apply standard methodologies and first-principles thinking to Omniverse and other strategic Cloud offerings from NVIDIA

Requirements

• Bachelor's degree in Computer Science or a related field, or equivalent experience
• 8+ years of demonstrated competency in system design, complexity analysis, software design in Unix/Linux systems, performance, and application issues
• 8+ years of validated experience authoring and debugging software written in C++ and Python
• Deep hands-on experience with Kubernetes-based cloud environments
• Proven experience in incident management and large-scale incident coordination
• Experience working with partners across multiple teams
• Background in HPC, Model Training Operations, or related experience

Benefits

• Equity
• Benefits


