Senior Site Reliability Engineer - Omniverse Cloud Platform

November 4

🇺🇸 United States – Remote

💵 $180k - $339.3k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)


NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Description

• Own, innovate, and build programs, new software, and analytics that drive improvements to the availability, scalability, latency, and efficiency of Omniverse products and services
• Handle upgrades and automated rollbacks across all clusters
• Maintain Service Level Agreements (SLAs) with measurable benchmarks, working hand in hand with developers of new services to define SLIs and design stable, secure services
• Help guide the Change Advisory Board and RCCA processes
• Work with product area leads from technologies across NVIDIA to guide product engineering in building fast, reliable, and durable production systems
• Apply standard methodologies and first-principles thinking to Omniverse and other strategic cloud offerings from NVIDIA

Requirements

• Bachelor's degree in Computer Science or a related field, or equivalent experience
• 8+ years of demonstrated competency in system design, complexity analysis, software design in Unix/Linux systems, performance, and application issues
• 8+ years of validated experience authoring and debugging software written in C++ and Python
• Deep hands-on experience with Kubernetes-based cloud environments
• Proven experience in incident management and large-scale incident coordination
• Experience working with partners across multiple teams
• Background in HPC, Model Training Operations, or a related area

Benefits

• Competitive salary package
• Eligible for equity and benefits


