GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming
10,000+
October 31
🇺🇸 United States – Remote
💵 $248k - $385.3k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
• Lead initiatives to transform IT Compute platform architecture to build new service offerings across On-Prem & Cloud
• Define and implement metrics to measure the efficiency of compute platforms & services
• Collect and review system data for capacity and planning purposes
• Develop and maintain tools for collecting, analyzing, and visualizing data for reporting, alerting, and monitoring
• Collaborate with NVIDIA leadership, senior engineers, program managers, and product managers
• Bachelor’s degree in Engineering, Computer Science, Mathematics, or related field, or equivalent experience
• 12+ years of proven experience in compute platform engineering with a focus on automation
• Proven experience in designing and deploying virtualization architectures, including expertise with Kubernetes distributions
• In-depth knowledge of hardware technologies, including SR-IOV, DPU, and GPU
• Proven experience evaluating existing application architectures and identifying opportunities for containerization
• Strong analytical skills with the ability to define and track key performance metrics
• Experience in developing tools for data analysis and performance profiling
• Proficiency in programming languages such as Go and/or Python
• Experience running large environments that combine bare metal, large-scale virtualization with tens of thousands of VMs, and cloud infrastructure
• Eligible for equity and benefits
• NVIDIA is committed to fostering a diverse work environment
October 31
51 - 200
ServiceNow Developer & DevOps Engineer for education-focused IT-services company.
🇺🇸 United States – Remote
💵 $90k - $130k / year
⏰ Full Time
🔴 Lead
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
October 31
51 - 200
Help scale and optimize infrastructure for Replicant's AI customer service platform.
October 30
201 - 500
Manage software build and release processes for Agility Robotics.
🇺🇸 United States – Remote
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
October 22
201 - 500
Design and implement reliable systems for SimSpace's cloud-based applications.
🇺🇸 United States – Remote
💵 $204k - $275k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
October 20
51 - 200
Site Reliability Engineer at Global InfoTek, managing infrastructure as code.