Senior SRE Engineering Leader - AI Research Clusters

October 7

🇺🇸 United States – Remote

💵 $272k - $419.8k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+

Description

• Lead globally distributed GPU clusters for AI research
• Manage cluster operational excellence and efficiency
• Deliver scalable distributed systems and AI services
• Build strong distributed teams and drive technical strategy
• Collaborate to improve the GPU ecosystem for AI use cases
• Solve reliability, efficiency, and productivity challenges for GPU infrastructure
• Define strategy, manage projects, and drive technical leadership

Requirements

• 10+ overall years in engineering management; 3+ in leadership roles
• Bachelor’s or Master’s in Computer Science or a related field, or equivalent experience
• Experience supporting AI/ML workloads and driving operational standard methodologies
• Strong Unix/Linux knowledge and proficiency in at least two programming languages (Perl, Python, Go)
• Expertise in managing large-scale distributed systems and AI/HPC environments
• Leadership experience, mentoring, and coaching skills
• Ability to quickly learn and integrate new technologies
• Strong collaboration skills across engineering, server, storage, and security teams

Benefits

• Highly competitive salaries
• Comprehensive benefits package
• Eligible for equity

Similar Jobs

October 5

OneLocal

51 - 200

Senior DevOps Engineer for AI-driven B2B marketing platform.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

October 5

Federato

11 - 50

Join Federato to enhance infrastructure for equitable insurance using AI/ML.

🇺🇸 United States – Remote

💵 $140k - $170k / year

💰 $15M Series A on 2022-09

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

September 29

HHAeXchange

501 - 1000

Support cloud services as a Senior SRE at HHAeXchange.

🇺🇸 United States – Remote

💵 $125k - $135k / year

💰 Private Equity Round on 2021-09

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

September 29

Avetta

501 - 1000

Avetta seeks a DevOps Engineer to enhance global customer infrastructure and systems.

🇺🇸 United States – Remote

💰 Private Equity Round on 2019-02

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor