Senior HPC Systems Engineer

October 18

NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Description

• Lead all aspects of implementing performance practices in large-scale infrastructure, and deliver powerful tools, methodologies, and flows to validate and improve several datacenter products in parallel.
• Accelerate strategic customer deployments and ensure speed-of-light bringup and deployment of ground-breaking AI infrastructure by working hand in hand to tailor designs and faster processes to customer needs.
• Provide engineering solutions that enable large-scale performance strategies for Datacenter GPU Computing products and software stacks, maintain technical relationships with internal and external engineering teams, and assist systems engineers in building creative solutions based on NVIDIA technology.
• Participate in engagements with various SW and FW (BMC, SBIOS, OS, drivers, etc.) teams to develop best-in-class practices and tools, and analyze, debug, and resolve critical software issues to achieve the best AI workload performance at scale.
• Own the architecture of performance design and settings for at-scale datacenter products, implemented in both FW and SW components, to ensure velocity and scale while using resources efficiently.
• Build end-to-end solutions and optimize datacenter product designs.

Requirements

• 5+ years of experience using accelerated computing for datacenter container computing solutions.
• BS in Engineering, Mathematics, Physics, or Computer Science (or equivalent experience); MS or PhD desirable.
• Solid understanding of accelerated parallel computing models (MPI, NCCL).
• Experience using and handling modern cloud and container-based enterprise computing architectures.
• C/C++/Python/Bash programming and scripting experience.
• Experience with CPU architecture.
• Experience with container technology and Linux-based operating systems.
• Experience working with engineering or academic research communities supporting high performance computing or deep learning.
• Strong verbal and written communication skills and excellent teamwork skills.
• Ability to multitask effectively in a dynamic environment.

Benefits

• Highly competitive salaries
• Comprehensive benefits package
• Equity opportunities
