Solutions Architect - InfiniBand and HPC

September 15


NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+

Description

• Deploy, manage, and validate AI/HPC infrastructure in Linux-based environments for new and existing customers.
• Be the domain expert with customers from planning calls through implementation.
• Create and hand over related documentation, and perform the knowledge transfers required to support customers as they roll out some of the most sophisticated systems in the world.
• Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.

Requirements

• 5+ years providing in-depth support and deployment services, solving problems for hardware and software products.
• Knowledge of and experience with Linux system administration/DevOps: process management, package management, task scheduling, kernel management, boot procedures, troubleshooting, performance reporting/optimization/logging, and network routing/advanced networking (tuning and monitoring).
• Experience configuring, testing, validating, and resolving issues in LAN and InfiniBand networks, including use of validation tools for InfiniBand health and performance (ibdiag, etc.) and UFM (Unified Fabric Manager).
• Experience with benchmarking tools such as HPL, NCCL tests, and MLPerf.
• Scripting proficiency (Bash, Python, Ansible, etc.) and a background in automation tooling (Ansible, Puppet, etc.).
• Familiarity with schedulers such as Slurm, LSF, and UGE.
• Kubernetes experience.
• Excellent interpersonal communication skills and the ability to deliver resolutions for customer issues as they arise.
• Strong self-organizational skills and the ability to prioritize and multitask with limited supervision.
• A willingness to travel to customer sites within the United States.

Benefits

• Eligible for equity and benefits.


