Solutions Architect, Infrastructure

Yesterday


NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Description

• Serve as technical advisor for the design, build-out, and optimization of university-level research computing infrastructures that include GPU-accelerated scientific workflows
• Work with university research computing teams to optimize hardware utilization with software orchestration tools such as NVIDIA Base Command, Kubernetes, Slurm, and Jupyter notebook environments
• Implement systems monitoring and telemetry tools to help optimize resource utilization and track the most demanding application workloads at research computing centers
• Document what you learn, including building targeted training, writing whitepapers, blogs, and wiki articles, and working through hard problems with a customer on a whiteboard
• Provide customer requirements and feedback to product and engineering teams

Requirements

• MS or PhD in Engineering, Mathematics, Physical Sciences, or Computer Science (or equivalent experience)
• 5+ years of relevant work experience
• Strong experience in designing and deploying GPU-accelerated computing infrastructure
• In-depth knowledge of cluster orchestration and job scheduling technologies, e.g. Slurm, Kubernetes, Ansible, and/or Open OnDemand
• Experience with container tools (Docker, Singularity, Enroot/Pyxis), including at-scale deployment of containerized environments
• Expertise in systems monitoring, telemetry, and systems performance optimization of research computing environments
• Familiarity with tools like Prometheus, Grafana, or NVIDIA DCGM
• Understanding of datacenter networking technologies (InfiniBand, Ethernet, OFED) and experience with network configuration
• Familiarity with power and cooling systems architecture for data center infrastructure
• Experience in deploying LLM training and inference workflows in a research computing environment
• Experience working with technical computing customers in the academic research computing space
• Practical knowledge of high-performance parallel file systems
• Applications and systems-level knowledge of OpenMPI and NCCL
• Experience with debugging and profiling tools, e.g. Nsight Systems, Nsight Compute, Compute Sanitizer, GDB, or Valgrind

Benefits

• Highly competitive salaries
• Comprehensive benefits package
• Eligible for equity


