Senior Solution Architect - HPC and AI

November 8, 2024

NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Description

• Primary responsibilities include building robust AI/HPC infrastructure for new and existing customers.
• Support the operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Your primary focus will be on understanding AI workloads and how they interact with other parts of the system, such as networking, storage, deep learning frameworks, and data cleaning tools.
• Help maintain services once they are live by measuring and monitoring the progress of AI jobs and helping engineering teams design solutions for more robust training at scale.
• Provide feedback to internal teams, for example by opening bugs, documenting workarounds, and suggesting improvements.

Requirements

• BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other engineering fields, with at least 8 years of work or research experience in Python, C++, or other software development.
• Track record of medium- to large-scale AI training and understanding of key libraries used for NLP/LLM/VLA training (NeMo Framework, DeepSpeed, etc.).
• Experience integrating and deploying software products in production enterprise environments, and with microservices software architecture.
• You are excited to work with multiple levels and teams across organisations (Engineering, Product, Sales, and Marketing).
• Capable of working in a constantly evolving environment without losing focus.
• Ability to multitask in a fast-paced environment.
• Driven, with strong analytical and problem-solving skills.
• Strong time-management and organisation skills for coordinating multiple initiatives, priorities, and implementations of new technology and products within highly complex projects.
• You are a self-starter with a growth mindset, a passion for continuous learning, and a habit of sharing findings across the team.
• Technical leadership, a strong understanding of NVIDIA technologies, and a track record of success working with customers.
• Excellent verbal and written communication and technical presentation skills in English.
