Senior Solution Architect - HPC and AI

November 8

NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+

Description

• Primary responsibilities will include building robust AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement.
• Your primary focus would be on understanding the AI workload and how it interacts with other parts of the system like networking, storage, deep learning frameworks, data cleaning tools, etc.
• Help maintain services once they are live by measuring and monitoring progress of AI jobs and helping engineering design solutions for more robust training at scale.
• Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements.

Requirements

• BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields, with at least 8 years of work or research experience in Python, C++, or other software development.
• Track record of medium- to large-scale AI training and understanding of key libraries used for NLP/LLM/VLA training (NeMo Framework, DeepSpeed, etc.).
• Experience with integration and deployment of software products in production enterprise environments, and with microservices software architecture.
• You are excited to work with multiple levels and teams across organisations (Engineering, Product, Sales, and Marketing teams).
• Capable of working in a constantly evolving environment without losing focus.
• Ability to multitask in a fast-paced environment.
• Driven, with strong analytical and problem-solving skills.
• Strong time-management and organization skills for coordinating multiple initiatives, priorities, and implementations of new technology and products into very sophisticated projects.
• You are a self-starter with a growth-oriented demeanour and a passion for continuous learning and for sharing findings across the team.
• Technical leadership, a strong understanding of NVIDIA technologies, and success in working with customers.
• Excellent verbal and written communication and technical presentation skills in English.
