Senior On-Device Model Inference Optimization Engineer

November 28

NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Description

• Develop and implement strategies to optimize AI model inference for on-device deployment.
• Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands.
• Optimize performance-critical components using CUDA and C++.
• Collaborate with multi-functional teams to align optimization efforts with hardware capabilities and deployment needs.
• Benchmark inference performance, identify bottlenecks, and implement solutions.
• Research and apply innovative methods for inference optimization.
• Adapt models for diverse hardware platforms and operating systems with varying capabilities.
• Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.
• Recommend and implement model architecture changes to improve the accuracy-latency balance.

Requirements

• MSc or PhD in Computer Science, Engineering, or a related field, or equivalent professional experience.
• Over 5 years of demonstrated experience specializing in model inference and optimization.
• 10+ years of work experience in a relevant area.
• Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT.
• Proven experience in optimizing inference for transformer and convolutional architectures.
• Strong programming proficiency in CUDA, Python, and C++.
• In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.
• Skilled in building and deploying scalable, cloud-based inference systems.
• Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.
• Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.
• Strong collaboration and communication skills for working effectively across multidisciplinary teams.
• A proactive, diligent mindset with a drive to tackle complex optimization challenges.

Benefits

• Equity
• Benefits
