ML Engineer - Large Language Models, LLM Training & Inference Optimization

April 1

Nebius Group

Nebius Group is building one of the world’s leading AI infrastructure companies, focusing on providing the necessary compute, storage, and tools for developers in the AI space. Based in Europe and listed on Nasdaq, Nebius has a global presence with R&D centers across Europe, North America, and Israel. The company's primary offering is an AI-centric cloud platform designed for intensive AI workloads, complemented by various other businesses involved in generative AI development, edtech, and autonomous technology.

AI infrastructure • Cloud platforms • GPU clusters • GPU cloud • GPUs as a service

1001 - 5000 employees

🏢 Enterprise

☁️ SaaS

📋 Description

• We are looking for senior and staff-level ML engineers to optimize training and inference performance in large-scale multi-GPU, multi-node setups.
• This role requires expertise in distributed systems and high-performance computing to build, optimize, and maintain robust training and inference pipelines.
• Your responsibilities will include:
  • Architecting and implementing distributed training and inference pipelines that leverage data, tensor, context, expert (MoE), and pipeline parallelism.
  • Implementing inference optimization techniques such as speculative decoding and its extensions (Medusa, EAGLE, etc.), CUDA graphs, and compile-based optimization.
  • Implementing custom CUDA/Triton kernels for performance-critical layers.

🎯 Requirements

• A profound understanding of the theoretical foundations of machine learning
• Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimizations, dynamic batching, etc.)
• Expertise in at least one of these fields:
  • Implementing efficient custom GPU kernels in CUDA and/or Triton
  • Training large models on multiple nodes and implementing various parallelism techniques
  • Inference optimization techniques: disaggregated prefill/decode, paged attention, continuous batching, speculative decoding, etc.
• Strong software engineering skills (we mostly use Python)
• Deep experience with modern deep learning frameworks (we use JAX & PyTorch)
• Proficiency in contemporary software engineering practices, including CI/CD, version control, and unit testing
• Strong communication skills and the ability to work independently

🏖️ Benefits

• Competitive salary and comprehensive benefits package.
• Opportunities for professional growth within Nebius.
• Hybrid working arrangements.
• A dynamic and collaborative work environment that values initiative and innovation.
