• You would be working on our pre-training team, focused on building out distributed training of Large Language Models and major architecture changes.
• This is a hands-on role where you'll be designing and implementing LLM architectures (dense & sparse) and distributed training code all the way from data to tensor parallelism (see the illustrative sketch below), while researching potential optimizations (from basic operations to communication) and new architectures & distributed training strategies.
• You will have access to thousands of GPUs on this team.
• Your goal: to train the best foundation models for source code generation in the world, in minimum time and with maximum hardware utilization.
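To give a flavor of the tensor-parallel work mentioned above, here is a minimal, illustrative PyTorch sketch of a column-parallel linear layer. The class name and layout are hypothetical and not part of this posting; a production system would add the backward-pass communication, sharded initialization, and fused kernels that real training code needs.

```python
# Illustrative sketch only: a toy column-parallel linear layer of the kind
# used in tensor-parallel LLM training. Hypothetical example, not our codebase.
import torch
import torch.nn as nn


class ColumnParallelLinear(nn.Module):
    """Splits the output dimension of a linear layer across tensor-parallel ranks."""

    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0
        # Each rank holds only its shard of the weight matrix.
        self.local = nn.Linear(in_features, out_features // world_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every rank sees the full input and produces its slice of the output;
        # an all-gather would reassemble the full activation when needed.
        return self.local(x)


if __name__ == "__main__":
    # Single-process demo with world_size=1; a multi-GPU run would initialize
    # a NCCL process group and launch one process per GPU via torchrun.
    layer = ColumnParallelLinear(in_features=512, out_features=2048, world_size=1)
    out = layer(torch.randn(4, 512))
    print(out.shape)  # torch.Size([4, 2048])
```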
• Experience with Large Language Models (LLMs)
• Deep knowledge of Transformers is a must
• Knowledge of or experience with cutting-edge training tricks
• Knowledge of or experience with distributed training
• Have trained LLMs from scratch
• Have coded LLMs from scratch
• Knowledge of deep learning fundamentals
• Strong machine learning and engineering background
• Research experience
• Authorship of scientific papers on topics such as applied deep learning, LLMs, or source code generation is a nice-to-have
• Can freely discuss the latest papers and descend into the fine details
• Reasonably opinionated
• Programming experience
• Linux
• Strong algorithmic skills
• Python with PyTorch or JAX
• C/C++, CUDA, Triton
• Use modern tools and are always looking to improve
• Strong critical thinking and the ability to question code-quality policies when applicable
• Prior experience in non-ML programming, especially in languages other than Python, is a nice-to-have
• Fully remote work & flexible hours
• 37 days/year of vacation & holidays
• Health insurance allowance for you and your dependents
• Company-provided equipment
• Wellbeing, always-be-learning, and home-office allowances
• Frequent team get-togethers
• A great, diverse & inclusive, people-first culture
Apply Now