Member of Engineering - GPU

23 hours ago


Poolside

Blockchain • Accelerator • Incubator • Hub • Fundraising

11 - 50

Description

• Performance-oriented programming in CUDA, C++, Cython, and Triton.
• Accelerate high-level primitives used to train Large Language Models (LLMs) and optimize distributed communication over AWS EFAv2.
• Work on poolside's own implementation of distributed training for LLMs.
• Ensure cutting-edge performance of LLM pre-training and fine-tuning on huge state-of-the-art GPU clusters.
• Profile CPU and CUDA code at several abstraction levels.
• Debug and profile distributed applications.
• Troubleshoot undocumented CUDA internals.
• Hack the NCCL library used for GPU communication.
• Tune vanilla CUDA, Triton, and CUTLASS kernels for the latest NVIDIA GPUs.
• Hack PyTorch internals.

Requirements

• Engineering background
• Expert understanding of GPU hardware/architecture
• Strong C/C++ programming skills
• Fine-grained knowledge of CUDA programming
• Strong algorithmic skills
• Experience with systems programming on Linux
• Plus: knowledge of CPython internals and experience with native extension development
• Plus: knowledge of AWS EFA internals
• Plus: compiler development background

Benefits

• Fully remote work & flexible hours
• 37 days/year of vacation & holidays
• Health insurance allowance for you and dependents
• Company-provided equipment
• Wellbeing, always-be-learning, and home office allowances
• Frequent team get-togethers
• Great, diverse & inclusive people-first culture
