Senior System Software Engineer - NCCL

October 22, 2024


NVIDIA

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital workflows by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles with NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

📋 Description

• Engage with our partners and customers to root-cause functional and performance issues reported with NCCL
• Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
• Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
• Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
• Document NCCL and conduct trainings/webinars on it
• Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure, and support

🎯 Requirements

• B.S./M.S. degree in CS/CE or equivalent experience
• 5+ years of relevant experience
• Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
• Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
• Experience working with the engineering or academic research community supporting HPC or AI
• Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
• Expertise in Linux fundamentals and a scripting language, preferably Python
• Familiarity with containers, cloud provisioning, and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
• Adaptability and passion to learn new areas and tools
• Flexibility to work and communicate effectively across different teams and time zones

๐Ÿ–๏ธ Benefits

• equity
• benefits
