Senior HPC AI Cluster Engineer

January 27


NVIDIA

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Designing, implementing and maintaining large-scale HPC/AI clusters with monitoring, logging and alerting
• Managing Linux job/workload schedulers and orchestration tools
• Developing and maintaining continuous integration and delivery pipelines
• Developing tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
• Deploying monitoring solutions for the servers, network and storage
• Troubleshooting and fixing issues bottom-up, from bare metal through the operating system and software stack to the application level
• Being a technical resource: developing, refining and documenting standard methodologies to share with internal teams
• Supporting Research & Development activities and engaging in POCs/POVs for future improvements

🎯 Requirements

• Bachelor's Degree in Computer Science, Engineering, or a related field; or equivalent experience
• 5+ years of experience
• Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
• Experience with job scheduling and orchestration tools such as Slurm and K8s
• Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols (e.g. TCP, DHCP, DNS)
• Experience with multiple storage solutions such as Lustre, GPFS, ZFS and XFS
• Familiarity with newer and emerging storage technologies
• Python programming and bash scripting experience
• Comfortable with automation and configuration management tools such as Jenkins, Ansible, Puppet/Chef
• Deep knowledge of networking technologies such as InfiniBand and Ethernet
• Deep understanding of and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
• Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)

🏖️ Benefits

• Equity
• Benefits

