SRE - HPC Engineer

2 days ago

FluidStack

GPU • GPU Cloud • Cloud Computing • GPUaaS • AI Cloud

11 - 50 employees

Description

• Fluidstack is an AI cloud.
• We work with many of the top AI companies on the planet.
• Our HPC Engineers make sure our GPU infrastructure runs at peak performance.
• You will have three main responsibilities: Deployment, Automation, and Support.
• You will help take bare-metal servers and deploy them for our customers as high-performance compute as a service.
• You will help us automate many of our processes and systems so they can scale.
• You will work closely with our customers to make sure they are able to use our infrastructure effectively.

Requirements

• Experience with HPC systems, system administration, SRE, or DevOps.
• Experience with large-scale workloads using orchestrators such as Slurm or Kubernetes.
• Experience automating bare-metal machines and containers with tools such as Ansible, Bash, or Python.
• Experience with shared storage platforms such as NFS, DDN, VAST, or CephFS.
• Experience provisioning large-scale clusters and networks with tools such as BCM or UFM.
• Experience with large-scale GPU systems, including NVIDIA GPUs and InfiniBand networks.
• Fast learner, adaptable, and passionate about Fluidstack’s mission!

Similar Jobs

3 days ago

Zingtree

11 - 50 employees

Mid-Level DevOps Engineer at Zingtree automating processes and supporting customer experience operations.

🇺🇸 United States – Remote

💰 $15M Series A on 2022-01

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

3 days ago

Cloud Operations Engineer improving incident command and monitoring at Lumin Digital.

🇺🇸 United States – Remote

💵 $100k - $125k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Built by Lior Neu-ner. I'd love to hear your feedback — Get in touch via DM or lior@remoterocketship.com