RunPod is a cloud-based platform designed to facilitate the training, fine-tuning, and deployment of AI models. It provides a globally distributed GPU cloud that enables users to seamlessly deploy their AI workloads while focusing on building machine learning applications. With features like fast pod spinning, autoscaling, and support for multiple machine learning frameworks, RunPod caters to startups, academic institutions, and enterprises alike, offering a powerful and cost-effective solution for machine learning development.
Machine Learning • Artificial Intelligence • Deep Learning
51 - 200 employees
Founded 2022
🤖 Artificial Intelligence
☁️ SaaS
🔥 Funding within the last year
💰 Seed Round on 2024-05
March 28
🇺🇸 United States – Remote
💵 $180k - $210k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
DevOps & Site Reliability Engineer (SRE)
• RunPod is pioneering the future of AI and machine learning, offering cutting-edge cloud infrastructure for full-stack AI applications.
• We are seeking an experienced and visionary Site Reliability Engineering (SRE) Manager to lead and mentor our team of highly skilled Site Reliability Engineers.
• As the SRE Manager, you will be responsible for overseeing the design, implementation, and maintenance of our large-scale, distributed systems across multiple data centers.
• You will lead a team that manages our critical infrastructure, including our GPU/AI technologies, and ensure the continuous improvement of our systems' reliability, performance, and security.
• Our SRE Philosophy prioritizes automation, systems thinking, continuous improvement, proactive problem solving, and scalability through code.
• If you are passionate about leading a team of top-tier SREs, driving technical excellence, and solving complex infrastructure challenges at scale, we want to hear from you.
• 5+ years of experience in Site Reliability Engineering or a similar role
• 3+ years of experience in a technical leadership or management position
• Deep understanding of Linux systems, containerization, virtualization, and networking technologies
• Strong background in managing and monitoring large-scale distributed systems and bare-metal fleets
• Expertise in infrastructure-as-code and configuration management tools
• Proficiency in at least one programming language, preferably Python or Golang
• Experience with cloud platforms (AWS, GCP, Azure) and their respective services
• Strong knowledge of monitoring, observability, and alerting systems
• Excellent problem-solving skills and ability to manage complex, large-scale incidents
• Proven track record of implementing and managing SLIs, SLOs, and SLAs
• Strong communication skills with the ability to convey technical concepts to both technical and non-technical stakeholders
• Successful completion of a background check
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field (Preferred)
• The competitive base pay for this position ranges from $180,000 - $210,000. Factors that may be used to determine your actual pay may include your specific job related knowledge, skills and experience.
• Stock options
• The flexibility of remote work with an inclusive, collaborative team.
• An opportunity to grow with a company that values innovation and user-centric design.
• Generous vacation policy to ensure work-life harmony and well-being.
• Contribute to a company with a global impact based in the US, Canada, and Europe.
March 28
🇺🇸 United States – Remote
💵 $80k - $102k / year
💰 Private Equity Round on 2020-07
⏰ Full Time
🟡 Mid-level
🟠 Senior
DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
March 28
Trilogy Federal seeks a DevOps Engineer/Security Specialist for cybersecurity and DevOps engineering expertise. Support for Department of Veterans Affairs in cloud-based solutions and DevSecOps integration.
March 28
Join Leidos as a Site Reliability Engineer to optimize CI/CD pipelines and cloud infrastructure.
March 28
Join Alset as a DevOps Engineer optimizing cloud infrastructure and collaborating with a skilled team.
March 27
Provide consulting services to VA as a DevOps Coach focusing on Agile and DevSecOps practices.