Senior Site Reliability Engineer, Data Science and ML Platforms

October 17

🇮🇳 India – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)


NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+

Description

• Are you passionate about building and maintaining large-scale production systems that support advanced data science and machine learning applications?
• Join a team at the heart of NVIDIA's data-driven decision-making culture.
• Design, build, and maintain services that enable real-time data analytics, streaming, and ML/AI.
• Apply software and systems engineering practices to achieve high efficiency and availability.
• Collaborate with customers on system changes, and monitor capacity, latency, and performance.
• A strong background in SRE practices, systems, networking, coding, and cloud operations is required.
• Work on innovative technologies that power AI and data science.

Requirements

• Minimum of 5-8 years of experience in SRE, Cloud platforms, or DevOps with large-scale microservices in production environments.
• Master's or Bachelor's degree in Computer Science, Electrical Engineering, or CE, or equivalent experience.
• Strong understanding of SRE principles, including error budgets, SLOs, and SLAs.
• Proficiency in incident, change, and problem management processes.
• Skilled in problem-solving, root cause analysis, and optimization.
• Experience with streaming data infrastructure services, such as Kafka and Spark.
• Expertise in building and operating large-scale observability platforms for monitoring and logging (e.g., ELK, Prometheus).
• Proficiency in programming languages such as Python, Go, Perl, or Ruby.
• Hands-on experience with scaling distributed systems in public, private, or hybrid cloud environments.
• Experience in deploying, supporting, and supervising services, platforms, and application stacks.


