Senior Site Reliability Engineer, Data Science and ML Platforms

5 days ago

🇮🇳 India – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+

Description

• Are you passionate about building and maintaining large-scale production systems that support advanced data science and machine learning applications?
• Join a team at the heart of NVIDIA's data-driven decision-making culture.
• Design, build, and maintain services enabling real-time data analytics, streaming, and ML/AI.
• Implement software and systems engineering practices for high efficiency and availability.
• Collaborate with customers for system changes, monitoring capacity, latency, and performance.
• Strong background in SRE practices, systems, networking, coding, and cloud operations required.
• Work on innovative technologies that power AI and data science.

Requirements

• Minimum of 5-8 years of experience in SRE, Cloud platforms, or DevOps with large-scale microservices in production environments.
• Master's or Bachelor's degree in Computer Science or Electrical Engineering or CE or equivalent experience.
• Strong understanding of SRE principles, including error budgets, SLOs, and SLAs.
• Proficiency in incident, change, and problem management processes.
• Skilled in problem-solving, root cause analysis, and optimization.
• Experience with streaming data infrastructure services, such as Kafka and Spark.
• Expertise in building and operating large-scale observability platforms for monitoring and logging (e.g., ELK, Prometheus).
• Proficiency in programming languages such as Python, Go, Perl, or Ruby.
• Hands-on experience with scaling distributed systems in public, private, or hybrid cloud environments.
• Experience in deploying, supporting, and supervising services, platforms, and application stacks.

