GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming
October 17
• Are you passionate about building and maintaining large-scale production systems that support advanced data science and machine learning applications?
• Join a team at the heart of NVIDIA's data-driven decision-making culture.
• Design, build, and maintain services enabling real-time data analytics, streaming, and ML/AI.
• Implement software and systems engineering practices for high efficiency and availability.
• Collaborate with customers for system changes, monitoring capacity, latency, and performance.
• Strong background in SRE practices, systems, networking, coding, and cloud operations required.
• Work on innovative technologies that power AI and data science.
• Minimum of 5-8 years of experience in SRE, Cloud platforms, or DevOps with large-scale microservices in production environments.
• Master's or Bachelor's degree in Computer Science or Electrical Engineering or CE or equivalent experience.
• Strong understanding of SRE principles, including error budgets, SLOs, and SLAs.
• Proficiency in incident, change, and problem management processes.
• Skilled in problem-solving, root cause analysis, and optimization.
• Experience with streaming data infrastructure services, such as Kafka and Spark.
• Expertise in building and operating large-scale observability platforms for monitoring and logging (e.g., ELK, Prometheus).
• Proficiency in programming languages such as Python, Go, Perl, or Ruby.
• Hands-on experience with scaling distributed systems in public, private, or hybrid cloud environments.
• Experience in deploying, supporting, and supervising services, platforms, and application stacks.
September 15
Site Reliability Engineer ensuring reliability for Kyndryl's technology systems.
July 13
🇮🇳 India – Remote
💰 Venture Round on 2007-12
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
June 6
March 15
🇮🇳 India – Remote
💰 $42.8M Series D on 2013-06
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
February 3