GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming
10,000+
October 17
• Are you passionate about building and maintaining large-scale production systems that support advanced data science and machine learning applications?
• Join a team at the heart of NVIDIA's data-driven decision-making culture.
• Design, build, and maintain services enabling real-time data analytics, streaming, and ML/AI.
• Implement software and systems engineering practices for high efficiency and availability.
• Collaborate with customers for system changes, monitoring capacity, latency, and performance.
• Strong background in SRE practices, systems, networking, coding, and cloud operations required.
• Work on innovative technologies that power AI and data science.
• Minimum of 5-8 years of experience in SRE, Cloud platforms, or DevOps with large-scale microservices in production environments.
• Master's or Bachelor's degree in Computer Science, Electrical Engineering, or CE, or equivalent experience.
• Strong understanding of SRE principles, including error budgets, SLOs, and SLAs.
• Proficiency in incident, change, and problem management processes.
• Skilled in problem-solving, root cause analysis, and optimization.
• Experience with streaming data infrastructure services, such as Kafka and Spark.
• Expertise in building and operating large-scale observability platforms for monitoring and logging (e.g., ELK, Prometheus).
• Proficiency in programming languages such as Python, Go, Perl, or Ruby.
• Hands-on experience with scaling distributed systems in public, private, or hybrid cloud environments.
• Experience in deploying, supporting, and supervising services, platforms, and application stacks.
October 15
51 - 200
Lead DevOps implementation for XA Group's cloud and on-premises environments.
October 3
51 - 200
Senior DevOps Engineer for Ryan's application development team to enhance enterprise applications.
September 26
501 - 1000
Site Reliability Engineer for improving reliability at Solvd Inc., a software company.
September 22
51 - 200
Back-end engineers ensure stable product performance for Insticator's SSP.
🇮🇳 India – Remote
💰 $5.2M Series A on 2017-07
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)