GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming
10,000+
3 days ago
🇺🇸 United States – Remote
💵 $220k - $419.8k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
• Lead technical architecture for DGX cloud solutions on cloud service providers like AWS, GCP, Azure and OCI.
• Provide fast and creative solutions for complex problems and write effective, clear and reliable architecture specifications.
• Design, implement and support operational and reliability aspects of large-scale GPU training clusters.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
• Maintain services after they go live by measuring and monitoring availability, latency and overall system health.
• Practice sustainable incident response and blameless postmortems.
• B.Sc./M.Sc./Ph.D. degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
• 8+ years of proven experience.
• Experience with infrastructure automation and distributed systems design, and with designing and developing tools for running large-scale private or public cloud systems in production.
• Experience in one or more of the following: Python, Go.
• In-depth knowledge of Linux, networking and cloud native technologies.
• Eligible for equity and benefits
4 days ago
5001 - 10000
DevOps Engineer for AWS cloud solutions at ICF.
🇺🇸 United States – Remote
💵 $84.5k - $143.7k / year
💰 Grant on 2023-02
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
4 days ago
201 - 500
Lead DevOps team to enhance healthcare data solutions at Arcadia.
🇺🇸 United States – Remote
💰 $29.5M Venture Round on 2020-01
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
4 days ago
201 - 500
Maintain platform reliability for family fintech company Greenlight.
🇺🇸 United States – Remote
💵 $142k - $190k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
4 days ago
51 - 200
Ensure optimal performance of Remo's virtual dementia care platform's infrastructure.
🇺🇸 United States – Remote
💵 $140k - $180k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
4 days ago
501 - 1000
Site Reliability Engineer building cloud platform at iManage, enhancing customer success.
🇺🇸 United States – Remote
💰 Venture Round on 1998-01
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor