Senior Cloud Architect - SRE

3 days ago

🇺🇸 United States – Remote

💵 $220k - $419.8k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)


NVIDIA

GPU-accelerated computing • artificial intelligence • deep learning • virtual reality • gaming

10,000+

Description

• Lead the technical architecture for DGX Cloud solutions on cloud service providers such as AWS, GCP, Azure and OCI.
• Provide fast, creative solutions to complex problems and write effective, clear and reliable architecture specifications.
• Design, implement and support the operational and reliability aspects of large-scale GPU training clusters.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
• Maintain services after they go live by measuring and monitoring availability, latency and overall system health.
• Practice sustainable incident response and blameless postmortems.

Requirements

• B.Sc., M.Sc. or Ph.D. in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
• 8+ years of proven experience.
• Experience with infrastructure automation and distributed systems design, and with designing and developing tools for running large-scale private or public cloud systems in production.
• Experience in one or more of the following: Python, Go.
• In-depth knowledge of Linux, networking and cloud-native technologies.

Benefits

• Eligible for equity and benefits


Similar Jobs

4 days ago

ICF

5001 - 10000

DevOps Engineer for AWS cloud solutions at ICF.

🇺🇸 United States – Remote

💵 $84.5k - $143.7k / year

💰 Grant on 2023-02

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

4 days ago

Arcadia

201 - 500

Lead a DevOps team to enhance healthcare data solutions at Arcadia.

🇺🇸 United States – Remote

💰 $29.5M Venture Round on 2020-01

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

4 days ago

Greenlight

201 - 500

Maintain platform reliability for the family fintech company Greenlight.

🇺🇸 United States – Remote

💵 $142k - $190k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

4 days ago

Remo

51 - 200

Ensure optimal performance of the infrastructure behind Remo's virtual dementia care platform.

🇺🇸 United States – Remote

💵 $140k - $180k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

4 days ago

iManage

501 - 1000

Site Reliability Engineer building a cloud platform at iManage to enhance customer success.

🇺🇸 United States – Remote

💰 Venture Round on 1998-01

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

Built by Lior Neu-ner. I'd love to hear your feedback — Get in touch via DM or lior@remoterocketship.com