Site Reliability Engineer

November 1

🇮🇳 India – Remote

⏰ Full Time

🟢 Junior

⛑ DevOps & Site Reliability Engineer (SRE)


Catchpoint

Digital Experience Monitoring • Observability • User Experience Observability • Network Observability • Application Observability

201 - 500

Description

• Who monitors the monitoring system? A Site Reliability Engineer at Catchpoint is responsible for supporting the systems that run Catchpoint’s global monitoring platform. In this role, you will work directly with operations and development teams to build and automate infrastructure-as-code (IaC) deployments at scale, then monitor them to ensure Catchpoint has a scalable and highly reliable system for our customers.

• What will success look like in this position? The role requires an operational mindset and a love of solving problems on a global scale with solutions that ensure high reliability and availability. You’ll explore and make sense of systems telemetry, logs, and passive monitoring, and use our own synthetic monitors to create automation that controls, rolls out, and maintains our platform.

• Responsibilities include defining and refining the whole service lifecycle; measuring and monitoring availability, latency, and overall system health; designing logging and telemetry systems; automating manual operational work; troubleshooting priority incidents; identifying application patterns for better service objectives; and supporting production systems on an on-call rotation.

Requirements

• Strong experience with, or knowledge of, administering application servers, web servers, and databases.
• Familiarity with infrastructure automation, configuration management, and CI/CD tools (preferably Terraform).
• Experience with multiple cloud platforms (AWS, GCP, Azure).
• Good networking knowledge and experience with Internet architecture (BGP, peering, DNS).
• 2+ years of incident resolution experience in a large-scale operations environment.
• Hands-on experience with cloud deployment, monitoring, and ops analysis tools such as Prometheus, Elasticsearch, Grafana, Kibana, Splunk, Terraform, Jenkins, etc.
• 3+ years of programming experience with Python, Bash, PowerShell, C, etc.
• Virtualization experience required.
• BS degree in Computer Science or a related technical field involving coding, or equivalent practical experience.
• Appreciation of the value of diversity of opinions.


