Digital Experience Monitoring • Observability • User Experience Observability • Network Observability • Application Observability
201 - 500
November 1
AWS
Azure
Backbone
Bash
Cloud
DNS
ElasticSearch
Google Cloud Platform
Grafana
Jenkins
Oracle
Prometheus
Python
Splunk
Terraform
Go
• Who monitors the monitoring system? A Site Reliability Engineer at Catchpoint is responsible for supporting the systems that run Catchpoint’s global monitoring platform. In this role, you will work directly with operations and development teams to build and automate infrastructure (IaC) deployment at scale, then monitor it to ensure Catchpoint has a scalable and highly reliable system for our customers.
• What will success look like in this position? The role requires an operational mindset and a love of solving problems on a global scale with solutions that ensure high reliability and availability. You’ll explore and make sense of systems telemetry, logs, and passive monitoring, and use our own synthetic monitors to create automation that controls, rolls out, and maintains our platform.
• Responsibilities include defining and refining the whole service lifecycle; measuring and monitoring availability, latency, and overall system health; designing logging and telemetry systems; automating manual operational work; troubleshooting priority incidents; identifying application patterns for better service objectives; and supporting production systems on an on-call rotation.
• Strong experience with or knowledge of administering application servers, web servers, and databases.
• Familiarity with infrastructure automation, configuration management, and CI/CD tools (preferably Terraform).
• Experience with multiple cloud platforms (AWS, GCP, Azure).
• Good networking knowledge and experience with Internet architecture (BGP, peering, DNS).
• 2+ years of incident resolution experience in a large-scale operations environment.
• Hands-on experience with cloud deployment, monitoring, and ops analysis tools such as Prometheus, Elasticsearch, Grafana, Kibana, Splunk, Terraform, Jenkins, etc.
• 3+ years of programming experience with Python, Bash, PowerShell, C, etc.
• Virtualization experience required.
• BS degree in Computer Science or a related technical field involving coding, or equivalent practical experience.
• Appreciation of the value of diversity of opinions.
October 26
51 - 200
Maintaining Linux distribution and Chainguard container images at Chainguard.
🇮🇳 India – Remote
💵 $100k - $110k / year
⏰ Full Time
🟢 Junior
🟡 Mid-level
⛑ DevOps & Site Reliability Engineer (SRE)
October 25
11 - 50
Manage IT infrastructure at Token Metrics using AWS and multi-cloud expertise.
September 15
1001 - 5000
Engineering support for network systems at NextGen Healthcare.
🇮🇳 India – Remote
💰 Venture Round on 2015-02
⏰ Full Time
🟢 Junior
⛑ DevOps & Site Reliability Engineer (SRE)