Site Reliability Engineer - SRE

August 9

StarCompliance

We are Reputation Guardians, on a mission to make compliance simple and easy.

Insider Trading • Political Activity • Gifts & Entertainment • Outside Activity • Reporting

201 - 500

💰 Venture Round on 2020-12

Description

• Maintain and improve the platform's reliability, availability, and performance, leveraging Azure as the core cloud platform.
• Work closely with cross-functional teams to design, implement, and maintain resilient systems.
• Automate wherever possible to streamline operations and minimize downtime.
• Proactively identify and resolve potential issues before they impact customers.
• Contribute to the continuous improvement of our infrastructure and processes.
• Analyze reliability challenges and develop automated solutions for incident resolution.
• Work with development teams to improve applications' operational features for faster MTTD, MTTR, and auto-recovery.
• Lead the establishment of SLIs, SLOs, error budgets, and policies.
• Identify, track, and address toil.
• Conduct post-mortems.
• Identify and implement continuous improvement in various facets of production operations.
• Offer advanced technical support for cross-product issues and incidents.
• Leverage SRE tooling to develop, implement, and deliver on the SRE mission.
• Conduct chaos testing.
• Identify, define, and implement new tools and technologies to improve the quality and efficiency of distributed platforms.
• Drive reliability and supportability aspects of the cloud service, including change management, triage of customer escalations, remediation plans, playbooks, and automation.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.

Requirements

• 4+ years of experience in reliability engineering.
• 2+ years of recent experience with Azure systems.
• Advanced knowledge of the New Relic ecosystem.
• Working knowledge of monitoring and APM tools such as Azure App Insights, Grafana, and Selenium.
• Knowledge of networking and of troubleshooting latency, connectivity, and performance issues.
• Experience with infrastructure as code (IaC) using Terraform and configuration as code (CaC) using Ansible.
• Familiarity with one or more databases: SQL Server, MongoDB, or PostgreSQL.
• Hands-on experience with SRE practices and with writing and running chaos engineering experiments.
• Preferred: experience with C#, .NET, and PowerShell, Python, or Golang.
• Experience with containerization.
• Experience with high availability and distributed systems.
• Proficiency in Linux and Windows administration, troubleshooting, and support.
• Experience with Azure DevOps.
• Excellent debugging skills across a variety of integrated platforms.
