Site Reliability Engineer

September 17

ESL FACEIT Group [EFG]

esports • games • community • festivals • TV production

501 - 1000 employees

Founded 2018

Description

• Designing, analyzing, and troubleshooting large-scale distributed systems.
• Ensuring services and systems are reliable and improve rapidly.
• Maintaining and improving monitoring and observability tools (Grafana/Prometheus/Thanos/Jaeger).
• Working closely with cross-functional teams to help design, maintain, and operate systems at scale.
• Developing SRE best practices and driving their adoption across the company.
• Leading the incident management process and driving its adoption.
• Using troubleshooting skills to identify and fix operational issues.
• Working with Cloud Native technologies such as Kubernetes, Envoy, Istio, Prometheus, and Helm.
• Working with the “Hashi Stack” (Terraform, Packer, Vault).
• Experimenting with and introducing cutting-edge technologies.

Requirements

• Proven experience as a Site Reliability Engineer, DevXP Engineer, or Software Engineer, focused on building and maintaining scalable infrastructure.
• Excellent working knowledge of at least one of the major cloud providers (GCP/AWS/Azure).
• You have experience with cluster management systems (Kubernetes).
• Knowledge of incident management: the ability to investigate, troubleshoot, recover from, and prevent the recurrence of incidents that interfere with the normal delivery of IT services.
• Proficiency in Go and some proficiency in at least one other language: Java, Python, Rust…
• You have knowledge of GitOps practices.
• You have production-scale experience with one of the following: MongoDB, Redis, MySQL.
• Experience contributing to open source technologies is a bonus.
