December 4
Ansible
Cyber Security
Distributed Systems
Elasticsearch
Grafana
Kafka
Kubernetes
Prometheus
Puppet
Python
SaltStack
Go
• Improve system reliability through our "you build it, you run it" philosophy, for example by adding metrics, building dashboards, and writing documentation.
• Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks.
• Develop strong partnerships with other teams to understand and proactively address future technology needs and current pain points.
• Increase productivity by building internal tools and automation workflows that reduce maintenance overhead and speed up processes.
• Respond to, diagnose, and follow up on system outages and alerts across the infrastructure, including taking part in the on-call rotation.

• Previous experience as a Linux system administrator or DevOps engineer.
• Strong analytical and troubleshooting skills, fluency in coding, and experience with high-traffic, high-availability system architecture.
• You actively take ownership of systems and processes, finding and implementing improvements throughout their lifecycle.
• Full professional proficiency or higher in English.
• Proficiency in building and managing Kubernetes clusters.
• Expertise in automation tools like Ansible, Puppet, or SaltStack.
• Familiarity with Infrastructure-as-Code (IaC) principles.
• Strong background in Linux and/or BSD operating systems.
• Proficiency in at least one programming language, such as Python or Go.
• Knowledge of monitoring technologies, including Prometheus and Grafana.
• Experience administering distributed systems like Kafka and Elasticsearch.
• Extensive knowledge of information security, covering both technical measures and concepts.
• Proficiency in building scalable, reliable, and high-performance software applications and systems.
• Solid grasp of the OSI model, networking concepts, and common protocols like ARP, IP, TCP, UDP, and HTTP.
• Understanding of routing protocols like BGP and IS-IS.