Site Reliability Engineer

Yesterday


Reach Digital Health

technology • international development • public health • Africa • digital services

51 - 200

Description

• Apply software engineering principles and practices to infrastructure and operations problems.
• Collaborate with engineering teams and stakeholders to deliver high-quality products.
• Maintain and improve the reliability and performance of our systems.
• Design and develop automation tools and scripts for infrastructure and operations.
• Work with data security and legal teams to ensure compliance with data privacy regulations.
• Provide support with issue investigation and recovery procedures.

Requirements

• Proficient in one or more programming languages, such as Python, Go, Java, or C++.
• Proficient in one or more scripting languages, such as Bash, Perl, or Ruby.
• Proficient in one or more cloud platforms, such as AWS, Azure, or GCP.
• Proficient in one or more UNIX-like operating systems.
• Proficient in one or more configuration management and deployment tools, such as Ansible, Chef, Puppet, or Terraform.
• Proficient in one or more monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
• Proficient in one or more container and orchestration tools, such as Docker or Kubernetes.
• Proficient in one or more web servers and proxies, such as Apache, Nginx, or Envoy.
• Proficient in one or more databases and data stores, such as MySQL, PostgreSQL, MongoDB, or Redis.
• Proficient in one or more version control and collaboration tools, such as Git.
• Knowledgeable in the concepts and principles of site reliability engineering, such as SLIs, SLOs, error budgets, incident management, postmortems, and blameless culture.
• Knowledgeable in the concepts and principles of software engineering, such as design patterns, code quality, testing, debugging, and documentation.
• Knowledgeable in the concepts and principles of performance engineering, such as profiling, benchmarking, load testing, and capacity planning.
• Knowledgeable in the concepts and principles of distributed computing, such as concurrency, parallelism, synchronisation, and consensus.
• Excellent communication and collaboration skills, and the ability to work effectively in a cross-functional, remote team environment.
• Excellent problem-solving and analytical skills, and the ability to troubleshoot and resolve complex issues in a timely and efficient manner.
• Excellent learning and innovation skills, and the ability to research and evaluate new technologies and methodologies.


