Reach Digital Health


technology • international development • public health • Africa • digital services

51 - 200 employees

Site Reliability Engineer

Yesterday

🇿🇦 South Africa – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Ansible

Apache

AWS

Azure

Bash

Chef

Cloud

Docker

Google Cloud Platform

Grafana

Java

Kubernetes

MongoDB

MySQL

NGINX

Open Source

Perl

Postgres

Prometheus

Puppet

Python

Redis

Ruby

Splunk

Terraform

Unix

Description

• Apply software engineering principles and practices to infrastructure and operations problems.
• Collaborate with engineering teams and stakeholders to deliver high-quality products.
• Maintain and improve the reliability and performance of our systems.
• Design and develop automation tools and scripts for infrastructure and operations.
• Work with data security and legal teams to ensure compliance with data privacy regulations.
• Provide support with issue investigation and recovery procedures.

Requirements

• Proficient in one or more programming languages, such as Python, Go, Java, or C++.
• Proficient in one or more scripting languages, such as Bash, Perl, or Ruby.
• Proficient in one or more cloud platforms, such as AWS, Azure, or GCP.
• Proficient in one or more UNIX-like operating systems.
• Proficient in one or more configuration management and deployment tools, such as Ansible, Chef, Puppet, or Terraform.
• Proficient in one or more monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
• Proficient in one or more container and orchestration tools, such as Docker or Kubernetes.
• Proficient in one or more web servers and proxies, such as Apache, Nginx, or Envoy.
• Proficient in one or more databases and data stores, such as MySQL, PostgreSQL, MongoDB, or Redis.
• Proficient in one or more version control and collaboration tools, such as Git.
• Knowledgeable in the concepts and principles of site reliability engineering, such as SLIs, SLOs, error budgets, incident management, postmortems, and blameless culture.
• Knowledgeable in the concepts and principles of software engineering, such as design patterns, code quality, testing, debugging, and documentation.
• Knowledgeable in the concepts and principles of performance engineering, such as profiling, benchmarking, load testing, and capacity planning.
• Knowledgeable in the concepts and principles of distributed computing, such as concurrency, parallelism, synchronisation, and consensus.
• Excellent communication and collaboration skills, and the ability to work effectively in a cross-functional, remote team environment.
• Excellent problem-solving and analytical skills, and the ability to troubleshoot and resolve complex issues in a timely and efficient manner.
• Excellent learning and innovation skills, and the ability to research and evaluate new technologies and methodologies.
