October 12
Airflow
Ansible
AWS
Cloud
Docker
DynamoDB
Google Cloud Platform
Grafana
Hadoop
Kubernetes
Node.js
Prometheus
Python
Redis
Spark
Splunk
TCP/IP
Terraform
Unix
Go
• Manage production environments at vast scale, along with their associated infrastructure and tools
• Support the operability roadmap to improve the availability, performance, scalability and efficiency of the services by implementing monitoring, automation, redundancy, capacity planning and business continuity planning
• Work on challenging system and network problems to improve system performance and reliability
• Implement and maintain monitoring that provides complete transparency into application and system state, history, trends and insights
• Participate in the on-call rotation: drive incident resolution, live troubleshooting and impact mitigation
• At least 1 year of experience in DevOps / PE / SRE and Software Development roles
• At least 1 year of experience with containerization and orchestration technologies (e.g. Docker, Kubernetes)
• Experience working with IaC (e.g. Terraform, Ansible)
• Experience using Git to manage code
• Experience building CI/CD pipelines
• Good knowledge of TCP/IP and networking
• Experience designing and optimizing GCP Dataproc, Composer (Airflow, EMR on AWS) and BigQuery slot management for big-data pipeline orchestration
• Experience designing and managing large-scale infrastructure on either AWS (EKS, OpenSearch) or GCP (GKE, Observability/Monitoring), including multi-zone, multi-region deployments
• Experience working with GitHub Actions
• Experience with system reliability tools like OpenTelemetry, Prometheus, Splunk, ELK, Grafana
• Experience with storage solutions like Redis, DynamoDB
• Some experience with the Hadoop ecosystem (Oozie, Pig, Spark, etc.)