Site Reliability Engineer, LATAM

January 30

Description

Sporty's sites are some of the most popular on the internet, consistently staying in Alexa's list of top websites for the countries they operate in. In addition to our DevOps Team, we are building a Site Reliability Team whose purpose is to focus on site reliability and security. The role also involves deployment, configuration, and monitoring, as well as the availability, latency, change management, emergency response, and capacity management of services in production.

• Work with a team of DevOps/SRE and DBA professionals
• Improve the infrastructure and processes currently deployed, and streamline the processes used for deployments to new countries in the future
• Holistically improve all aspects of our current infrastructure, including reducing costs, streamlining environment provisioning, lowering response times, and incorporating the latest techniques and technologies
• Monitor and maintain the existing cloud infrastructure via auto scaling, automated alerts, and OpsWorks and Grafana dashboards
• Take ownership of and responsibility for our cloud operation activities
• Liaise with external security agencies for annual audits and perform our own internal security sweeps
• Aid in reconfiguring the existing architecture to allow for rapid deployments to new countries
• Mentor less experienced team members

Requirements

• 4+ years of SRE/DevOps experience
• Based in Latin America
• Experience independently leading the planning and deployment of a project
• Experience with cloud platforms, especially AWS, including solid knowledge of how to use cloud resources to meet demand from other teams and from production
• Familiarity with at least one programming or scripting language (Python, Java...)
• Experience managing multiple Kubernetes clusters in production (virtualization, orchestration, scalability, security, and high availability), with tools such as Helm, Rancher, and ArgoCD
• Solid knowledge of networking protocols and cybersecurity, especially the TCP/IP stack and HTTP
• A strong understanding of caching, including CDNs and HTTP caching (Cloudflare, AWS CloudFront)
• Experience with cloud-native monitoring solutions in large distributed systems using the observability model (traces, metrics, logs), with tools such as Prometheus, Jaeger, Loki, ELK, and Grafana
• Excellent troubleshooting skills, including Linux OS issue diagnosis and OS parameter optimization

Beneficial Experience:

• Working with other cloud platforms (GCP, Azure, AliCloud)
• Familiarity with at least one Infrastructure as Code tool (Terraform, CloudFormation)
• Designing and implementing CI/CD workflows (Jenkins, GitHub Actions)
• Experience with system automation tools (Ansible, Salt, Chef)
• Understanding of modern microservices and service mesh concepts (containers, Istio)

Benefits

• Quarterly and flash bonuses
• Core hours of 10am-3pm in a local timezone
• Flexible working hours
• Education allowance
• Referral bonuses
• 28 days paid annual leave
• 2 x annual company retreats
• Highly talented, dependable co-workers
• Payment via world-class online wallet system DEEL
• Top-of-the-line equipment supplied by market leader Hofy
• 100% score on The Joel Test
• Small team size for impact
• Global, multicultural organization
• Stability and security
