Senior DevOps Engineer

July 26

🇭🇺 Hungary – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

SEON

fraud prevention • device fingerprinting • predictive analysis • cyber intelligence services • email analysis

201 - 500

Description

• Ensure the reliability, availability, and performance of our systems by implementing SRE best practices
• Develop and maintain comprehensive monitoring and alerting systems using tools such as Prometheus, Grafana, and the ELK stack
• Manage incident response and root cause analysis for production issues
• Conduct post-incident reviews to learn from failures and drive continuous improvement in the system’s reliability
• Continuously monitor and optimize the performance of cloud infrastructure to ensure efficient resource utilization and cost-effectiveness
• Automate routine tasks and processes to reduce manual intervention and increase efficiency
• Analyze current system capacity and plan for future growth to ensure the infrastructure can scale with increasing demands
• Define, measure, and monitor SLOs and SLIs to ensure that services meet their reliability targets
• Work closely with engineering and product teams to provide feedback and suggestions on new architectures, ensuring they meet reliability and performance standards
• Develop and maintain comprehensive documentation for architecture, infrastructure, and troubleshooting processes
• Provide on-call support to ensure the continuous availability of our applications and infrastructure
• Ensure that systems meet security and compliance requirements, performing regular audits and assessments based on the internal security team’s guidelines
• Stay current with new technologies and industry trends, evaluating their potential impact on our infrastructure and reliability practices

Requirements

• 8+ years of experience as a DevOps Engineer or in a similar software engineering role, with a focus on SRE principles and practices
• Ability to quickly troubleshoot complex issues related to system resources or different applications
• A proactive approach to identifying and resolving issues independently, with a strong problem-solving attitude
• Proficiency with Kubernetes, AWS EKS preferred
• Expertise with Infrastructure as Code (Terraform)
• Extensive experience with high-performance, scalable, multi-region AWS infrastructure
• Strong experience with monitoring and logging tools such as Prometheus, Grafana, Elasticsearch, and Kibana
• Proficiency with incident management tools such as PagerDuty, Opsgenie, or similar platforms to manage on-call schedules and incident response processes effectively
• Familiarity with CI/CD pipelines and tools (e.g., GitHub Actions, TeamCity)
• Excellent communication and collaboration skills to work effectively with cross-functional teams

Benefits

• Employee stock ownership plan (ESOP)
• Flexible hours
• Generous holiday allowance
• Access to significant opportunities for learning and development
• Private health insurance including dependants (including employee assistance and mental health support)
• Complimentary weekly language courses
• Enhanced parental leave
