fraud prevention • device fingerprinting • predictive analysis • cyber intelligence services • email analysis
201 - 500
July 26
• Ensure the reliability, availability, and performance of our systems by implementing SRE best practices
• Develop and maintain comprehensive monitoring and alerting systems using tools such as Prometheus, Grafana, and the ELK stack
• Manage incident response and root cause analysis for production issues
• Conduct post-incident reviews to learn from failures and drive continuous improvement in the system’s reliability
• Continuously monitor and optimize the performance of cloud infrastructure to ensure efficient resource utilization and cost-effectiveness
• Automate routine tasks and processes to reduce manual intervention and increase efficiency
• Analyze current system capacity and plan for future growth to ensure the infrastructure can scale with increasing demands
• Define, measure, and monitor SLOs and SLIs to ensure that services meet their reliability targets
• Work closely with engineering and product teams to provide feedback and suggestions on new architectures, ensuring they meet reliability and performance standards
• Develop and maintain comprehensive documentation for architecture, infrastructure, and troubleshooting processes
• Provide on-call support to ensure the continuous availability of our applications and infrastructure
• Ensure that systems meet security and compliance requirements, performing regular audits and assessments based on the internal security team’s guidelines
• Stay current with new technologies and industry trends, evaluating their potential impact on our infrastructure and reliability practices
• 8+ years of experience as a DevOps Engineer or in a similar software engineering role, with a focus on SRE principles and practices
• Ability to quickly troubleshoot complex issues related to system resources or different applications
• A proactive approach to identifying and resolving issues independently, with a strong problem-solving attitude
• Proficiency with Kubernetes (AWS EKS preferred)
• Expertise with Infrastructure as Code (Terraform)
• Extensive experience with high-performance, scalable, multi-region AWS infrastructure
• Strong experience with monitoring and logging tools such as Prometheus, Grafana, Elasticsearch, and Kibana
• Proficiency with incident management tools such as PagerDuty, Opsgenie, or similar platforms to manage on-call schedules and incident response processes effectively
• Familiarity with CI/CD pipelines and tools (e.g., GitHub Actions, TeamCity)
• Excellent communication and collaboration skills to work effectively with cross-functional teams
• Employee stock ownership plan (ESOP)
• Flexible hours
• Generous holiday allowance
• Access to significant opportunities for learning and development
• Private health insurance, including dependants (incl. employee assistance and mental health support)
• Complimentary weekly language courses
• Enhanced parental leave
August 4, 2023
5001 - 10000
Ansible
Big Data
CI/CD
Cloud
DevOps
Docker
GitLab
HTML
Java
JavaScript
Jenkins
Jira
JSON
Kubernetes
OpenStack
Perl
Python
SQL
Terraform
TypeScript
VMware
Vue.js