Tyk

API Management • API Gateways • Authentication Provider • API Consultancy • Open Source

51 - 200

Senior Site Reliability Engineer

July 24

🇨🇦 Canada – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

DNS

Grafana

Kubernetes

MongoDB

Prometheus

Python

Redis

TCP/IP

Terraform

Description

• Collaborate with the Principal SRE to shape and implement the SRE strategic plan.
• Lead the SRE team in translating strategy into actionable plans, coordinating these through the Scrum process.
• Address wellbeing and performance concerns, fostering a positive and productive team environment.
• Work with the Principal SRE and Scrum Master to analyse wellbeing survey outcomes and develop improvement plans.
• Champion operational communication, ensuring high-quality and timely updates on team progress.
• Ensure SLA compliance for our cloud environment through proactive monitoring.
• Develop and oversee the roadmap for proactive alerting and monitoring.
• Define and track key performance metrics for cloud services, driving continuous improvement.
• Design and implement solutions to maintain and enhance KPIs.
• Lead performance tuning and fault finding by analysing metrics from operating systems and applications.
• Optimise system and infrastructure performance, focusing on innovation and anticipating customer needs.
• Engage with commercial teams to understand growth plans and develop corresponding SRE strategies.
• Direct the analysis of cloud infrastructure, focusing on automation, scalability, and management.
• Align with the Principal SRE on automation strategies for cloud-operations tasks.
• Model excellence in software design and automation to enhance Tyk Cloud services, creating runbooks and sharing knowledge.
• Conduct blame-free root cause analysis postmortems, reporting findings and recommendations.
• Document operational processes and policies, ensuring replicability and adherence.
• Provide on-call support, ensuring effective response and resolution in line with SLAs.
• Plan and execute software upgrades to optimise cloud services.
• Assist commercial teams with data requests and account management.
• Champion and adhere to Scrum methodologies within the SRE team.

Requirements

• Proven experience in a senior SRE role or similar.
• Strong knowledge of cloud technologies and of SLA, SLO, and SLI management.
• Experience leading teams and implementing Scrum processes.
• Excellent communication and leadership skills.
• Experience line managing, mentoring, and coaching.
• Ability to analyse and improve operational processes and performance metrics.
• Experience in software design, automation, and root cause analysis.
• On-call support experience and a customer-focused mindset.
• Collaborative attitude with commercial and technical teams.
• Launching and operating production Kubernetes clusters.
• Designing and operating infrastructure on AWS and other providers.
• Operating MongoDB (or other document database) clusters.
• Operating Redis (or other key-value storage) clusters.
• Administering Linux servers.
• Maintaining distributed software.
• Operating Prometheus and Grafana.
• Operating log collection and analysis systems.
• Working hours within 16:00 to 04:00 UTC.

Benefits

• Everyone has unlimited paid holidays.
• Total flexibility in hours.
• Employee share scheme.
• Generous maternity and paternity leave.
• Volunteering days.
• Company retreats.
• Employee wellbeing platform.
