Lead DevOps - SRE

November 8

🇮🇪 Ireland – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Keelvar

procurement • eAuctions • eSourcing • RFX • optimization

51 - 200

💰 Series B on 2022-05

Description

• Provide strategic leadership in defining, planning, implementing, iterating on, and maintaining Keelvar’s cloud infrastructure in AWS, ensuring alignment with the company’s goals and scaling requirements.
• Mentor and guide the SRE team in fostering a continuous deployment ecosystem that enables the product engineering teams to release changes to customers quickly, reliably, and sustainably through automated deployment pipelines.
• Collaborate closely with engineering leadership and cross-functional teams to drive initiatives that enhance the availability, performance, and resilience of critical services.
• Lead efforts in infrastructure and application security by partnering with product engineering teams and the security and compliance team to incorporate DevSecOps principles, implement secure infrastructure, enforce defence-in-depth principles, and advocate for security best practices.
• Oversee and enhance production system monitoring to ensure optimal availability, latency, and overall system health, and provide strategic recommendations for improvements.
• Oversee and prioritise tickets and incoming requests to the SRE team, ensuring timely response to incidents, operational tasks, and support requests from product and engineering teams.
• Develop processes to triage, track, and resolve issues efficiently, and maintain clear communication with stakeholders regarding ticket status and resolution timelines.
• Develop, test, and evolve disaster recovery plans to ensure business continuity, leading periodic drills to prepare for system failures or disasters.
• Drive the design and implementation of monitoring and alerting strategies, ensuring application SLAs and SLOs are properly defined, met, and exceeded.
• Identify and lead initiatives for technical, operational, and process improvements, enabling continuous optimisation of SRE practices and team efficiency.
• Establish and enforce DevOps best practices across the organisation, promoting a culture of continuous integration, continuous delivery, and high system resilience.
• Provide expert guidance in load testing, performance profiling, and capacity planning to support product scalability.
• Maintain comprehensive documentation for infrastructure, processes, and procedures to facilitate knowledge sharing and onboarding.
• Stay at the forefront of industry advancements by actively researching and implementing emerging SRE and cloud computing best practices to improve our processes and infrastructure.

Requirements

• 7+ years of experience in SRE or DevOps, with at least 2 years in a leadership or senior engineering capacity.
• Proven track record in managing and scaling cloud infrastructure, preferably within AWS, with hands-on expertise in automation, infrastructure as code, and cloud-native architectures.
• Strong technical background in CI/CD pipeline development, with experience building and optimising automated deployment pipelines in agile, product-focused environments.
• Proficient in DevSecOps principles, with a deep understanding of security best practices in infrastructure and application deployment.
• Experienced in managing monitoring, logging, and alerting systems for high-availability applications, with strong knowledge of defining and meeting SLAs and SLOs.
• Advanced skills in infrastructure-as-code tools such as Pulumi, and familiarity with container orchestration platforms such as AWS ECS or Kubernetes.
• Programming skills in one or more languages, such as Python or shell scripting, to support automation and tooling.
• Solid understanding of networking, database management, and system performance tuning, with the ability to diagnose and resolve complex issues.
• Proven ability to mentor and guide SRE/DevOps teams, fostering a collaborative, continuous-learning environment.

Benefits

• Competitive salary at a Series B-backed, fast-growing organisation.
• 25 days of holiday, increasing to 26 after 3 years and to 27 after 5 years.
• Your birthday off, on us.
• Flexible working hours with a positive approach to work-life balance.
• An inclusive, collaborative, innovative culture.
• Generous leave offerings, including wellbeing days.
• Technology that enables you to perform at your best.
