Site Reliability Engineer - Technical Lead

September 12

Nethermind

Builders & researchers with expertise in Ethereum, Protocol Engineering, L2, DeFi & Smart Contracts Security & Auditing

DeFi • Ethereum • Layer 2 scaling • Blockchain • Protocol engineering

51 - 200

💰 Angel Round on 2020-04

Description

• Lead the implementation and refinement of SRE practices across the organization, including SLOs, error budgets, and blameless postmortems
• Design and implement automation to eliminate toil and improve system reliability and efficiency
• Lead initiatives and architect scalable hybrid cloud solutions for Web3 infrastructure
• Manage error budgets and make data-driven decisions about when to prioritize reliability vs. new features
• Drive SRE practices to ensure high availability, performance, and reliability under varying load conditions
• Collaborate closely with the Platform Engineering team to build reliability into services from the ground up
• Collaborate closely with Nethermind’s Infrastructure Leadership department to align SRE strategies with the overall technical vision
• Drive the adoption of observability best practices and implement comprehensive monitoring systems
• Develop and maintain service level indicators (SLIs) and service level objectives (SLOs), working with product owners to define appropriate reliability targets
• Mentor team members in SRE practices and foster a culture of continuous learning
• Lead capacity planning efforts, using quantitative analysis to predict and address future scaling challenges
• Contribute to long-term technical roadmaps, balancing reliability concerns with product innovation

Requirements

• 5+ years of experience in Site Reliability Engineering or DevOps
• Expert knowledge of cloud platforms (AWS, GCP)
• Expert knowledge of Kubernetes
• Proven experience in designing and implementing scalable, efficient, resilient systems
• Deep understanding of Linux/Unix systems and networking protocols
• Strong programming skills in Python or Go
• Strong background in monitoring, observability, and logging systems (e.g., Grafana, Prometheus, Loki)
• Expertise in CI/CD tools (e.g., GitHub Actions, ArgoCD)
• Excellent written and verbal communication skills, with the ability to explain complex technical concepts to various audiences
• Experience in producing technical documentation, runbooks, presentations, and post-mortem reports
• Experience with and passion for mentoring and upskilling team members
• Experience leading technical teams
• Contributions to open-source projects or thought leadership in SRE
• Familiarity with MLOps and big data technologies
• Knowledge of blockchain technology and infrastructure
• Experience with chaos engineering principles and tools
• Familiarity with traffic management and CDN technologies
• Systems or backend engineering background
