Staff Site Reliability Engineer

4 hours ago

🇪🇺 Anywhere in Europe – Remote

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)


neptune.ai

Machine learning • Data science • MLOps • Experiment tracking • Model registry

11 - 50

Description

• We are seeking an experienced Staff Site Reliability Engineer to join our fully remote team.
• As a key player in our Engineering team, you will contribute to infrastructure design and optimization.
• Have an impact on the scalability, resilience, and performance of Neptune solutions.
• Ownership of Site Reliability Process: Own the site reliability process and systems through all stages.
• Ensure the scalability, resilience, and performance of Neptune solutions across global SaaS and client-hosted environments.
• Design and implement automation workflows to streamline deployments, upgrades, and incident response.
• Ensure infrastructure and processes meet security and industry standards, protecting sensitive data.
• Partner with development, product, customer success, and client teams to deliver robust solutions.
• Document architecture, operational procedures, and troubleshooting guides to enable knowledge sharing.
• Participate in on-call rotations to maintain system uptime and performance.

Requirements

• 6+ years in SRE, DevOps, or related roles.
• Strong experience managing and optimizing Kubernetes clusters for robust, scalable, and efficient infrastructure.
• Proven expertise in designing and implementing automation solutions for infrastructure and application deployment, with experience in Terraform, Helm, and GitLab CI/CD.
• Strong programming skills in Shell and Python.
• Extensive experience with Linux system administration and network management.
• Expertise in managing distributed computing systems and near real-time data streaming platforms.
• Fluency in English, with solid communication skills for interacting with global customers.
• Nice to have: Experience in security best practices, compliance standards (e.g., SOC 2), and infrastructure hardening.
• Nice to have: Experience with multi-cloud architecture and cloud-native technologies.
• Nice to have: Experience in high-traffic, petabyte-scale data environments.
• Nice to have: Experience with ClickHouse and Kafka deployments.

Benefits

• Flexibility: 100% remote work with flexible working hours; co-working offices are available in Warsaw, Wrocław, Poznań, and Kraków.
• Share in our success: Participate in the Employee Stock Option Plan and be part of our growth journey.
• Time off: 20 paid service-free days per year.
• Ownership and impact: Space to take action, bring your ideas to life, and make a real impact.
