Site Reliability Engineer

February 28

Ververica | Original creators of Apache Flink®

Ververica is the original creator of Apache Flink® and provides a Unified Streaming Data Platform powered by its VERA Engine. The platform allows organizations to connect, process, analyze, and govern their data to make better and faster business decisions. Ververica offers flexible deployment options, including self-managed services, fully managed cloud services, and a unique Bring Your Own Cloud (BYOC) model, empowering businesses to leverage real-time data for various use cases such as fraud detection, dynamic pricing, and AI-driven insights. Ververica is dedicated to maximizing performance and efficiency in data processing, making significant impacts for clients across multiple industries.

stream processing • software • cloud • data stream processing • enterprise

51 - 200 employees

Founded 2014

🤖 Artificial Intelligence

☁️ SaaS

💰 Series A on 2016-03

📋 Description

About Ververica

Ververica, founded by the original creators of Apache Flink™, empowers businesses to unlock the full potential of real-time data processing and analytics. Our platform provides cutting-edge stream processing and event-driven applications, enabling companies worldwide to build scalable and reliable data-driven solutions.

Role Overview

As a Site Reliability Engineer (SRE) at Ververica, you will design, provision, and maintain the infrastructure for Ververica’s Unified Streaming Data Platform across multiple cloud providers, including AWS, GCP, and Azure. You will collaborate with software engineering teams to develop solutions that enhance feature delivery, optimize performance, and address security vulnerabilities. Your role will involve architectural improvements, implementation ownership, and driving reliability best practices.

Key Responsibilities

• Build and maintain the infrastructure for Ververica’s Unified Streaming Data Platform across AWS, GCP, and Azure.
• Design and manage Infrastructure as Code (IaC) using Terraform, ensuring modularity, reusability, and best practices.
• Implement and enhance observability tooling, including Grafana, Prometheus, logging systems, traces, metrics, dashboards, and alerts.
• Ensure system reliability through SRE best practices, including defining SLIs, SLOs, and error budgets.
• Improve infrastructure architecture and engineering efficiency through continuous evaluation and optimization.
• Enhance CI/CD pipelines to automate development workflows.
• Monitor, identify, and resolve security vulnerabilities (CVE updates and security enhancements).
• Contribute to the successful development and launch of new products, features, and services.
• Periodically participate in on-call rotations to manage incidents in a 24/7 live infrastructure.
• Maintain and update documentation, including architectural designs and changes.

🎯 Requirements

• Bachelor’s degree in Computer Science, Information Technology, or a related field.
• Minimum 2 years of hands-on experience with Kubernetes clusters, Helm charts, controllers, and operators.
• Proficiency in designing and maintaining Terraform code with best practices.
• Strong knowledge of observability tools and practices, including metrics, logging, and alerting systems.
• Experience implementing SRE principles such as SLIs, SLOs, and error budgets.
• Solid understanding of Linux systems and networking in cloud environments.
• Hands-on experience managing multiple Kubernetes clusters.
• Familiarity with distributed systems or streaming data platforms.
• Knowledge of cloud-native security best practices.
