Principal Site Reliability Engineer

February 8

Groupon

Groupon is a platform that connects businesses with customers by offering deals and discounts on various services and products. It aims to revolutionize the way businesses market to and engage with their audience, emphasizing innovation, technology, and a performance-driven culture. Groupon promotes a work environment that values agility, merit, and direct communication, fostering a culture where employees can make impactful decisions and contribute significantly to the company's transformation.

technology • local commerce • social media • marketing • community

1001 - 5000 employees

Founded 2008

🛍️ eCommerce

🏪 Marketplace

👥 B2C

📋 Description

Are you ready to take your expertise to the next level and make a meaningful impact on the reliability and scalability of mission-critical systems? As a Principal Site Reliability Engineer (SRE Level V/VI), you will play a central role in ensuring the performance, availability, and resilience of our platforms. In this position, you will go beyond maintaining systems by leading initiatives that redefine operational excellence. You will collaborate with diverse teams to implement cutting-edge technologies and best practices, foster a culture of reliability, and mentor other engineers in their growth. This is an exceptional opportunity for someone passionate about solving complex challenges and shaping the future of platform reliability in a high-impact role.

• Architect and maintain fault-tolerant systems that meet uptime SLAs of 99.9% or higher.
• Drive automation of infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
• Create and optimize CI/CD pipelines for reliable, secure, and efficient software delivery.
• Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting, using Prometheus, Grafana, and the ELK stack.
• Collaborate with stakeholders to define and meet SLIs, SLOs, and error budgets aligned with business needs.
• Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis of critical issues.
• Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
• Proactively identify and resolve bottlenecks to improve system performance and developer efficiency.
• Mentor junior engineers and foster a collaborative, growth-oriented team environment.
• Guide architectural decisions that drive innovation and enhance system reliability.

🎯 Requirements

• 10+ years in systems engineering, including at least 5 years in SRE or DevOps roles.
• Expertise in cloud platforms (GCP, AWS) and container technologies (Docker, Kubernetes).
• Proficiency in programming and scripting languages such as Python, Go, and Bash.
• Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
• Deep understanding of networking, DNS, load balancing, and security principles.
• Proven track record of managing high-availability systems in demanding environments.
• Exceptional analytical and problem-solving skills.
• Certifications in cloud or container technologies (e.g., AWS, GCP, or Azure certifications; Kubernetes CKA).
• Experience in industries such as eCommerce, FinTech, or SaaS.
• Familiarity with Agile development processes and frameworks.

🏖️ Benefits

• The opportunity to work with cutting-edge technologies in a transformative environment.
• A collaborative and innovative work culture that values your expertise and contributions.
• Professional growth and leadership development pathways tailored to your aspirations.
• A chance to leave a lasting impact by shaping the future of reliable and scalable systems.
