Senior Reliability Engineer

5 hours ago

Americor

Debt Resolution • Debt Analysis • Credit Card Debt • Credit Counseling

201 - 500

Description

• Ensure the reliability of infrastructure supporting mission-critical services, minimizing downtime and optimizing performance.
• Proactively monitor, respond to, diagnose, and resolve incidents, improving response time and minimizing customer impact.
• Work closely with Russian-speaking developers, as well as QA and system analysts.
• Enhance CI/CD pipelines, monitoring tools, and automation processes to streamline workflows and increase system efficiency.
• Keep infrastructure-related documentation up to date.
• Infrastructure is hosted in MS Azure and AWS, but mainly in OVHcloud (US). The OVHcloud environment contains bare-metal servers and VMs.
• OS: CentOS / AlmaLinux OS.
• Components: Nginx, KeyDB/Redis, OpenSearch.
• Database: MariaDB/MySQL, Percona / Galera Cluster, ProxySQL, MaxScale.
• Storage: GlusterFS.
• Networking: HAProxy, VyOS, iptables.
• Language: PHP 8 (PHP-FPM, Yii2, Symfony, Laravel, OPcache).
• Monitoring tools: Datadog, Vector, Sentry.
• IaC: Terraform, Ansible.
• Alerting: OpsGenie.

Requirements

• 5+ years of experience in a Site Reliability Engineering role, with a proven track record of maintaining high-availability infrastructure in a high-load environment.
• Expertise in Linux systems and web stacks (Nginx, PHP, MySQL/MariaDB, Redis/KeyDB) to ensure smooth and efficient operation.
• Strong experience with MySQL/MariaDB Galera Cluster and GlusterFS storage to optimize data reliability and scalability.
• Deep knowledge of network architectures, including TCP/IP, DNS, VPNs, and load-balancing techniques, with hands-on experience in troubleshooting and optimizing network performance.
• Proficiency in PHP and Docker for seamless integration and deployment of services.
• Solid understanding of CI/CD and security best practices.
• Understanding of Infrastructure-as-Code, Monitoring-as-Code, and GitOps (we use Ansible and Terraform).
• Experience with Cloudflare and AWS services (EKS, S3, OpenSearch).
• Experience building fault-tolerant systems and supporting compliance audits (SOC, FFIEC, etc.).
• Familiarity with Jira and Agile software development.
• Familiarity with modern container orchestration and deployment tools (Kubernetes, Helm).
• Fluent in Russian (reading, writing, and speaking).

Benefits

• Ongoing training and development
• Opportunity for career advancement
• Medical
• Dental
• Vision
• Company Paid Group Life / AD&D Insurance
• 7 Paid Holidays and 2 Floating Holiday Days to use at will
• Paid Time Off
• Flexible Spending/HSA
• Employee Assistance Program (EAP)
• 401(k) match
• Referral Program
