5 hours ago
Ansible
AWS
Azure
Cloud
Distributed Systems
DNS
Docker
HAProxy
Kubernetes
Laravel
MariaDB
MySQL
NGINX
PHP
Redis
Symfony
TCP/IP
Terraform
• Ensure the reliability of infrastructure supporting mission-critical services, minimizing downtime and optimizing performance.
• Proactively monitor, respond to, diagnose, and resolve incidents, improving response time and minimizing customer impact.
• Work closely with Russian-speaking developers, as well as QA and system analysts.
• Enhance CI/CD pipelines, monitoring tools, and automation processes to streamline workflows and increase system efficiency.
• Keep infrastructure-related documentation up to date.
• Infrastructure is hosted in MS Azure and AWS, but mainly in OVHcloud (US), where it runs on both bare metal and VMs.
OS: CentOS / AlmaLinux OS.
Components: Nginx, KeyDB/Redis, OpenSearch.
Database: MariaDB/MySQL, Percona / Galera Cluster, ProxySQL, MaxScale.
Storage: GlusterFS.
Networking: HAProxy, VyOS, iptables.
Language: PHP 8 (PHP-FPM, Yii2, Symfony, Laravel, OPcache).
Monitoring tools: Datadog, Vector, Sentry.
IaC: Terraform, Ansible.
Alerting: OpsGenie.
• 5+ years of experience in a Site Reliability Engineering role, with a proven track record of maintaining high-availability infrastructure in a high-load environment.
• Expertise in Linux systems and web stacks (Nginx, PHP, MySQL/MariaDB, Redis/KeyDB) to ensure smooth and efficient operation.
• Strong experience with MySQL/MariaDB Galera Cluster and GlusterFS storage to optimize data reliability and scalability.
• Deep knowledge of network architectures, including TCP/IP, DNS, VPNs, and load-balancing techniques, with hands-on experience in troubleshooting and optimizing network performance.
• Proficiency in PHP and Docker for seamless integration and deployment of services.
• Solid understanding of CI/CD and security best practices.
• Understanding of Infrastructure-as-Code, Monitoring-as-Code, and GitOps (we use Ansible and Terraform).
• Experience with Cloudflare and AWS services (EKS, S3, OpenSearch).
• Experience building fault-tolerant systems and supporting compliance audits (SOC, FFIEC, etc.).
• Familiarity with Jira and Agile software development.
• Familiarity with modern container orchestration and deployment tools (Kubernetes, Helm).
• Fluent in Russian (reading, writing, and speaking).
• Ongoing training and development
• Opportunity for career advancement
• Medical
• Dental
• Vision
• Company Paid Group Life / AD&D Insurance
• 7 Paid Holidays and 2 Floating Holiday Days to use at will
• Paid Time Off
• Flexible Spending/HSA
• Employee Assistance Program (EAP)
• 401(k) match
• Referral Program
Yesterday
501 - 1000
Join ScienceLogic as a Project Manager to oversee customer platforms and team coordination.
🇺🇸 United States – Remote
💰 $21.2M Venture Round on 2022-10
⏰ Full Time
🟡 Mid-level
🟠 Senior
🗽 H1B Visa Sponsor
2 days ago
51 - 200
Join Raft to provide DevSecOps support for customers with digital solutions using Kubernetes and Docker.
2 days ago
51 - 200
Manage a team for building and operating reliability tools at Zillow. Drive reliability and scalability across engineering ecosystem.
6 days ago
1001 - 5000
Join Marigold as a Senior Database Reliability Engineer to oversee database operations and enhance performance efficiencies.
6 days ago
Join JLL as a Reliability Engineer, providing engineering support and implementing asset management plans. Leverage engineering methods to enhance operations while ensuring system reliability.