October 31
🇺🇸 United States – Remote
💵 $145k - $200k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
• In this role, you will work closely with both Infrastructure and Platform team members to integrate best practice monitoring into our applications.
• Your focus will be on developing high-quality runbooks for incident management, ensuring that our response procedures are efficient and effective.
• You will be responsible for building high-quality visualizations and meaningful alerting systems that provide clear, actionable insights into system performance and health.
• As an SRE, you will manage and optimize our infrastructure using tools like Terraform, GitHub CI/CD, and Kubernetes.
• You will respond to incidents, troubleshoot production issues across the entire stack, and implement automation to streamline operational processes.
• Your role will involve designing and maintaining core infrastructure to support our users, ensuring our SaaS products run smoothly and efficiently.
• Additionally, you will be proactive in identifying potential issues before they become outages, leveraging your expertise in telemetry data collection, querying, and monitoring using tools such as Grafana, Prometheus/Mimir, OpenSearch, and Sentry.
• You will collaborate with development teams to embed reliability and best practices into the software development lifecycle, ensuring robust and resilient applications.
• Your contributions will be vital in scaling our monitoring infrastructure, enhancing system reliability, and ensuring seamless user experiences.
• By continuously improving our infrastructure and processes, you will help AKASA deliver high-quality, dependable services to our customers.
• Proficient in visualizing, monitoring, and alerting on telemetry data (logs, metrics, and traces) using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, and similar technologies.
• Experience with Docker, Kubernetes, Terraform, or similar technologies.
• 5+ years of professional experience using Python, Go, Java, or a similar language.
• Proficient with Linux and the Unix shell.
• Excellent collaboration and asynchronous communication skills.
• Committed to thorough documentation to streamline learning and processes.
• Proactive and enthusiastic attitude towards identifying and fixing issues.
• Ability to deliver quickly, iterate fast, and adapt to changing requirements.
• Proficient in using Git/GitHub for version control.
• Unlimited paid time off (PTO)
• Expansive coverage for health, dental, and vision
• Employer contribution to Health Savings Accounts (HSA)
• Generous parental leave policy
• Full employee coverage for life insurance
• Company-paid holidays
• 401(k) plan
October 31
51 - 200
Skilled Site Reliability Engineer to scale infrastructure at Replicant.
🇺🇸 United States – Remote
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
October 31
11 - 50
Ensure reliability of high-frequency cryptocurrency trading systems as a Site Reliability Engineer.
🇺🇸 United States – Remote
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🗽 H1B Visa Sponsor
October 31
201 - 500
Senior DevOps Engineer deploying applications on AWS infrastructure for a cybersecurity company.