AKASA


AI • Machine Learning • Revenue Cycle Management • Hospital Operations • Healthcare

51-200 employees

Senior Site Reliability Engineer

October 31

🇺🇸 United States – Remote

💵 $145k - $200k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗽 H1B Visa Sponsor

AWS • Azure • Cloud • Docker • Grafana • Java • Kubernetes • Prometheus • Python • Terraform • Unix

Description

• In this role, you will work closely with Infrastructure and Platform team members to integrate best-practice monitoring into our applications.
• You will focus on developing high-quality runbooks for incident management, ensuring that our response procedures are efficient and effective.
• You will build high-quality visualizations and meaningful alerting that provide clear, actionable insights into system performance and health.
• As an SRE, you will manage and optimize our infrastructure using tools such as Terraform, GitHub CI/CD, and Kubernetes.
• You will respond to incidents, troubleshoot production issues across the entire stack, and implement automation to streamline operational processes.
• You will design and maintain the core infrastructure that supports our users, keeping our SaaS products running smoothly and efficiently.
• You will proactively identify potential issues before they become outages, drawing on your expertise in telemetry data collection, querying, and monitoring with tools such as Grafana, Prometheus/Mimir, OpenSearch, and Sentry.
• You will collaborate with development teams to embed reliability and best practices into the software development lifecycle, ensuring robust and resilient applications.
• Your contributions will be vital to scaling our monitoring infrastructure, improving system reliability, and ensuring seamless user experiences.
• By continuously improving our infrastructure and processes, you will help AKASA deliver high-quality, dependable services to our customers.

Requirements

• Proficient in visualizing, monitoring, and alerting on telemetry data (logs, metrics, and traces) using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, and similar technologies.
• Experience with Docker, Kubernetes, Terraform, or similar technologies.
• 5+ years of professional experience using Python, Go, Java, or a similar language.
• Proficient with Linux and the Unix shell.
• Excellent collaboration and asynchronous communication skills.
• Committed to thorough documentation that streamlines learning and processes.
• Proactive and enthusiastic about identifying and fixing issues.
• Able to deliver quickly, iterate fast, and adapt to changing requirements.
• Proficient in using Git/GitHub for version control.

Benefits

• Unlimited paid time off (PTO)
• Expansive health, dental, and vision coverage
• Employer contributions to Health Savings Accounts (HSAs)
• Generous parental leave policy
• Full employee coverage for life insurance
• Company-paid holidays
• 401(k) plan