Senior DevOps Engineer - Kubernetes, MLOps, LLMOps

August 27

LEO Technologies, LLC

We help you hear the voices that matter.

Law Enforcement Technology • Corrections Technology • Inmate Health Technology • Inmate Communications • Corrections

11 - 50

Description

• The Senior DevOps Engineer will play a critical role in ensuring that our systems are highly available, reliable, and scalable.
• You will architect, build, and monitor cloud-native architectures with Kubernetes and related technologies, particularly in the context of machine learning and AI workloads.
• You should have a deep understanding of the Software Development Life Cycle, including Continuous Integration and Continuous Deployment (CI/CD) pipeline architecture, particularly as it relates to deploying ML models and AI services in Kubernetes environments.
• You will assist in the design and operation of critical cloud infrastructure on AWS, with a focus on supporting the unique requirements of machine learning and AI-driven applications.
• Collaborate closely with data scientists and ML engineers to create a streamlined, automated build and deployment process for ML models and LLMs in Kubernetes.
• Implement and manage the infrastructure necessary for the continuous integration, delivery, and monitoring of ML models and AI services, ensuring they are seamlessly integrated into our SaaS applications.
• Ensure the availability and performance of production systems that run ML-driven services, proactively identifying and resolving issues that may impact model performance or availability.
• Optimize infrastructure for the efficient training, deployment, and scaling of ML models and LLMs, leveraging Kubernetes, GPU clusters, and cloud-native tools, including AWS SageMaker.
• Develop and maintain monitoring and alerting solutions tailored to ML and AI workloads, ensuring that both the infrastructure and deployed models are performing as expected.
• Troubleshoot and resolve production incidents, ensuring minimal downtime and quick recovery.
• Participate in on-call rotation as necessary.
• Ensure the security and compliance of our production systems and data, with a particular focus on protecting sensitive AI and ML data.
• Mentor and coach junior DevOps engineers.

Requirements

• Bachelor's degree in Computer Science, Engineering, or a related field.
• A minimum of 7 years of experience in maintaining optimal performance of online production environments, utilizing bare metal, cloud, and container technologies.
• At least 4 years of experience managing production Kubernetes infrastructure, with exposure to cloud vendor Kubernetes solutions such as EKS, AKS, and GKE.
• Strong experience with Docker for containerization, including creating and managing Docker images and containers.
• Strong experience in architecting and managing SaaS applications in Kubernetes, with specific experience in MLOps and LLMOps.
• Deep understanding of the machine learning lifecycle, including model training, deployment, monitoring, and scaling, particularly using AWS SageMaker.
• Experience with MLOps tools and frameworks, such as Kubeflow, MLflow, or similar, and their integration into Kubernetes environments.
• Familiarity with LLMOps, including the deployment and management of LLMs in production environments.
• Solid experience in scripting languages such as Python.
• Experience with infrastructure deployment and automation tools such as Terraform and CloudFormation.
• Working knowledge of industry-standard build tooling and CI/CD using GitHub and GitHub Actions.
• Expertise in monitoring and logging solutions such as Prometheus and Grafana.
• Good understanding of networking and security concepts.
• Strong knowledge of Linux systems and shell scripting.
• Strong communication and collaboration skills, with experience working closely with data scientists and ML engineers.
• Experience working in an agile environment and understanding of agile methodologies.
• Certifications such as CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) are a plus.

Benefits

• 3 weeks of paid vacation, right out of the gate!
• Competitive salary.
• Generous medical, dental, and vision plans.
• Sick days and paid holidays.
• Sit/stand workstations at our amenity-rich office in Irvine, CA.
• Casual environment.
• On-site kitchen stocked with snacks and drinks.
• Hybrid and remote options available.

