ML Ops Engineer

September 13

Description

• ML Infrastructure Management: Architect, manage, and maintain scalable ML infrastructure on AWS services such as EC2, SageMaker, and S3, provisioned through CloudFormation templates.
• Model Deployment: Automate the deployment of machine learning models to production using AWS SageMaker and Databricks, ensuring continuous availability and performance.
• CloudFormation Automation: Use AWS CloudFormation to define and provision infrastructure for ML workloads, following infrastructure-as-code best practices.
• Data Management & Governance: Leverage Databricks Unity Catalog for data governance, security, and compliance, ensuring high data quality and streamlined model training processes.
• Monitoring & Optimization: Deploy and monitor models in production using tools such as AWS CloudWatch and Databricks monitoring solutions. Address performance bottlenecks and maintain model accuracy over time.
• Collaboration with Data Teams: Work closely with data scientists and data engineers to streamline model development and production workflows.
• Automated ML Pipelines: Build and maintain CI/CD pipelines for ML models using AWS and Databricks, ensuring models are consistently tested, monitored, and retrained when necessary.

Requirements

• AWS Expertise: Strong knowledge of AWS services, including SageMaker, CloudFormation, EC2, S3, and CloudWatch.
• CloudFormation: Experience creating, managing, and automating AWS resources with CloudFormation templates.
• Databricks Experience: Expertise in the Databricks platform and Unity Catalog, with the ability to manage large-scale data pipelines and ensure model performance at scale.
• ML Deployment: Proven experience deploying machine learning models to production with AWS SageMaker and Databricks.
• CI/CD Pipelines: Solid understanding of CI/CD pipelines and version control, particularly for machine learning models.
• Programming Skills: Strong Python coding skills, with knowledge of ML libraries such as TensorFlow, PyTorch, and scikit-learn.
• Monitoring & Logging: Experience setting up and managing monitoring and logging for production models, including tracking performance and detecting anomalies.
• DevOps Mindset: Familiarity with DevOps principles and infrastructure automation, including experience with Docker, Kubernetes, or other containerization and orchestration tools.