Site Reliability Engineer - LLM and Machine Learning

December 20, 2023


techruiter.

Tech Recruitment • Product Recruitment • Science Recruitment • Consulting • Talent Acquisition

11 - 50

Description

• Collaborate with engineering and research teams to design, implement, and automate infrastructure for LLM and Machine Learning workloads, ensuring scalability and reliability.
• Manage deployment pipelines, configuration management, and orchestration tools to streamline the deployment of models and services.
• Implement and maintain robust monitoring, alerting, and logging systems to proactively identify and resolve issues. Ensure optimal system performance.
• Lead incident response efforts, investigate root causes of outages, and implement preventive measures to reduce the likelihood of recurrence.
• Perform capacity planning and scaling to accommodate growing workloads and ensure resource efficiency.
• Collaborate with security teams to implement security best practices, vulnerability assessments, and compliance requirements for LLM and Machine Learning systems.
• Continuously evaluate and improve system reliability, performance, and efficiency through automation and optimization.
• Maintain comprehensive documentation for infrastructure configurations, procedures, and incident reports.

Requirements

• Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
• Proven experience as a Site Reliability Engineer or a related role with a focus on LLM and Machine Learning infrastructure.
• Strong proficiency in cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).
• Experience with configuration management tools (e.g., Ansible, Terraform) and CI/CD pipelines.
• Knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack).
• Scripting and automation skills (e.g., Python, Bash).
• Excellent problem-solving and troubleshooting skills.
• Strong communication and collaboration skills.

Benefits

• Excellent salary and benefits package
• Opportunity to work with cutting-edge technology
• Collaborative and innovative work environment
