Principal Engineer - ML Training Platform

November 13

Description

• Design, build, and maintain scalable ML data processing and model training solutions on AWS cloud infrastructure using Kubernetes
• Optimize training and model performance across various GPUs to improve training speed and efficiency
• Leverage the PyTorch and Ray frameworks to operate highly available systems at scale
• Drive the execution of technical programs and ensure milestone delivery
• Actively manage and mitigate technical risks

Requirements

• 6+ years of Python software development experience
• Hands-on experience with popular ML frameworks (PyTorch or TensorFlow)
• Hands-on experience with scaling ML systems
• Practical experience with large-scale AWS cloud infrastructure using Kubernetes
• Strong problem-solving skills and the ability to evaluate challenges with an objective, data-driven approach
• Excellent programming and software design skills, including debugging, performance analysis, and test design
• Proven track record of operating highly available systems at scale
• Strong collaboration and mentorship skills

Benefits

• Medical
• Dental
• Vision
• 401(k) with a company match
• Health savings accounts
• Life insurance
• Pet insurance
• And more
