Open Source AI Engineer - Evals

February 20


Arize AI

Arize AI provides an AI observability and evaluation platform that helps AI engineers build and manage AI applications, particularly those built on large language models (LLMs). The platform offers end-to-end tracing, monitoring, and evaluation tools to help users identify and resolve performance bottlenecks, ensure data quality, and improve AI system performance. Arize also supports experimentation and visualization, so users can iterate rapidly on AI applications and maintain high-quality service delivery. The company serves top AI companies with solutions for both online and offline LLM evaluation and observability. Its tools are particularly valuable for model monitoring, troubleshooting, and performance optimization, helping businesses mitigate risks and improve AI outcomes.

📋 Description

• Build LLM Eval Frameworks: Design, architect, and open-source new libraries, pipelines, and APIs that make it simpler to evaluate LLM output quality, consistency, and reliability at scale.
• Define Metrics and Benchmarks: Curate golden datasets and develop robust benchmarked metrics that guide data scientists and AI practitioners in optimizing their AI tasks.
• Collaborate with the Community: Partner closely with the broader AI open source ecosystem, gather feedback, review pull requests, and steer the direction of the project to address real developer needs.
• Prototype and Iterate Rapidly: Experiment with state-of-the-art LLM techniques, turning research into practical developer tooling.
• Improve Observability and Debugging: Integrate with our existing platform to surface deeper insights on LLM behavior, helping teams quickly diagnose and fix issues such as hallucinations or bias.
• Educate and Evangelize: Write blog posts, white papers, tutorials, and documentation to help developers succeed with our open source tools and grow the LLM eval community.

🎯 Requirements

• Hands-on LLM Experience: Familiarity with popular LLM frameworks, prompt engineering techniques, and model fine-tuning.
• Strong Programming Skills: Fluent in Python for AI workflows; bonus if you can navigate TypeScript as well.
• Evaluation Knowledge: Understanding of core NLP evaluation methods and experience applying or extending them for LLM systems.
• Open Source Track Record: Contributions to open source projects, personal GitHub repos with interesting AI demos, or a history of active engagement in developer communities.
• ML Observability & Tools: Familiarity with debugging AI applications, exploring embeddings, or building data-heavy dashboards is a plus.

🏖️ Benefits

• medical
• dental
• vision
• 401(k) plan
• unlimited paid time off
• generous parental leave plan
• others for mental and wellness support


February 18

Ensemble Health Partners seeks a Lead Engineer, AI to implement AI models and build pipelines. Focus on generative AI, LLMs, and predictive modeling.

February 18

Iris

501 - 1000

Develop a multi-agent LLM system as part of IRIS, an AI-powered operating system for creators.

February 16

Zed

1001 - 5000

Join the Zed team to enhance a code editor with AI capabilities and innovate interactions with language models. Optimize prototyping and evaluation of AI features.

February 15

Replicate

11 - 50

Join a team at Replicate to develop and test cutting-edge generative AI models. Serve a public model library with reliable, user-friendly options.



Built by Lior Neu-ner. I'd love to hear your feedback — Get in touch via DM or lior@remoterocketship.com