Open Source AI Engineer - Evals

Arize AI provides an AI observability and evaluation platform for AI engineers who build and manage AI applications, particularly those involving large language models (LLMs). The platform offers end-to-end tracing, monitoring, and evaluation tools that help users identify and resolve performance bottlenecks, ensure data quality, and improve the performance of AI systems. Arize also supports experimentation and visualization, so users can iterate rapidly on AI applications and maintain high-quality service delivery. The company serves top AI companies with solutions for both online and offline LLM evaluation and observability, helping businesses monitor, troubleshoot, and optimize their models to mitigate risks and improve AI outcomes.

51 - 200 employees

Founded 2019

🤖 Artificial Intelligence

☁️ SaaS

🏢 Enterprise

February 20

🇺🇸 United States – Remote

💵 $150k - $185k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 AI Engineer

🦅 H1B Visa Sponsor

Open Source

Python

TypeScript

Arize AI

📋 Description

• Build LLM Eval Frameworks: Design, architect, and open-source new libraries, pipelines, and APIs that make it simpler to evaluate LLM output quality, consistency, and reliability at scale.
• Define Metrics and Benchmarks: Curate golden datasets and develop robust benchmarked metrics that guide data scientists and AI practitioners in optimizing their AI tasks.
• Collaborate with the Community: Partner closely with the broader AI open source ecosystem, gather feedback, review pull requests, and steer the direction of the project to address real developer needs.
• Prototype and Iterate Rapidly: Experiment with state-of-the-art LLM techniques, turning research into practical developer tooling.
• Improve Observability and Debugging: Integrate with our existing platform to surface deeper insights on LLM behavior, helping teams quickly diagnose and fix issues such as hallucinations or bias.
• Educate and Evangelize: Write blog posts, white papers, tutorials, and documentation to help developers succeed with our open source tools and grow the LLM eval community.

🎯 Requirements

• Hands-on LLM Experience: Familiarity with popular LLM frameworks, prompt engineering techniques, and model fine-tuning.
• Strong Programming Skills: Fluent in Python for AI workflows; bonus if you can navigate TypeScript as well.
• Evaluation Knowledge: Understanding of core NLP evaluation methods and experience applying or extending them for LLM systems.
• Open Source Track Record: Contributions to open source projects, personal GitHub repos with interesting AI demos, or a history of active engagement in developer communities.
• ML Observability & Tools: Familiarity with debugging AI applications, exploring embeddings, or building data-heavy dashboards is a plus.

🏖️ Benefits

• medical
• dental
• vision
• 401(k) plan
• unlimited paid time off
• generous parental leave plan
• other benefits for mental health and wellness support

Similar Jobs

AI Engineer

February 18

Ensemble Health Partners

10,000+ employees

⚕️ Healthcare Insurance

☁️ SaaS

🏢 Enterprise

Ensemble Health Partners seeks a Lead Engineer, AI to implement AI models and build pipelines. Focus on generative AI, LLMs, and predictive modeling.

🇺🇸 United States – Remote

💰 Private Equity Round on 2022-03

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 AI Engineer

AWS

Azure

Cloud

Google Cloud Platform

Hadoop

PyTorch

Scikit-Learn

Spark

SQL

Tensorflow

AI Developer

February 18

Iris

501 - 1000

Develop a multi-agent LLM system as part of IRIS, an AI-powered operating system for creators.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 AI Engineer

🦅 H1B Visa Sponsor

Elixir

Python

AI Engineer

February 16

Zed

1001 - 5000

Join the Zed team to enhance a code editor with AI capabilities and innovate interactions with language models. Optimize prototyping and evaluation of AI features.

🇺🇸 United States – Remote

💰 $15M Funding Round on 2002-08

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 AI Engineer

Python

Rust

Applied AI Engineer

February 15

AIRINC

51 - 200

Join Air as an Applied AI Engineer to drive intelligent systems for creative teams.

🇺🇸 United States – Remote

💵 $160k - $264k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 AI Engineer

Airflow

AWS

Azure

Cloud

Distributed Systems

Docker

ElasticSearch

Electron

Google Cloud Platform

Kafka

Kubernetes

Node.js

Python

PyTorch

React

Spark

Tensorflow

TypeScript

AI Engineer

February 15

SFR3 Fund

2 - 10

🏠 Real Estate

Join an AI-driven real estate fund focused on managing thousands of homes worth billions. Use cutting-edge technology to improve operations and innovate continuously.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 AI Engineer

Apache

AWS

Cloud

GraphQL

Kafka

Python

React