AI Engineer, Quality – Evals

11 - 50 employees

🤖 Artificial Intelligence

💸 Finance

☁️ SaaS

Artificial Intelligence • Finance • SaaS

Fieldguide is a modern, award-winning AI platform designed for advisory and audit firms. It helps streamline engagements, increase efficiency, and improve client satisfaction through its end-to-end engagement analytics. The platform supports various services, including risk advisory, cybersecurity and privacy, regulatory compliance, SOC readiness and audits, IT audits, and financial audits. Fieldguide integrates with popular productivity and IT tools to provide a seamless user experience, allowing for automated management of requests, documents, and reports. Trusted by top industry firms, Fieldguide enhances the practice of audit and advisory services with AI-driven innovations that save time and improve margins.

🕒 April 21

🏄 California – Remote

💵 $170k - $220k / year

⏰ Full Time

🟢 Junior

🤖 AI Engineer

🦅 H1B Visa Sponsor

Postgres

Python

React

TypeScript

📋 Description

• Design and build a unified evaluation platform that serves as the single source of truth for all of our agentic systems and audit workflows
• Build observability systems that surface agent behavior, execution traces, and failure modes in production, along with feedback loops that turn production failures into first-class evaluation cases
• Own the evaluation infrastructure stack, including integration with LangSmith and LangGraph
• Translate customer problems into concrete agent behaviors and workflows
• Integrate and orchestrate LLMs, tools, retrieval systems, and logic into cohesive, reliable agent experiences
• Build automated pipelines that evaluate new models against all critical workflows within hours of release
• Design evaluation harnesses for our most complex agentic systems and workflows
• Implement comparison frameworks that measure effectiveness, consistency, latency, and cost across model versions
• Design guardrails and monitoring systems that catch quality regressions before they reach customers
• Use AI as core leverage in how you design, build, test, and iterate
• Prototype quickly to resolve uncertainty, then harden systems for enterprise-grade reliability
• Build evaluations, feedback mechanisms, and guardrails so agents improve over time
• Work with SMEs and ML engineers to create evaluation datasets by curating production traces
• Design prompts, retrieval pipelines, and agent orchestration systems that perform reliably at scale
• Define and document evaluation standards, best practices, and processes for the engineering organization
• Advocate for evaluation-driven development and make it easy for the team to write and run evals
• Partner with product and ML engineers to integrate evaluation requirements into agent development from day one
• Take full ownership of large product areas rather than executing on narrow tasks

🎯 Requirements

• Multiple years of experience shipping production software in complex, real-world systems
• Experience with TypeScript, React, Python, and Postgres
• Built and deployed LLM-powered features serving production traffic
• Implemented evaluation frameworks for model outputs and agent behaviors
• Designed observability or tracing infrastructure for AI/ML systems
• Worked with vector databases, embedding models, and RAG architectures
• Experience with evaluation platforms (LangSmith, Langfuse, or similar)
• Comfort operating in ambiguity and taking responsibility for outcomes
• Deep empathy for professional-grade, mission-critical software (experience with audit and accounting workflows is not required)

🏖️ Benefits

• Competitive compensation packages with meaningful ownership
• Flexible PTO
• 401k
• Wellness benefits, including a bundle of free therapy sessions
• Technology & Work from Home reimbursement
• Flexible work schedules

Similar Jobs

AI Platform Administrator

🕒 April 9

iTech AG

51 - 200

🔒 Cybersecurity

AI Platform Administrator at iTech managing an internal AI tooling platform for ServiceNow solutions. Ensuring reliable and effective operations while supporting teams and building demos.

🇺🇸 United States – Remote

⏰ Full Time

🟢 Junior

🟡 Mid-level

🤖 AI Engineer

ServiceNow

AI Engineer – Responsible AI

🕒 March 26

Thermo Fisher Scientific

10,000+ employees

⚕️ Healthcare Insurance

🧬 Biotechnology

💊 Pharmaceuticals

AI Engineer developing secure and scalable AI solutions at Centific. Focus on safety, LLM, and production infrastructure for enterprise clients.

🇺🇸 United States – Remote

💵 $150k - $160k / year

⏰ Full Time

🟢 Junior

🟡 Mid-level

🤖 AI Engineer

🦅 H1B Visa Sponsor

AWS

Azure

Cloud

Docker

ETL

Google Cloud Platform

Kubernetes

Microservices

Python

PyTorch

AI Engineer

🕒 February 21

OneMagnify

501 - 1000

🤝 B2B

🛍️ eCommerce

🤖 Artificial Intelligence

AI Engineer working with AI initiatives and strategic clients to leverage data insights for project success. Collaborating with teams to deliver solutions and drive results in artificial intelligence.

🇺🇸 United States – Remote

⏰ Full Time

🟢 Junior

🟡 Mid-level

🤖 AI Engineer

🦅 H1B Visa Sponsor

Python

SQL

Tableau

AI Engineer

🕒 February 20

OneMagnify

501 - 1000

🤝 B2B

🛍️ eCommerce

🤖 Artificial Intelligence

AI Engineer leading projects for strategic clients within data analytics and AI applications. Collaborating with various technical teams to enhance AI solutions and ensure client satisfaction.

🇺🇸 United States – Remote

⏰ Full Time

🟢 Junior

🟡 Mid-level

🤖 AI Engineer

🦅 H1B Visa Sponsor

Python

SQL

Tableau