LLM Data Engineer

Halo Media LLC specializes in solving complex problems by combining creative talent with subject matter expertise. Its published case studies emphasize solutions tailored to each client.

Web Design • Web Development • Internet • iPhone • Applications

201 - 500 employees

Founded 2006

📱 Media

🏢 Enterprise

November 5, 2024

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

Apache

AWS

Azure

Cloud

Google Cloud Platform

Python

Spark

📋 Description

• We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform.
• The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques.
• This role sits in the AI COE within DX Tech & Digital.
• You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions.
• The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.

Responsibilities:

• Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes.
• Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform.
• Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data.
• Benchmark and implement various vector stores, embedding techniques, and retrieval methods.
• Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search).
• Implement and maintain auto-tagging systems and data preparation processes for LLMs.
• Develop tools for text and image data crawling, cleaning, and refinement.
• Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models.
• Work with data lakehouse architectures to optimize data storage and processing.
• Integrate and optimize workflows using Snowflake and various vector store technologies.

🎯 Requirements

• Master's degree in Computer Science, Data Science, or a related field
• 3-5 years of work experience in data engineering, preferably in AI/ML contexts
• Proficiency in Python, JSON, HTTP, and related tools
• Strong understanding of LLM architectures, training processes, and data requirements
• Experience with RAG systems, knowledge base construction, and vector databases
• Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
• Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
• Knowledge of data crawling techniques and associated ethical considerations
• Strong problem-solving skills and ability to work in a fast-paced, innovative environment
• Familiarity with Snowflake and its integration in AI/ML pipelines
• Experience with various vector store technologies and their applications in AI
• Understanding of data lakehouse concepts and architectures
• Excellent communication, collaboration, and problem-solving skills
• Ability to translate business needs into technical solutions
• Passion for innovation and a commitment to ethical AI development
• Experience building LLM pipelines using frameworks such as LangChain, LlamaIndex, Semantic Kernel, and OpenAI functions
• Familiarity with LLM parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM outputs

🏖️ Benefits

• US employee benefits package.
