Catalytic Data Science

SaaS Enterprise Solution • Integrated Digital Workspace • Better R&D, Faster • Big Data Aggregator • Digital Research Platform

11 - 50 employees

🧬 Biotechnology

💊 Pharmaceuticals

🔬 Science

Data Engineer

October 20

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🚰 Data Engineer

Airflow

AWS

Cloud

ElasticSearch

ETL

Java

Microservices

MySQL

Python

Description

• Build, test, and operate automated Extract, Transform, and Load (ETL) pipelines that process terabytes of text data nightly.
• Develop service frontends around our various backend data stores (AWS Aurora, MySQL, Elasticsearch, S3).
• Rapidly prototype, test, and deploy data pipelines for LLMs using AWS.
• Collaborate with data scientists and NLP engineers to understand data requirements for LLMs.
• Optimize performance, reliability, and scalability of data pipelines and LLMs by applying best practices.
• Ensure quality, integrity, and security of the data by implementing data validation and governance policies.

Requirements

• Bachelor's degree or higher in computer science, engineering, or a related field.
• 3+ years of experience in data engineering, preferably with large-scale text data and LLMs, plus 6+ years of software engineering experience overall (including data engineering).
• Proficiency in Python 3 or Java, preferably both.
• Experience with data modeling, ETL, and data warehouse design and implementation.
• Expertise with ETL schedulers such as Airflow, Prefect, or similar frameworks.
• Familiarity with LLM and NLP concepts and frameworks such as Transformers, BERT, GPT, PaLM, and LLaMA.
• Day-to-day experience using AWS technologies such as Lambda, ECS Fargate, SQS, and SNS.
• Experience extracting, processing, storing, and querying petabyte-scale datasets.
• Familiarity with building and using containers.
• Familiarity with event-based microservices.
• Strong communication, collaboration, and problem-solving skills.
