Data Engineer

October 20

Catalytic Data Science

SaaS Enterprise Solution • Integrated Digital Workspace • Better R&D, Faster • Big Data Aggregator • Digital Research Platform

Description

• Build, test, and operate automated Extract, Transform, and Load (ETL) pipelines that process terabytes of text data nightly.
• Develop service frontends around our various backend data stores (AWS Aurora, MySQL, Elasticsearch, S3).
• Rapidly prototype, test, and deploy data pipelines for LLMs using AWS.
• Collaborate with data scientists and NLP engineers to understand data requirements for LLMs.
• Optimize performance, reliability, and scalability of data pipelines and LLMs by applying best practices.
• Ensure quality, integrity, and security of the data by implementing data validation and governance policies.

Requirements

• Bachelor's degree or higher in computer science, engineering, or a related field.
• 3+ years of experience in data engineering, preferably with large-scale text data and LLMs, and 6+ years of software engineering experience of any kind (including data engineering).
• Proficient in Python 3 or Java, preferably both.
• Experience with data modeling, ETL, and data warehouse design and implementation.
• Expertise with ETL schedulers such as Airflow, Prefect, or similar frameworks.
• Familiar with LLM and NLP concepts and frameworks such as Transformers, BERT, GPT, PaLM, and LLaMA.
• Day-to-day experience using AWS technologies such as Lambda, ECS Fargate, SQS, and SNS.
• Experience extracting, processing, storing, and querying petabyte-scale datasets.
• Familiarity with building and using containers.
• Familiarity with event-based microservices.
• Strong communication, collaboration, and problem-solving skills.
