Seamless.AI

Artificial Intelligence • Machine Learning • Natural Language Processing • Neuro-Linguistic Programming • Data Science

201 - 500 employees

Principal Data Engineer

September 28

🇺🇸 United States – Remote

⏰ Full Time

🔴 Lead

🚰 Data Engineer

AWS

ETL

Numpy

Pandas

PySpark

Python

Spark

SQL

Description

• Design, develop, and maintain robust and scalable ETL pipelines to acquire, transform, and load data from various sources into our data ecosystem.
• Collaborate with cross-functional teams to understand data requirements and develop efficient data acquisition and integration strategies.
• Implement data transformation logic using Python and other relevant programming languages and frameworks.
• Utilize AWS Glue or similar tools to create and manage ETL jobs, workflows, and data catalogs.
• Optimize and tune ETL processes for improved performance and scalability, particularly with large data sets.
• Apply methodologies and techniques for data matching, deduplication, and aggregation to ensure data accuracy and quality.
• Implement and maintain data governance practices to ensure compliance, data security, and privacy.
• Collaborate with the data engineering team to explore and adopt new technologies and tools that enhance the efficiency and effectiveness of data processing.

Requirements

• 7+ years of experience as a Data Engineer, with a focus on ETL processes and data integration.
• Professional experience with Spark and AWS pipeline development is required.
• Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent years of work experience.
