2 days ago
• Develop, optimize, and maintain data processes
• Troubleshoot issues to ensure data accuracy and integrity
• Assist in driving data initiatives by designing, implementing, and maintaining effective data solutions that align with user requirements and organizational goals
• Design, build, and optimize scalable ETL pipelines using Apache Spark on Amazon EMR
• Work closely with data scientists, analysts, and other engineering teams to define, implement, and maintain high-performance data infrastructure
• Develop and maintain automated data workflows and processes for efficient data ingestion, transformation, and loading
• Implement best practices for data engineering, including monitoring, logging, and alerting for data pipelines
• Collaborate with stakeholders to understand business requirements and translate them into technical solutions
• Optimize performance of data processing jobs and troubleshoot issues with large-scale distributed systems
• Drive innovation in data infrastructure, evaluating and integrating new tools, frameworks, and approaches
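The pipeline work described above follows the standard extract-transform-load pattern. As a rough illustration only, here is a minimal sketch of that pattern in plain Python; the field names, cleaning rules, and in-memory sources are invented for this example, and a production pipeline of the kind the posting describes would use Spark DataFrames on EMR reading from and writing to S3:

```python
import csv
import io

def extract(source):
    """Read raw rows from a CSV source into dicts."""
    return list(csv.DictReader(source))

def transform(rows):
    """Drop rows missing an id and cast amount to float."""
    cleaned = []
    for row in rows:
        if not row.get("id"):
            continue  # enforce data integrity: skip incomplete records
        row["amount"] = float(row["amount"])
        cleaned.append(row)
    return cleaned

def load(rows, sink):
    """Write cleaned rows back out as CSV."""
    writer = csv.DictWriter(sink, fieldnames=["id", "amount"])
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical input: one record is missing its id and gets dropped.
raw = io.StringIO("id,amount\n1,9.50\n,3.00\n2,4\n")
rows = transform(extract(raw))
out = io.StringIO()
load(rows, out)
print(rows)  # the two valid records, with amount cast to float
```

Each stage is a separate, independently testable function, which mirrors how ETL steps are typically structured before being ported to distributed execution.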
• 5+ years of experience in data engineering, with at least 3 years working with Apache Spark and Amazon EMR
• Strong programming skills in Python and Scala, with a focus on performance tuning and optimization of Spark jobs
• Proven experience with SQL for data management, querying, and optimization
• Deep understanding of distributed computing concepts, data partitioning, and resource management in large-scale data processing systems
• Proficiency in building and maintaining ETL pipelines for structured and unstructured data
• Hands-on experience with AWS services such as S3, Lambda, EMR, Glue, and RDS
• Strong problem-solving skills and the ability to debug complex systems
Preferred Qualifications:
• Experience with DevOps practices, including CI/CD, infrastructure as code (e.g., Terraform, CloudFormation), and containerization (e.g., Docker)
• Experience with Kubernetes and container orchestration for Spark jobs
• Familiarity with streaming data processing using tools such as Kafka, Kinesis, or Flink
• Experience with modern data lake architectures, including Delta Lake or Iceberg
• An AWS certification (e.g., AWS Certified Big Data – Specialty, AWS Certified Solutions Architect) is a plus
• Professional development opportunities with international customers
• Collaborative work environment
• Career path and mentorship programs