• Design, build, and maintain scalable data pipelines using PySpark and Databricks
• Optimize data processing and storage for maximum performance and efficiency
• Troubleshoot and debug data-related issues, and implement solutions to prevent recurrence
• Collaborate with data scientists, software engineers, and other stakeholders to ensure that data solutions are aligned with business goals
• Strong experience in Python programming, PySpark, and SparkSQL
• Clear understanding of Spark data structures: RDDs, DataFrames, and Datasets
• Expertise in Databricks and ADLS
• Expertise handling data types, including dictionaries, lists, tuples, sets, arrays, pandas DataFrames, and Spark DataFrames
• Expertise working with complex data types such as structs and JSON strings
• Clear understanding of Spark broadcast, repartitioning, and Bloom filter indexes
• Experience with ADLS optimization, partitioning, shuffling, and shrinking
• Experience with disk caching (preferred)
• Experience with the cost-based optimizer (preferred)
• Experience with data modeling, data warehousing, data lakes, delta lakes, and ETL/ELT processes in ADF
• Strong analytical and problem-solving skills
• Excellent documentation, communication, and collaboration skills
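As a small illustration of the data-type handling the qualifications above call for, here is a minimal sketch using pandas (one of the listed tools); the record structure and column names are invented for the example, not taken from the posting.

```python
import pandas as pd

# Core Python collections mentioned in the posting:
# a list of dicts, with a tuple-valued field
records = [
    {"id": 1, "tags": ("a", "b")},
    {"id": 2, "tags": ("c",)},
]
unique_ids = {r["id"] for r in records}  # set comprehension

# list of dicts -> pandas DataFrame
df = pd.DataFrame(records)

# DataFrame -> list of dicts (round trip back to plain Python types)
back = df.to_dict(orient="records")

print(sorted(unique_ids))  # [1, 2]
print(df.shape)            # (2, 2)
```

In PySpark the same round trip would go through `spark.createDataFrame(records)` and `df.collect()`, with tuple/struct fields mapping onto Spark struct columns.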