• Design, build, and maintain scalable data pipelines using PySpark and Databricks (a minimal sketch follows this list)
• Optimize data processing and storage for maximum performance and efficiency
• Troubleshoot and debug data-related issues, and implement solutions to prevent recurrence
• Collaborate with data scientists, software engineers, and other stakeholders to ensure that data solutions are aligned with business goals
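A minimal sketch of the kind of pipeline these responsibilities describe, assuming hypothetical ADLS paths and column names (raw_path, delta_path, event_id, event_ts); on Databricks the SparkSession is provided automatically, and Delta output requires a Delta-enabled runtime:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-created on Databricks

# Hypothetical ADLS Gen2 paths; replace with real container URIs.
raw_path = "abfss://raw@examplelake.dfs.core.windows.net/events/"
delta_path = "abfss://curated@examplelake.dfs.core.windows.net/events_clean/"

# Read raw JSON events from ADLS.
events = spark.read.json(raw_path)

# Basic cleanup: drop rows missing the key and normalize the timestamp.
clean = (
    events.dropna(subset=["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
)

# Persist as a date-partitioned Delta table for downstream consumers.
(
    clean.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save(delta_path)
)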
• Strong experience in Python programming, PySpark, and SparkSQL
• Clear understanding of Spark data structures: RDD, DataFrame, and Dataset
• Expertise in Databricks and ADLS
• Expertise handling data types, including dictionaries, lists, tuples, sets, arrays, pandas DataFrames, and Spark DataFrames
• Expertise working with complex data types such as structs and JSON strings (see the sketch after this list)
• Clear understanding of Spark broadcast, repartition, and Bloom filter indexes
• Experience with ADLS optimization, partitioning, shuffling, and shrinking
• Experience with disk caching is ideal
• Experience with the cost-based optimizer is ideal
• Experience with data modeling, data warehousing, data lakes, Delta Lake, and ETL/ELT processes in ADF
• Strong analytical and problem-solving skills
• Excellent documentation, communication, and collaboration skills
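A short sketch showing several of these skills together (parsing JSON strings into structs, a broadcast join, and an explicit repartition); the tables, columns, and JSON schema below are hypothetical examples, not part of the role:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()  # pre-created on Databricks

# A column of JSON strings, as called out in the requirements.
orders = spark.createDataFrame(
    [("o1", '{"sku": "A1", "qty": 2}'), ("o2", '{"sku": "B7", "qty": 5}')],
    ["order_id", "payload"],
)

# Parse each JSON string into a struct, then flatten it with dot notation.
schema = StructType([
    StructField("sku", StringType()),
    StructField("qty", IntegerType()),
])
flat = (
    orders.withColumn("payload", F.from_json("payload", schema))
    .select("order_id", "payload.sku", "payload.qty")
)

# Broadcast the small dimension table so the join avoids shuffling the large side.
skus = spark.createDataFrame([("A1", "widget"), ("B7", "gadget")], ["sku", "name"])
joined = flat.join(F.broadcast(skus), "sku")

# Repartition on the join key to control shuffle parallelism before output.
joined.repartition(8, "sku").show()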
Work remotely in a role focused on Azure data solutions.