Lead Data Engineer - Python, PySpark & SQL
Princeton IT Services, Inc
📍 Canada
Job Description
Job Title
Lead Data Engineer – Python, PySpark & SQL
Location
Canada
Job Type
Full-time contract
Responsibilities
- Build scalable data ingestion and transformation pipelines using Python, PySpark, and SQL.
- Process raw CSV/text files from AWS S3, including header validation, schema checks, and malformed-file detection.
- Convert raw data into structured DataFrames and implement reusable data quality checks.
- Develop advanced transformations using SQL/PySpark (window functions, LAG(), grouping logic, date-gap detection, etc.).
- Deploy and tune PySpark applications on AWS EMR, optimizing executor memory, cores, shuffle behavior, and cluster performance.
- Work with AWS services such as S3, EMR, Glue, Lambda, and IAM.
- Debug performance issues (OOM errors, shuffle spill, GC problems) and improve pipeline reliability.
- Lead design discussions, code reviews, and mentor junio...