About Candidate
Focused on learning new technologies in a lively, hands-on way. I design and manage scalable data pipelines, build robust data models, and optimize ETL processes and workflows. I specialize in developing Spark jobs for efficient data processing and analytics reporting.
Skills:
• Apache Spark development (PySpark) – Expert level
• Databricks
• Shell/Bash scripting – Basic
• Python
• Cloud (AWS)
• Cloud (Azure)
• SQL (PostgreSQL, Azure SQL) and NoSQL (MongoDB)
• Git
Experience
- Data Warehousing: Designed and maintained the Medallion data layers within Databricks.
- ETL Design & Development: Developed and managed 10+ ETL pipelines that extract data from 5+ sources, ingest it into the Lakehouse, and process it with Pandas and PySpark (see the first sketch after this list).
- Optimization: Tuned Python code performance, reducing runtime by 75%.
- Reporting Automation: Automated 15 reporting sheets with Pandas, formatting them via XlsxWriter and reducing manual effort by 99% (second sketch below).
- Automated Report Distribution: Implemented cron jobs to generate reports and deliver them by email to stakeholders on a configurable schedule (third sketch below).
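
To illustrate the kind of pipeline described above, here is a minimal PySpark sketch of a bronze-to-silver hop in a Medallion architecture; the paths, table name, and columns are hypothetical, not taken from an actual project:

# Minimal bronze -> silver sketch; all paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Bronze: raw ingested records, stored as-is.
bronze = spark.read.format("delta").load("/mnt/lakehouse/bronze/orders")

# Silver: deduplicated, typed, and filtered for downstream analytics.
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") > 0)
)

silver.write.format("delta").mode("overwrite").save("/mnt/lakehouse/silver/orders")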
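
A minimal sketch of the kind of Pandas + XlsxWriter report formatting mentioned above; the DataFrame contents and formatting choices are illustrative assumptions:

# Illustrative report sheet; the data and formats are assumptions,
# not the actual report definitions.
import pandas as pd

df = pd.DataFrame({"region": ["EU", "US"], "revenue": [1200.5, 980.0]})

with pd.ExcelWriter("report.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Summary", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Summary"]
    # Apply a currency format to the revenue column and widen it.
    money = workbook.add_format({"num_format": "$#,##0.00"})
    worksheet.set_column("B:B", 14, money)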
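
And a sketch of how cron-driven report distribution like the above can be wired up; the schedule, addresses, and SMTP host are placeholders:

# send_report.py -- emails a generated report as an attachment.
# Hypothetical crontab entry for a daily 07:00 run:
#   0 7 * * * /usr/bin/python3 /opt/jobs/send_report.py
# All addresses and the SMTP host below are placeholders.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Daily report"
msg["From"] = "reports@example.com"
msg["To"] = "stakeholders@example.com"
msg.set_content("Please find today's report attached.")

with open("report.xlsx", "rb") as f:
    msg.add_attachment(
        f.read(),
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        filename="report.xlsx",
    )

with smtplib.SMTP("smtp.example.com") as server:
    server.send_message(msg)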



