Unison Consulting Pte Ltd
Data Engineer with Python/PySpark
Job Description
- Develop and implement data pipelines for ingesting data from various sources into a centralized data platform.
- Develop and maintain ETL jobs using AWS Glue services to process and transform data at scale.
- Optimize and troubleshoot AWS Glue jobs for performance and reliability.
- Utilize Python and PySpark to efficiently handle large volumes of data during the ingestion process.
- Design and implement scalable data processing solutions using PySpark to transform raw data into a structured and usable format.
- Apply data cleansing, enrichment, and validation techniques to ensure data quality and accuracy.
- Create and maintain ETL processes using Python and PySpark to move and transform data between different systems.
- Optimize ETL workflows for performance and efficiency.
- Collaborate with data architects to design and implement data models that support business requirements.
- Ensure data structures are optimized for analytics and reporting.
- Work with distributed computing frameworks, such as Apache Spark, to process and analyze large-scale datasets.
- Manage and optimize databases, both SQL and NoSQL, to support data storage and retrieval needs.
- Implement indexing, partitioning, and other database optimization techniques.
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver effective solutions.
- Work closely with software engineers to integrate data solutions into larger applications.
- Implement monitoring solutions to track data pipeline performance and proactively identify and address issues.
- Ensure compliance with data privacy regulations and company policies.
- Stay abreast of industry trends and advancements in data engineering, Python, and PySpark.
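To illustrate the cleansing and validation responsibilities above, here is a minimal sketch written in plain Python for brevity; in an actual pipeline these rules would typically run as PySpark DataFrame transformations inside an AWS Glue job. All field names (user_id, email, signup_date) are hypothetical.

```python
# Hypothetical cleansing/validation step: reject malformed records,
# then normalize fields so downstream joins stay consistent.
# Field names below are illustrative, not from any specific schema.
from datetime import datetime

REQUIRED_FIELDS = ("user_id", "email", "signup_date")

def is_valid(record: dict) -> bool:
    """Reject records missing required fields or holding malformed values."""
    if any(not record.get(f) for f in REQUIRED_FIELDS):
        return False
    if "@" not in record["email"]:
        return False
    try:
        # Expect ISO-style dates; anything else is treated as invalid.
        datetime.strptime(record["signup_date"], "%Y-%m-%d")
    except ValueError:
        return False
    return True

def cleanse(record: dict) -> dict:
    """Normalize key fields (trim whitespace, lowercase emails)."""
    return {
        **record,
        "email": record["email"].strip().lower(),
        "user_id": str(record["user_id"]).strip(),
    }

def run(records):
    """Validate then cleanse; in practice, rejected rows would be
    routed to a quarantine location for inspection rather than dropped."""
    return [cleanse(r) for r in records if is_valid(r)]
```

In PySpark the same logic would map onto `filter`/`withColumn` calls over a DataFrame, letting Spark apply it in parallel across large datasets.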
Requirements
- Proficiency in Python and PySpark.
- Strong knowledge of data engineering concepts and best practices.
- Hands-on experience with AWS Glue and other AWS services.
- Experience with big data technologies and distributed computing.
- Familiarity with database management systems (SQL and NoSQL).
- Understanding of ETL processes and data modeling.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Bachelor’s degree in Computer Science, Information Technology, or a related field.