Unison Consulting Pte Ltd

Data Engineer with Python/PySpark

18 March 2024
£125,000 - £194,000

Job Description

    • Develop and implement data pipelines for ingesting data from various sources into a centralized data platform.
    • Develop and maintain ETL jobs using AWS Glue services to process and transform data at scale.
    • Optimize and troubleshoot AWS Glue jobs for performance and reliability.
    • Utilize Python and PySpark to efficiently handle large volumes of data during the ingestion process.
    • Design and implement scalable data processing solutions using PySpark to transform raw data into a structured and usable format.
    • Apply data cleansing, enrichment, and validation techniques to ensure data quality and accuracy.
    • Create and maintain ETL processes using Python and PySpark to move and transform data between different systems (a minimal sketch of such a job follows this list).
    • Optimize ETL workflows for performance and efficiency.
    • Collaborate with data architects to design and implement data models that support business requirements.
    • Ensure data structures are optimized for analytics and reporting.
    • Work with distributed computing frameworks, such as Apache Spark, to process and analyze large-scale datasets.
    • Manage and optimize databases, both SQL and NoSQL, to support data storage and retrieval needs.
    • Implement indexing, partitioning, and other database optimization techniques.
    • Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver effective solutions.
    • Work closely with software engineers to integrate data solutions into larger applications.
    • Implement monitoring solutions to track data pipeline performance and proactively identify and address issues.
    • Ensure compliance with data privacy regulations and company policies.
    • Stay abreast of industry trends and advancements in data engineering, Python, and PySpark.
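For illustration, here is a minimal sketch of the kind of PySpark ETL job described above (ingest, cleanse, enrich, load). The bucket, paths, and column names are hypothetical placeholders, and a production AWS Glue job would typically obtain its session through GlueContext rather than a bare SparkSession:

    # Illustrative sketch only: bucket, paths, and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-etl").getOrCreate()

    # Ingest: read raw JSON from a (hypothetical) landing zone.
    raw = spark.read.json("s3://example-bucket/landing/orders/")

    # Cleanse and validate: drop duplicates, discard rows missing the key field,
    # and normalize the timestamp column.
    clean = (
        raw.dropDuplicates(["order_id"])
           .filter(F.col("order_id").isNotNull())
           .withColumn("order_ts", F.to_timestamp("order_ts"))
    )

    # Enrich: derive a date column used to partition the curated output.
    curated = clean.withColumn("order_date", F.to_date("order_ts"))

    # Load: write partitioned Parquet to the curated zone.
    (curated.write
            .mode("overwrite")
            .partitionBy("order_date")
            .parquet("s3://example-bucket/curated/orders/"))

    spark.stop()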

Requirements

    • Proficiency in Python and PySpark.
    • Strong knowledge of data engineering concepts and best practices.
    • Hands-on experience with AWS Glue and other AWS services.
    • Experience with big data technologies and distributed computing.
    • Familiarity with database management systems (SQL and NoSQL).
    • Understanding of ETL processes and data modeling.
    • Excellent problem-solving and analytical skills.
    • Strong communication and collaboration skills.
    • Bachelor’s degree in Computer Science, Information Technology, or a related field.