Unison Consulting Pte Ltd
Data Engineer with Python/PySpark
Job Description
- Develop and implement data pipelines for ingesting data from various sources into a centralized data platform.
- Develop and maintain ETL jobs using AWS Glue services to process and transform data at scale.
- Optimize and troubleshoot AWS Glue jobs for performance and reliability.
- Utilize Python and PySpark to efficiently handle large volumes of data during the ingestion process.
- Design and implement scalable data processing solutions using PySpark to transform raw data into a structured and usable format.
- Apply data cleansing, enrichment, and validation techniques to ensure data quality and accuracy.
- Create and maintain ETL processes using Python and PySpark to move and transform data between different systems.
- Optimize ETL workflows for performance and efficiency.
- Collaborate with data architects to design and implement data models that support business requirements.
- Ensure data structures are optimized for analytics and reporting.
- Work with distributed computing frameworks, such as Apache Spark, to process and analyze large-scale datasets.
- Manage and optimize databases, both SQL and NoSQL, to support data storage and retrieval needs.
- Implement indexing, partitioning, and other database optimization techniques.
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver effective solutions.
- Work closely with software engineers to integrate data solutions into larger applications.
- Implement monitoring solutions to track data pipeline performance and proactively identify and address issues.
- Ensure compliance with data privacy regulations and company policies.
- Stay abreast of industry trends and advancements in data engineering, Python, and PySpark.
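To illustrate the cleansing and validation responsibilities above, here is a minimal sketch written in plain Python for brevity; in an actual pipeline these rules would typically run as PySpark DataFrame transformations inside an AWS Glue job. All field names (user_id, email, signup_date) are hypothetical.

```python
# Hypothetical cleansing/validation step: reject malformed records,
# then normalize fields so downstream joins stay consistent.
# Field names below are illustrative, not from any specific schema.
from datetime import datetime

REQUIRED_FIELDS = ("user_id", "email", "signup_date")

def is_valid(record: dict) -> bool:
    """Reject records missing required fields or holding malformed values."""
    if any(not record.get(f) for f in REQUIRED_FIELDS):
        return False
    if "@" not in record["email"]:
        return False
    try:
        # Expect ISO-style dates; anything else is treated as invalid.
        datetime.strptime(record["signup_date"], "%Y-%m-%d")
    except ValueError:
        return False
    return True

def cleanse(record: dict) -> dict:
    """Normalize key fields (trim whitespace, lowercase emails)."""
    return {
        **record,
        "email": record["email"].strip().lower(),
        "user_id": str(record["user_id"]).strip(),
    }

def run(records):
    """Validate then cleanse; in practice, rejected rows would be
    routed to a quarantine location for inspection rather than dropped."""
    return [cleanse(r) for r in records if is_valid(r)]
```

In PySpark the same logic would map onto `filter`/`withColumn` calls over a DataFrame, letting Spark apply it in parallel across large datasets.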
Requirements
- Proficiency in Python and PySpark.
- Strong knowledge of data engineering concepts and best practices.
- Hands-on experience with AWS Glue and other AWS services.
- Experience with big data technologies and distributed computing.
- Familiarity with database management systems (SQL and NoSQL).
- Understanding of ETL processes and data modeling.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Bachelor’s degree in Computer Science, Information Technology, or a related field.