Tkxel

Principal Data Engineer

28 November 2024
Deadline date:
£28,000 - £67,000 / year

Job Description

We are seeking an experienced Data Engineer to lead the
design, development, and optimization of our client data infrastructure. This
role requires deep expertise in cloud technologies (primarily Azure; AWS is a
plus) and data engineering best practices, with additional
experience in Apache Spark and Databricks for large-scale data
processing. The Data Engineer will work closely with data scientists, analysts,
and other stakeholders to create scalable and efficient data systems that
support advanced analytics and business intelligence. Additionally, this role involves
mentoring junior engineers and driving technical innovation within the data
engineering team.

Key Responsibilities:

  • Collaborate with Solution Architects: Work with Big Data Solution
    Architects to design, prototype, implement, and optimize data ingestion
    pipelines, ensuring effective data sharing across business systems.
  • ETL/ELT Pipeline Development: Build and optimize ETL/ELT pipelines and
    analytics solutions using a combination of cloud-based technologies, with
    an emphasis on Apache Spark and Databricks for large-scale data
    processing.
  • Data Processing with Spark: Leverage Apache Spark for distributed data
    processing, data transformation, and analytics at scale. Experience with
    Databricks for optimized Spark execution is highly desirable.
  • Production-Ready Solutions: Ensure data architecture, code, and processes
    meet operational, security, and compliance standards, making solutions
    production-ready in cloud environments.
  • Project Support & Delivery: Actively participate in project and
    production delivery meetings, providing technical expertise to resolve
    issues quickly and ensure successful project execution.
  • Database Management: Manage both SQL (e.g., PostgreSQL, MySQL) and NoSQL
    (e.g., DynamoDB, MongoDB) databases, ensuring data is efficiently stored,
    retrieved, and queried.
  • Real-Time Data Processing: Implement and maintain real-time data streaming
    solutions using tools such as Apache Kafka, AWS Kinesis, or other
    technologies for low-latency data processing.
  • Cloud Monitoring & Automation: Use monitoring and automation tools (e.g.,
    AWS CloudWatch, Azure Monitor) to ensure efficient use of cloud resources
    and optimize data pipelines.
  • Data Governance & Security: Implement best practices for data governance,
    security, and compliance, including data encryption, access controls, and
    audit trails to meet regulatory standards.
  • Collaboration with Stakeholders: Work closely with data scientists,
    analysts, and business teams to align data infrastructure with strategic
    business objectives.
  • Documentation: Maintain clear and detailed documentation of data models,
    pipeline processes, and system architectures to support collaboration and
    troubleshooting.

Requirements:

  • 5+ years of experience as a Data Engineer, with strong expertise in
    cloud-based data warehousing, ETL pipelines, and large-scale data
    processing.
  • Proficiency with cloud technologies, with experience in platforms such as
    Azure or AWS.
  • Hands-on experience with Apache Spark for distributed data processing and
    transformation. Experience with Databricks is highly desirable.
  • Strong SQL skills and experience with relational databases (e.g.,
    PostgreSQL, MySQL) as well as NoSQL databases (e.g., MongoDB, DynamoDB).
  • Proficient in Python for data processing, automation tasks, and building
    data workflows.
  • Experience with PySpark for large-scale data engineering, particularly in
    Spark clusters or Databricks.
  • Experience in designing and optimizing data warehouse architectures,
    ensuring optimal query performance in large-scale environments.
  • A strong understanding of data governance, security, and compliance best
    practices, including encryption, access control, and data privacy.

Preferred Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Certifications in Data Engineering from cloud providers (e.g., AWS
    Certified Big Data – Specialty, Microsoft Certified: Azure Data Engineer
    Associate) are a plus.
  • Experience with advanced data engineering tools and platforms such as
    Databricks, Apache Spark, or similar distributed computing technologies.