Unison Consulting Pte Ltd
Data Engineer (Spark, Scala, Elastic)
Job Description
Develop and implement data ingestion processes to acquire data from various sources and load it into the data lake or data warehouse.
Design and implement efficient Extract, Transform, Load (ETL) processes using Spark and Scala for large-scale data processing.
Create and maintain data models to organize and structure data for optimal storage and retrieval.
Implement data transformation logic to convert raw data into a usable format for analysis and reporting.
Ensure the quality and accuracy of data through data cleansing, validation, and error handling processes.
Optimize Spark and Scala code for performance, considering factors like data partitioning and parallel processing.
Implement workflow automation for scheduling and orchestrating data processing tasks using tools like Apache Airflow.
Develop solutions for real-time data processing and streaming using Spark Streaming and other relevant technologies.
Integrate data engineering processes with Elastic (Elasticsearch) for efficient data indexing, search, and analysis.
Collaborate with data scientists to understand their data requirements and implement solutions for effective data access.
Implement monitoring and logging mechanisms to track data processing workflows and identify issues promptly.
Create and maintain comprehensive documentation for data engineering processes, workflows, and code.
Implement and adhere to security measures to ensure data privacy and compliance with regulatory requirements.
Stay updated on industry trends, emerging technologies, and best practices in data engineering.
Plan for scalability by designing solutions that can handle increasing volumes of data.
Requirements
Bachelor’s degree in Computer Science, Data Engineering, or a related field.
Proven experience as a Data Engineer with expertise in Spark, Scala, and Elastic.
Proficient in Spark and Scala programming.
Experience with Elasticsearch and related technologies.
Strong understanding of distributed computing and big data concepts.
Knowledge of data modeling and optimization techniques.
Familiarity with workflow automation tools like Apache Airflow.
Excellent problem-solving and debugging skills.