Fusemachines

Mid-Level Data Engineer (AWS, Snowflake)

1 June 2024
£90,000 – £158,000 / year

Job Description

About Fusemachines

Fusemachines is a 10+ year-old AI company, dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, Ph.D., an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. With a robust presence in four countries and a dedicated team of over 400 full-time employees, we are committed to fostering AI transformation journeys for businesses worldwide. At Fusemachines, we not only bridge the gap between AI advancement and its global impact but also strive to deliver the most advanced technology solutions to the world.

About the role:

This is a fully remote contract position with a client in the beauty and personal care industry, specializing in digital marketing. You will be responsible for designing, building, testing and maintaining the infrastructure and code required for data integration, storage, processing and analytics (BI, visualization and advanced analytics), from ingestion to consumption.

We are looking for a skilled Data Engineer with a strong background in Python, Snowflake, SQL, PySpark and AWS cloud-based large-scale data solutions, and a passion for data quality, performance and cost optimization. The ideal candidate will develop in an Agile environment, leading and collaborating with business and technology stakeholders to establish and maintain the enterprise data architecture.

Qualifications & Experience:

  • Must have a full-time Bachelor’s degree in Computer Science or a similar field.
  • At least 2 years of experience in technology consulting, working on data architecture solutions and frameworks.
  • At least 3 years of experience as a data engineer with strong expertise in Python, SQL, PySpark and AWS.
  • 3+ years of experience with GitHub and Jenkins.
  • Proven experience delivering large scale projects and products for Data and Analytics, as a data engineer.

Must hold the following certifications:

  • AWS Certified Cloud Practitioner
  • AWS Certified Data Engineer – Associate
  • SnowPro Core Certification

Nice to have:

  • SnowPro Advanced Data Engineer
  • Databricks Certified Associate Developer for Apache Spark
  • Databricks Certified Data Engineer Associate

Required skills/Competencies:

  • Strong programming skills in one or more languages such as Python (must have) and Scala, with proficiency in writing efficient and optimized code for data integration, storage, processing and manipulation.
  • Strong SQL skills and experience working with complex data sets, Enterprise Data Warehouses and writing advanced SQL queries. Proficient with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (Cassandra, MongoDB, Neo4j, etc.).
  • Proven experience as a Snowflake Developer, with a strong understanding of Snowflake architecture and concepts.
  • Proficient in Snowflake services such as Snowpipe, stages, stored procedures, views, materialized views, tasks and streams (a minimal sketch follows this list).
  • Robust understanding of data partitioning and other optimization techniques in Snowflake.
  • Knowledge of data security measures in Snowflake, including role-based access control (RBAC) and data encryption.
  • Thorough understanding of big data principles, techniques, and best practices.
  • Strong experience with scalable and distributed data processing technologies such as Spark/PySpark (must have), for handling large volumes of data.
  • Strong experience in designing and implementing efficient ELT/ETL processes, including the ability to develop custom integration solutions as needed.
  • Skilled in Data Integration from different sources such as APIs, databases, flat files, etc.
  • Expertise in data cleansing, transformation, and validation.
  • Strong understanding of data modeling and database design principles, and able to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions.
  • Strong experience in designing and implementing data warehouse, data lake and data lakehouse solutions in AWS.
  • Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.
  • Knowledge of SDLC tools and technologies, including project management software (Jira or similar), source code management (GitHub, GitLab or similar), CI/CD systems (GitHub Actions, Jenkins or similar) and binary repository managers (Sonatype Nexus or similar). The following tools are a plus: Maven, SonarQube, JSHint, Istanbul, Selenium, TestLink, Rundeck, JMeter, JUnit, Cucumber.
  • Strong understanding of DevOps principles, including continuous integration, continuous delivery (CI/CD), infrastructure as code (IaC – Terraform), configuration management, automated testing, performance tuning and cost management and optimization.
  • Deep knowledge of cloud computing, specifically AWS services related to data and analytics, such as Glue, SageMaker, Redshift, Lambda, Kinesis, S3, Lake Formation, EC2, ECS/ECR, EKS, IAM, CloudWatch, etc.
  • Experience in orchestration using technologies such as Databricks Workflows and Apache Airflow (see the DAG sketch after this list).
  • Strong knowledge of data structures and algorithms and good software engineering practices.
  • Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.
  • Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
  • Good understanding of Data Quality and Governance, including implementation of data quality and integrity checks and monitoring processes to ensure that data is accurate, complete, and consistent.
  • Good problem-solving skills: able to troubleshoot data processing pipelines and identify performance bottlenecks and other issues.
  • Excellent communication skills to collaborate with cross-functional teams, including business users, data architects, DevOps/DataOps/MLOps engineers, data analysts, data scientists, developers, and operations teams. The ability to convey complex technical concepts and insights to non-technical stakeholders effectively is essential.
  • Ability to document processes, procedures, and deployment configurations.
  • Understanding of security practices, including network security groups, encryption, and compliance standards.
  • Ability to implement security controls and best practices within data and analytics solutions, including proficient knowledge and working experience on various cloud security vulnerabilities and ways to mitigate them.
  • Self-motivated with the ability to work well in a team.
  • A willingness to stay updated with the latest services, Data Engineering trends, and best practices in the field.
  • Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.
  • Care about architecture, observability, testing, and building reliable infrastructure and data pipelines.
  • Experience in defining data management processes with the ability to lead an independent project.
  • Spoken and written English (Intermediate-advanced level).
  • Spoken and written Spanish (Advanced level).
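
For context on the Snowflake items above, here is a minimal, illustrative sketch of the stream-plus-task pattern referenced in the Snowpipe/stages/tasks/streams bullet, driven from Python via the snowflake-connector-python package. All connection details, table, stage and schedule names are hypothetical assumptions, not details of this role.

```python
# Illustrative sketch only: all names (account, warehouse, tables) are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical account identifier
    user="etl_user",        # hypothetical service user
    password="...",         # use a secrets manager in practice
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)

statements = [
    # A stream captures change records (CDC) on the raw table.
    "CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events",
    # A task runs on a schedule, but only when the stream has new data,
    # and moves the new rows into a curated table.
    """
    CREATE OR REPLACE TASK merge_events_task
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM')
    AS
      INSERT INTO CURATED.EVENTS
      SELECT event_id, payload, loaded_at
      FROM raw_events_stream
      WHERE METADATA$ACTION = 'INSERT'
    """,
    # Tasks are created suspended; resuming starts the schedule.
    "ALTER TASK merge_events_task RESUME",
]

with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.close()
```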
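Similarly, for the orchestration bullet, the following is a minimal Apache Airflow DAG sketch. The DAG id, schedule and the placeholder callables are illustrative assumptions; the `schedule` keyword assumes Airflow 2.4+.

```python
# Illustrative Airflow DAG sketch; dag_id, schedule and callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull from source APIs / databases")

def transform():
    print("cleanse, validate and standardize")

def load():
    print("write to Snowflake / the data lake")


with DAG(
    dag_id="daily_marketing_etl",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",              # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```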

Responsibilities:

  • Design, implement, and maintain scalable and efficient data architectures, defining and maintaining standards and best practices for data management.
  • Handle ELT/ETL processes, including data extraction, loading and transformation from different sources, ensuring consistency and quality (see the PySpark sketch after this list).
  • Transform and clean data for further analysis and storage.
  • Design and optimize data models and schemas to support business requirements and analysis.
  • Implement monitoring tools and systems to ensure the availability and performance of data systems.
  • Manage data security and access, ensuring confidentiality and integrity.
  • Automate repetitive tasks and processes to improve operational efficiency.
  • Collaborate with data science teams to establish pipelines and workflows for training, validation, deployment, and monitoring of machine learning models. Automate deployment and management of machine learning models in production environments.
  • Contribute to data quality assurance efforts, such as implementing data validation checks and tests to ensure accuracy, completeness and consistency of data.
  • Test software solutions to ensure they meet product quality standards prior to release to QA.
  • Ensure the reliability, scalability, and efficiency of data systems are maintained at all times, identifying and resolving performance bottlenecks in pipelines caused by data, queries and processing workflows to ensure efficient and timely data delivery.
  • Work with DevOps teams to optimize resources.
  • Assist in the configuration and management of data warehousing (including Snowflake) and data lake solutions.
  • Collaborate closely with cross-functional teams including Product, Engineering, Data Scientists, and Analysts to thoroughly understand data requirements and provide data engineering support.
  • Take ownership of the storage layer and database management tasks, including schema design, indexing, and performance tuning.
  • Evaluate and implement cutting-edge technologies and methodologies, and continue learning and expanding skills in data engineering and cloud platforms, to improve and modernize existing data systems.
  • Develop, design, and execute data governance strategies encompassing cataloging, lineage tracking, quality control, and data governance frameworks that align with current analytics demands and industry best practices, working closely with the Data Architect.
  • Ensure technology solutions support the needs of the customer and/or organization.
  • Define and document data engineering architectures, processes and data flows.
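
To make the ELT/ETL responsibilities above concrete, here is a minimal PySpark sketch of the extract-validate-load pattern referenced in the list; the S3 paths, column names and the quality check are hypothetical assumptions, not part of any actual pipeline for this role.

```python
# Illustrative PySpark ELT sketch; paths, columns and checks are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("marketing-etl").getOrCreate()

# Extract: read raw JSON landed in S3 (hypothetical path).
raw = spark.read.json("s3://raw-bucket/marketing/events/")

# Transform: cleanse, validate and standardize.
cleaned = (
    raw
    .dropDuplicates(["event_id"])                     # de-duplicate
    .filter(F.col("event_id").isNotNull())            # basic integrity check
    .withColumn("event_date", F.to_date("event_ts"))  # normalize timestamps
)

# Simple data-quality gate: fail fast if a required field is missing.
bad_rows = cleaned.filter(F.col("user_id").isNull()).count()
if bad_rows > 0:
    raise ValueError(f"{bad_rows} rows missing user_id")

# Load: write partitioned Parquet to the curated zone (hypothetical path).
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://curated-bucket/marketing/events/")
)
```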

Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.