Allegro

Data Engineer – Machine Learning Product Catalogue

26 March 2024
£86000 - £154000 / year

Job Description

The salary range for this position (contract of employment) is:

mid: 12 300 – 17 600 PLN in gross terms

senior: 16 100 – 23 200 PLN in gross terms

A hybrid work model that you agree on with your leader and the team

We are looking for a Data Engineer focused on data processing and preparation, as well as the deployment and maintenance of our ML/data projects. Join our team to develop your skills in deploying data-based processes and MLOps approaches to Machine Learning, and to share those skills within the team.

We are looking for people who have:

  • 2+ years of hands-on experience with Python and its data processing toolset (pandas, NumPy)
  • Experience in process/solution monitoring
  • Knowledge and experience in processing large datasets with Big Data tools, especially Spark (PySpark)
  • Proficiency in using development tools (git, issue tracking, pull requests, code reviews etc.), familiarity with software engineering best practices (PEP8, code review, documentation, CI/CD, testing, automation etc.)
  • DevOps experience
  • Experience in writing advanced and efficient SQL queries (especially in GCP/BigQuery environment)
  • Experience in working on cloud solutions and architecture (GCP, AWS, Azure)
  • Understanding of AI-related concepts (classification vs clustering, modeling, precision/recall metrics, model evaluation etc.) and a demonstrated ability to use those metrics to back up assumptions and evaluate outcomes
  • Positive attitude and ability to work in a team
  • Good communication skills and a proactive approach to seeking, clarifying and understanding information from end users and stakeholders

An additional advantage would be:

  • Previous experience in building, evaluating or deploying ML/AI-based solutions
  • Knowledge of ML libraries (sklearn, xgboost, lgbm)
  • MLOps practical experience
  • Previous experience with GCP data processing tools (e.g. BigQuery, Dataproc) and workflow automation solutions (e.g. Airflow)
  • GCP certifications and/or hands-on experience with GCP, including its ML/AI tools (Vertex AI)

Our tech stack:

  • Python, BigQuery SQL, Spark
  • Google Cloud Platform (Airflow, BigQuery, Composer)
  • GitHub (code storage, CI/CD, hosting our own Data Science Python library)

What we offer:

  • A hybrid work model that you will agree on with your leader and the team. We have well-located offices (with fully equipped kitchens and bicycle parking facilities) and excellent working tools (height-adjustable desks, interactive conference rooms)
  • An annual bonus of up to 10% of your annual gross salary (depending on your annual assessment and the company’s results)
  • A wide selection of fringe benefits in a cafeteria plan – you choose what you like (e.g. medical, sports or lunch packages, insurance, purchase vouchers)
  • Company-paid English classes tailored to the specific nature of your job
  • Working in a team you can always count on — we have on board top-class specialists and experts in their areas of expertise
  • A high degree of autonomy in terms of organizing your team’s work; we encourage you to develop continuously and try out new things
  • Hackathons, team tourism, a training budget and an internal educational platform, MindUp (including training courses on work organization, means of communication, motivation to work and various technologies and subject-matter issues)
  • A 16″ or 14″ MacBook Pro with an M1 processor and 32 GB of RAM, or a corresponding Dell with Windows (if you don’t like Macs), and any other gadgets you may need

What will your responsibilities be?

  • You will be actively responsible for building data processing tools for modeling, analysis and ML, in close cooperation with the Data Science team
  • You will support the Data Science team in developing data sources for ad-hoc analyses and Machine Learning projects
  • You will process terabytes of data using Google Cloud Platform services (BigQuery, Composer, Dataflow) and PySpark, and optimize processes for performance and GCP cloud processing costs
  • You will collect process requirements from project groups and automate tasks related to data preprocessing and data quality monitoring, prediction serving, and Machine Learning model monitoring, alerting and retraining
  • You will be responsible for the engineering quality of each project and will work with your colleagues to achieve engineering excellence

Why is it worth working with us?

  • Through the supplied data and processes, you will have a meaningful impact on the operation of one of the largest e-commerce platforms in the world
  • Thanks to the wide range of projects we are involved in, you will never be without an interesting challenge to take on
  • You will have access to vast datasets (measured in petabytes)
  • You will get a chance to work in a team of experienced engineers and Big Data specialists who are willing to share their knowledge (including with the general public, as part of allegro.tech)
  • Your professional growth will follow the most recent open-source technological trends
  • You will have a real impact on the direction of product development and on the choice of specific technologies; we use the latest and best technological solutions available, selected to closely match our needs
  • We are a full-stack provider – we design, code, test, deploy and maintain our solutions

Send us your CV and learn why it’s #goodtobehere