Allegro
Data Engineer – Machine Learning Product Catalogue
Job Description
The salary range for this position is (contract of employment):
mid: 12 300 – 17 600 PLN in gross terms
senior: 16 100 – 23 200 PLN in gross terms
A hybrid work model that incorporates solutions developed by the leader and the team
We are looking for a Data Engineer focused on data processing and preparation, and on the deployment and maintenance of our ML/data projects. Join our team to develop your skills in deploying data-driven processes and MLOps practices, and to share that knowledge within the team.
We are looking for people who have:
- 2+ years of hands-on experience in Python and its data processing toolset (pandas, NumPy)
- Experience in process/solution monitoring
- Knowledge and experience in processing large datasets with Big Data tools, especially Spark (PySpark)
- Proficiency with development tools (git, issue tracking, pull requests, code reviews, etc.) and familiarity with software engineering best practices (PEP8, documentation, CI/CD, testing, automation, etc.)
- DevOps experience
- Experience in writing advanced and efficient SQL queries (especially in GCP/BigQuery environment)
- Experience in working on cloud solutions and architecture (GCP, AWS, Azure)
- Understanding of AI-related concepts (classification vs clustering, modeling, precision/recall metrics, model evaluation, etc.) and a demonstrated ability to use those metrics to back up assumptions and evaluate outcomes
- Positive attitude and ability to work in a team
- Good communication skills and proactivity in seeking, clarifying and understanding information from end users and stakeholders
An additional advantage would be:
- Previous experience in building, evaluating or deploying ML/AI-based solutions
- Knowledge of ML libraries (scikit-learn, XGBoost, LightGBM)
- MLOps practical experience
- Previous experience with GCP tools for data processing (e.g. BigQuery, Dataproc) and workflow automation solutions (e.g. Airflow)
- GCP certifications and/or hands-on experience in GCP, including ML/AI tools (Vertex AI)
Our tech stack:
- Python, BigQuery SQL, Spark
- Google Cloud Platform (Airflow, BigQuery, Composer)
- GitHub (code storage, CI/CD, hosting our own Data Science Python library)
What we offer:
- A hybrid work model that you will agree on with your leader and the team. We have well-located offices (with fully equipped kitchens and bicycle parking facilities) and excellent working tools (height-adjustable desks, interactive conference rooms)
- Annual bonus of up to 10% of your gross annual salary (depending on your annual assessment and the company’s results)
- A wide selection of fringe benefits in a cafeteria plan – you choose what you like (e.g. medical, sports or lunch packages, insurance, purchase vouchers)
- Company-paid English classes tailored to the specific nature of your job
- A team you can always count on: we have top-class specialists and experts in their fields on board
- A high degree of autonomy in terms of organizing your team’s work; we encourage you to develop continuously and try out new things
- Hackathons, team tourism, a training budget and an internal educational platform, MindUp (including training courses on work organization, means of communication, motivation to work and various technologies and subject-matter issues)
- A 16″ or 14″ MacBook Pro with an M1 processor and 32 GB RAM, or a comparable Dell with Windows (if you don’t like Macs), and other gadgets that you may need
What will your responsibilities be?
- You will be actively responsible for building data processing tools for modeling, analysis and ML – in close cooperation with the Data Science team
- You will support the Data Science team in the development of data sources for ad-hoc analyses and Machine Learning projects
- You will process terabytes of data using Google Cloud Platform tools (BigQuery, Composer, Dataflow) and PySpark, and optimize processes in terms of their performance and GCP cloud processing costs
- You will collect process requirements from project groups and automate tasks related to preprocessing and data quality monitoring, prediction serving, as well as Machine Learning model monitoring, alerting and retraining
- You will be responsible for the engineering quality of each project and will work with your colleagues on engineering excellence
Why is it worth working with us?
- Through the supplied data and processes, you will have a meaningful impact on the operation of one of the largest e-commerce platforms in the world
- Thanks to the wide range of projects we are involved in, you will never be without an interesting challenge to take on
- You will have access to vast datasets (measured in petabytes)
- You will get a chance to work in a team of experienced engineers and Big Data specialists who are willing to share their knowledge (including with the general public, as part of allegro.tech)
- Your professional growth will follow the most recent open-source technological trends
- You will have an actual impact on the directions of product development and on the selection of particular technologies – we use the most recent and best technological solutions available, because we align them closely with our needs
- We are a full-stack provider – we design, code, test, deploy and maintain our solutions
Send us your CV and learn why it’s #goodtobehere