ASAPP

Staff Data Engineer

28 May 2024
£45,000 - £84,000 / year

Job Description

Join our team at ASAPP, where we’re developing transformative Vertical AI designed to improve the customer experience. Recognized in the Forbes AI 50, ASAPP builds generative AI solutions that transform the customer engagement practices of Fortune 500 companies. Through automation and simplified workflows, we empower people to reach their full potential and create exceptional experiences for everyone involved. Work with our team of talented researchers, engineers, scientists, and specialists to help solve some of the biggest and most complex problems the world is facing.
The Data Engineering & Analytics (DEA) team at ASAPP powers the core of our data and analytics products. ASAPP’s products are based on natural language processing and serve tens of millions of end users in real time, so we need sophisticated metrics to monitor and continuously improve our systems. We are seeking a Staff Data Engineer to serve as both a technical leader and a core individual contributor, designing and building analytic data feeds for our business partners and internal stakeholders. Applicants with some or all of the requirements listed below are encouraged to apply. This is a hybrid role, with a preference for candidates in proximity to our Bangalore office in Bellandur.

What you’ll do

  • Lead the batch analytics team by laying the groundwork to modernize our data analytics architecture
  • Design and maintain our data warehouse to facilitate analysis across hundreds of system events
  • Rethink and influence strategy and roadmap for building efficient data solutions and scalable data warehouses
  • Review code for style and correctness across the entire team
  • Write production-grade Redshift, Athena, Snowflake & Spark SQL queries
  • Manage and maintain Airflow ETL jobs (a minimal, hypothetical sketch follows this list)
  • Test query logic against sample scenarios
  • Work across teams to gather requirements and understand reporting needs
  • Investigate metric discrepancies and data anomalies
  • Debug and optimize queries for other business units
  • Review schema changes across various engineering teams
  • Maintain high-quality documentation for our metrics and data feeds 
  • Work with stakeholders in Data Infrastructure, Engineering, Product, and Customer Strategy to assist with data-related technical issues and build a scalable, cross-platform reporting framework
  • Participate in and co-manage our on-call rotation to keep production pipelines up and running
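
As a rough, hypothetical illustration of the Airflow and SQL work listed above (not ASAPP’s actual pipeline), a minimal DAG that rebuilds a reporting table and then checks query logic against sample scenarios might look like the sketch below; DAG, task, and table names are invented, and Airflow 2.4+ is assumed.

```python
# Hypothetical sketch only: names and logic are illustrative, not ASAPP's pipeline.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def build_reporting_table(**context):
    # Placeholder: in practice this step would run warehouse SQL
    # (Redshift / Athena / Snowflake / Spark SQL) to rebuild the table.
    print(f"Rebuilding reporting table for {context['ds']}")


def check_sample_scenarios(**context):
    # Placeholder: compare query output against hand-checked sample scenarios.
    print("Validating query logic against sample scenarios")


with DAG(
    dag_id="example_batch_reporting",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; earlier versions use schedule_interval
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
):
    build = PythonOperator(
        task_id="build_reporting_table", python_callable=build_reporting_table
    )
    check = PythonOperator(
        task_id="check_sample_scenarios", python_callable=check_sample_scenarios
    )

    build >> check  # validate only after the reporting table has been rebuilt
```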

What you’ll need

Technical Requirements

  • 7+ years of industry experience with clear examples of strategic technical problem solving and implementation
  • Expertise in at least one flavor of SQL. (We use Amazon Redshift, MySQL, Athena and Snowflake)
  • Strong experience with data warehousing (e.g. Snowflake (preferred), Redshift, BigQuery, or similar)
  • Experience with dimensional data modeling and schema design
  • Experience using developer-oriented data pipeline and workflow orchestration tools (e.g. Airflow (preferred), dbt, Dagster, or similar)
  • Experience with cloud computing services (AWS (preferred), GCP, Azure or similar)
  • Proficiency in a high-level programming language, especially in terms of reading and comprehending other developers’ code and intentions. (We use Python, Scala, and Go)
  • Deep technical knowledge of data exchange and serialization formats such as Protobuf, YAML, JSON, and XML
  • Familiarity with BI & analytics tools (e.g. Looker, Tableau, Sisense, Sigma Computing, or similar)
  • Familiarity with streaming data technologies for low-latency data processing (e.g. Apache Spark/Flink, Apache Kafka, Snowpipe or similar)
  • Familiarity with Terraform, Kubernetes and Docker
  • Understanding of modern data storage formats and tools (e.g. Parquet, Avro, Delta Lake)
  • Knowledge of modern data design and storage patterns (e.g. incremental updates, partitioning and segmentation, rebuilds and backfills)

Professional Requirements

  • Experience working at a startup preferred
  • Excellent English and strong communication skills (Slack, email, documents)
  • Experienced with end-user management and communication (cross-team as well as external)
  • Must thrive in a fast-paced environment and be able to work independently with urgency
  • Can work effectively remotely (proactive about managing blockers, reaching out and asking questions, and participating in team activities)
  • Experienced in writing technical data design docs (pipeline design, dataflow, schema design)
  • Can scope and break down projects, and communicate progress and blockers effectively with your manager, team, and stakeholders
  • Good at task management and capacity tracking (JIRA preferred)

Nice to haves

  • Degree in a quantitative discipline such as computer science, mathematics, statistics, or engineering
  • Experience working with entity data (entity resolution / record linkage)
  • Experience working with data acquisition / data integration
  • Expertise with Python and the Python data stack (e.g. numpy, pandas, pyspark)
  • Experience with streaming platforms (e.g. Kafka)
  • Experience evaluating data quality and maintaining consistently high data standards across new feature releases (e.g. consistency, accuracy, validity, completeness)

About the Team/Role

  • DEA Team – Responsible for building and delivering analytics to both internal users and external customers. This spans a few different areas handled by different sub-teams. This role aligns with the Business Intelligence and Batch Analytics sub-team, which takes events that land in our data warehouse and transforms them into analytic reporting tables, building many intermediate tables along the way; this is effectively our analytics warehouse. The sub-team also handles delivery of these reporting tables, which are regularly synced to a MySQL instance that powers the dashboards (a minimal, hypothetical sketch of this pattern follows this section).
  • Responsibilities of this role – The person hired into this role will design, lead, and develop the data warehouse redesign. This will involve learning all of our event schemas, required transformations, reporting grains, timeliness requirements, and the internal quirks that make analytics so challenging. As part of the redesign, there will be opportunities to move some transformations into the real-time pipelines, which would drastically simplify the batch pipelines and enable real-time delivery to customers in the future.
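
To make the pattern described above concrete, here is a minimal, hypothetical PySpark sketch (all table and column names are invented): raw events are aggregated into a daily reporting table that is partitioned by date, so individual days can be rebuilt or backfilled independently.

```python
# Hypothetical sketch only: table and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("reporting_example").getOrCreate()

# Raw system events landed in the warehouse (hypothetical source table).
events = spark.table("raw.events")

# Aggregate events into a daily, per-company reporting grain.
daily_metrics = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "company_id")
    .agg(
        F.countDistinct("conversation_id").alias("conversations"),
        F.count("*").alias("event_count"),
    )
)

# Write the reporting table partitioned by date so a single day can be
# rebuilt or backfilled without touching the rest of the table; a downstream
# job could then sync it to the MySQL instance that powers the dashboards.
(
    daily_metrics.write
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.daily_conversation_metrics")
)
```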