Groov
Senior Data Engineer
Job Description
About Groov
Groov is a Workplace Science & Analytics platform on a mission to make workplaces better by applying science to workplace data to generate and deliver actionable insights in the flow of work for everyone in an organization: individual contributors, managers, and leaders. These insights are tailored to the organization’s culture, structure, and strategy so that they can effectively improve productivity, morale, and job satisfaction. Our team of industrial-organizational psychologists, workplace scientists, data scientists, engineers, product specialists, and user experience experts is building the future of workplace science and analytics together.
We work with enterprises to understand their workplace problems and develop strategies and solutions that take advantage of opportunities to improve their core business. Key to this endeavor is the use of large, dynamic datasets of passive and active data to build statistical, machine learning, and artificial intelligence models grounded in cutting-edge science; to develop insights and interventions; and to test, learn, and optimize the efficacy of those insights and interventions. Groov has demonstrated the power that real-time, actionable insights can have on workplaces, improving both performance and employee care.
Role Overview
We are looking for an experienced Senior Data Engineer to join our team and play a critical role in designing, implementing, and maintaining Groov’s scalable and robust data infrastructure. This role is essential to empowering our data scientists, workplace scientists, and engineers with high-quality, reliable data pipelines and workflows. You will partner with cross-functional teams to ensure data is accessible, clean, and actionable, enabling Groov to deliver dynamic, real-time insights to our customers.
This role provides an exciting opportunity to define the backbone of Groov’s data infrastructure, enabling cutting-edge workplace science and analytics.
Key Responsibilities
- Architect Scalable Data Systems:
  - Design, develop, and maintain scalable data pipelines and workflows for efficient data ingestion, transformation, and storage.
  - Establish and maintain data environments for development, testing, and production to ensure seamless deployment.
- Enable Data Quality and Lineage:
  - Implement data validation frameworks and ensure schema consistency across datasets.
  - Set up tools for tracking data lineage, dependency management, and version control for reliable data workflows.
- Support Cross-Functional Teams:
  - Collaborate with data scientists to define and implement data models for use in AI/ML systems.
  - Partner with product and engineering teams to ensure seamless integration of data systems with Groov’s applications and features.
- Evaluate and Integrate Tools and Platforms:
  - Research and recommend managed data platforms and tools, such as Snowflake, Databricks, Fivetran, and dbt, to optimize data workflows.
  - Enhance Groov’s AWS-based data stack, leveraging tools like Glue, Athena, S3, and Step Functions.
- Optimize Data Processes:
  - Automate data pipeline testing and deployment through CI/CD workflows.
  - Develop efficient ETL processes to support real-time and batch data workflows.
- Monitor and Improve Scalability:
  - Identify bottlenecks and opportunities for optimization in existing pipelines.
  - Ensure Groov’s data infrastructure is cost-effective, scalable, and reliable as data volume and complexity grow.
Basic Qualifications
- Bachelor’s degree in Computer Science, Data Engineering, or a related field.
- 5+ years of experience in data engineering or related roles.
- Experience managing data lakes.
- Experience evaluating and implementing managed data platforms such as Snowflake or Databricks to optimize storage, querying, and analytics workflows.
- Proficiency in Python and SQL for developing ETL pipelines and data workflows.
- Experience with AWS services (Glue, Athena, S3) for data transformation and querying.
- Strong background in designing and maintaining scalable data architectures.
- Experience building and maintaining tenant-isolated views in a shared data warehouse to support a multi-tenant application architecture.
- Experience implementing data validation, lineage tracking, and environment separation for data systems.
- Knowledge of CI/CD workflows and version control tools like Git.
Preferred Qualifications
- Master’s degree in Computer Science, Data Engineering, or a related field.
- 7+ years of experience in data engineering or related roles.
- Experience with big data tools (e.g., Apache Spark, Kafka, Hadoop) and transformation tools (e.g., dbt, Airflow).
- Proven ability to collaborate with cross-functional teams to define data models and drive data-driven decision-making.
- Knowledge of MLOps tools (e.g., MLflow, SageMaker) and deploying ML models in production environments.