Calix
Senior Data Engineer
Job Description
This role is based in Bangalore and follows a hybrid working model.
As a Senior Data Engineer, you will play a critical role in designing, building, and maintaining scalable data architectures that support machine learning models and analytics. You will work closely with data scientists, machine learning engineers, and other stakeholders to ensure that our data infrastructure meets the demands of our ML initiatives. Your expertise will drive the development of efficient data pipelines, enhance data quality, and optimize performance for our ML workloads.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes to support machine learning/GenAI applications.
- Collaborate with data scientists and ML engineers to understand data requirements and ensure seamless integration of data into ML models.
- Implement data governance practices to ensure data quality, consistency, and compliance with industry standards.
- Optimize data storage solutions for performance and cost-effectiveness, utilizing cloud services (AWS, Azure, or GCP).
- Monitor and troubleshoot data pipeline performance issues, implementing improvements as necessary.
- Develop and maintain documentation for data architecture, data flows, and processes to support knowledge sharing within the team.
- Stay updated on the latest data engineering and machine learning trends, tools, and best practices to continuously improve our processes and technologies.
- Mentor and provide guidance to junior data engineers and other team members.
- Perform data ingestion, data processing, and feature engineering tasks.
- Operate and administer production databases: SQL, NoSQL, vector, and graph.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- 5+ years of experience as a Data Engineer, with a focus on machine learning applications.
- Strong programming skills in Python, Java, or Scala.
- Strong experience with data processing frameworks (e.g., Apache Spark, Apache Kafka) and ETL tools.
- Solid understanding of databases (SQL and NoSQL) and data warehousing solutions (e.g., Amazon Redshift, Google BigQuery).
- Experience with real-time data processing and streaming analytics.
- Experience with containerization and orchestration tools (Docker, Kubernetes).
- Experience with cloud platforms (AWS, Azure, GCP) and their data services.
- Experience with version control systems (e.g., Git) and CI/CD pipelines.
- Knowledge of machine learning concepts and algorithms is a plus.
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- Strong communication skills, with an ability to convey complex technical concepts to non-technical stakeholders.
Preferred Qualifications:
- Knowledge of data governance and compliance regulations.
- Familiarity with MLOps practices and tools.