Experiences
• Designed and implemented distributed ETL/ELT pipelines in GCP using Python and PySpark, leveraging Dataproc clusters, Cloud Storage, and Kubernetes to process diverse legal and investigative data sources.
• Built and maintained a scalable, secure Data Lake architecture for structured and semi-structured data, enabling cross-team access, traceability, and advanced analytics.
• Developed end-to-end analytical workflows for building Analytical Base Tables (ABTs), supporting downstream applications such as clustering, sentiment analysis, and pattern detection.
• Applied LLM-based techniques, combined with regex parsing and semantic mapping, to extract insights from unstructured judicial data and correlate relevant information.
• Designed and operationalized decision-making pipelines that integrate model outputs, NLP insights, and business rules into streamlined APIs for analytics and legal teams.
• Participated in architectural design reviews and implemented best practices to ensure scalability, cost-effectiveness, and alignment with governance and security policies.
• Built custom validation frameworks and monitoring dashboards in Looker Studio and Power BI for continuous data quality assurance and transparency.
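To make the first bullet concrete, here is a minimal sketch of a PySpark job of the kind described: reading semi-structured files from a Cloud Storage landing zone and writing a curated Parquet layer for downstream ABTs. The bucket paths, field names, and app name are hypothetical illustrations, not details from the role.

```python
# Minimal sketch of a PySpark ETL job on Dataproc; bucket names and
# schema fields are assumed for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("legal-docs-etl").getOrCreate()

# Read semi-structured source files from a Cloud Storage landing zone.
raw = spark.read.json("gs://example-landing-zone/judicial/*.json")

# Basic cleansing: normalize text fields, drop records without an identifier.
clean = (
    raw.withColumn("court", F.lower(F.trim(F.col("court"))))
       .filter(F.col("case_id").isNotNull())
       .withColumn("ingested_at", F.current_timestamp())
)

# Persist to the curated layer of the lake, partitioned for downstream ABTs.
clean.write.mode("overwrite").partitionBy("court").parquet(
    "gs://example-curated-zone/judicial/cases/"
)
```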
• Mapped and analyzed data flows as part of a full migration from Azure (Data Factory, Synapse) to GCP (BigQuery, Cloud Storage).
• Rebuilt and refactored pipelines using dbt for SQL transformations and Apache Airflow for orchestration.
• Integrated CI/CD processes with Jenkins and Bitbucket to automate deployment and testing.
• Validated and implemented business rules in collaboration with multidisciplinary teams, enforcing data governance and security standards through modular dbt models and automated tests.
• Improved post-migration data quality, lineage tracking, and transformation efficiency through modular modeling and robust documentation.
• Applied performance tuning and redundancy elimination across BigQuery, dbt, Airflow, and Cloud Storage to optimize costs and processing time.
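As an illustration of the dbt-plus-Airflow orchestration above, the sketch below defines a daily DAG that runs the dbt models and then their tests. The DAG id, project path, and schedule are assumptions, and it presumes Airflow 2.4+ (for the `schedule` argument) with a dbt project already configured for BigQuery.

```python
# Illustrative Airflow DAG for a dbt-on-BigQuery setup; all names and
# paths are hypothetical, not taken from the role itself.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_bigquery_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Run the modular dbt models that replace the legacy Data Factory pipelines.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/analytics --target prod",
    )

    # Automated tests enforce the validated business rules after each run.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/analytics --target prod",
    )

    dbt_run >> dbt_test
```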
• Led analysis of multisource national and international health data using Azure cloud services and semantic tools such as Protégé, LogMap, and Neo4j.
• Designed data architecture and pipelines using Azure Data Factory, Synapse, SQL, and Python to support AI-driven medical insights and ontology alignment.
• Built and managed centralized Data Lakes with standardized schema design, ensuring governance and accessibility across teams.
• Developed and deployed AI agents using state-of-the-art LLM techniques, semantic parsing, and advanced regex for medical knowledge extraction.
• Collaborated with ML and data science teams to structure ABTs, enable inference workflows, and optimize model outputs.
• Applied data quality frameworks and event-driven workflows using Azure-native tools such as Event Grid, DLP, and Data Catalog.
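A toy sketch of the regex-plus-semantic-mapping style of extraction mentioned above: a pattern pulls ICD-10-like codes from clinical text while a small vocabulary maps free-text mentions to canonical concepts. The concept map, code pattern, and sample note are invented for demonstration, not drawn from the actual project.

```python
# Toy regex + semantic-mapping extractor; vocabulary and pattern are
# illustrative assumptions only.
import re

# Map free-text mentions to canonical ontology concept identifiers.
CONCEPT_MAP = {
    "heart attack": "myocardial_infarction",
    "high blood pressure": "hypertension",
}

# ICD-10-like codes: a letter, two digits, optional decimal extension (e.g. I21.9).
ICD10_PATTERN = re.compile(r"\b[A-TV-Z]\d{2}(?:\.\d{1,3})?\b")

def extract(note: str) -> dict:
    """Pull ICD-10-like codes and mapped concepts from a clinical note."""
    codes = ICD10_PATTERN.findall(note)
    concepts = [cid for term, cid in CONCEPT_MAP.items() if term in note.lower()]
    return {"codes": codes, "concepts": concepts}

print(extract("Admitted with heart attack; coded I21.9, history of high blood pressure."))
```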
• Implemented real-time data pipelines on Google Cloud Platform (GCP) to monitor and optimize robotic automation systems, ensuring operational efficiency and reduced downtime.
• Developed and deployed predictive models using Python and SQL in BigQuery to detect anomalies in quality inspection processes and suggest proactive adjustments.
• Structured and ingested production data from robotic systems into a centralized Data Lake on GCP, enabling cross-functional analytics and traceability.
• Created automated ETL workflows to transform data collected from industrial sensors, integrating it with business KPIs in real-time monitoring dashboards built with Looker Studio and Power BI.
• Collaborated with multidisciplinary teams to design and deliver scalable analytics solutions, driving continuous improvement and accelerating time-to-market for new product launches.
• Transformed operational data into actionable insights, achieving a 20% improvement in product quality through machine learning pipelines and statistical modeling.
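As a hedged illustration of the anomaly-detection bullet, the sketch below runs a simple three-sigma check over recent sensor readings using the google-cloud-bigquery client. The project, dataset, column names, and threshold are all hypothetical; the production approach may well have used different models entirely.

```python
# Minimal BigQuery anomaly check; table names, columns, and the 3-sigma
# threshold are assumptions for illustration.
from google.cloud import bigquery

client = bigquery.Client()

# Flag torque readings more than 3 standard deviations from each line's mean.
query = """
WITH stats AS (
  SELECT line_id, AVG(torque) AS mu, STDDEV(torque) AS sigma
  FROM `example-project.factory.sensor_readings`
  WHERE reading_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  GROUP BY line_id
)
SELECT r.line_id, r.reading_ts, r.torque
FROM `example-project.factory.sensor_readings` r
JOIN stats s USING (line_id)
WHERE ABS(r.torque - s.mu) > 3 * s.sigma
"""

for row in client.query(query).result():
    print(f"Anomaly on line {row.line_id} at {row.reading_ts}: torque={row.torque}")
```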