PradeepIT Consulting Services Pvt Ltd
Oracle + PySpark Data Engineer (Remote)
Job Description
Experience: 5 to 7 Years
We are seeking a highly skilled and motivated Oracle + PySpark Data Engineer/Analyst to join our team. The ideal candidate will be responsible for leveraging the Oracle database and PySpark to manage, transform, and analyze data to support our business’s decision-making processes. This role will play a crucial part in maintaining data integrity, optimizing data processes, and enabling data-driven insights.
Key Responsibilities:
1. Data Integration: Integrate data from various sources into Oracle databases and design PySpark data pipelines to enable data transformation and analytics (see the sketch following this list).
2. Data Transformation: Develop and maintain data transformation workflows using PySpark to clean, enrich, and structure data for analytical purposes.
3. Data Modeling: Create and maintain data models within Oracle databases, ensuring data is structured and indexed for optimal query performance.
4. Query Optimization: Write complex SQL queries and PySpark transformations for efficient data retrieval and processing.
5. Data Analysis: Collaborate with data analysts and business teams to provide insights through data analysis and reporting.
6. Data Quality: Implement data quality checks, error handling, and validation processes to ensure data accuracy and reliability.
7. Performance Tuning: Optimize Oracle databases and PySpark jobs to improve overall data processing and analysis performance.
8. Documentation: Create and maintain comprehensive documentation for data models, ETL processes, and the codebase.
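By way of illustration, below is a minimal sketch of the kind of Oracle-to-PySpark pipeline these responsibilities describe: read a table over JDBC, clean and enrich it, run a basic data quality check, and persist the result. It is not code from the role's actual environment; the connection URL, credentials, driver path, table, and column names are all hypothetical placeholders.

# Hypothetical Oracle -> PySpark pipeline sketch.
# All connection details, table names, and columns are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("oracle-etl-sketch")
    # Assumes the Oracle JDBC driver jar is available at this (placeholder) path.
    .config("spark.jars", "/path/to/ojdbc8.jar")
    .getOrCreate()
)

# Data Integration: load a source table from Oracle over JDBC.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "***")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# Data Transformation: deduplicate, normalize types, derive an analytic column.
clean = (
    orders
    .dropDuplicates(["ORDER_ID"])
    .withColumn("ORDER_DATE", F.to_date("ORDER_DATE"))
    .withColumn("NET_AMOUNT", F.col("AMOUNT") - F.coalesce(F.col("DISCOUNT"), F.lit(0)))
)

# Data Quality: fail fast if a required key is missing.
null_keys = clean.filter(F.col("ORDER_ID").isNull()).count()
if null_keys > 0:
    raise ValueError(f"{null_keys} rows have a null ORDER_ID")

# Persist the curated output for downstream analysis and reporting.
clean.write.mode("overwrite").parquet("/data/curated/orders")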
Required Skills:
- Proven experience working with Oracle databases and PySpark.
- Strong proficiency in SQL, PL/SQL, Python, and PySpark.
- Familiarity with Oracle database administration, data warehousing, and ETL concepts.
- Understanding of big data technologies and distributed computing principles.
- Strong analytical and problem-solving skills.
- Excellent communication and teamwork abilities.
- Knowledge of data security and compliance standards.