Your expertise in Python, PySpark, ETL processes, and CI/CD (Jenkins or GitHub Actions), together with experience in both streaming and batch workflows, will be essential to ensuring the efficient flow and processing of data in support of our clients.
Collaborate with cross-functional teams to understand data requirements and design robust data architecture solutions
Implement ETL processes that extract data from a variety of sources, transform it to meet downstream requirements, and load it into target systems
Ensure data quality, integrity, and consistency throughout the ETL pipeline
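By way of illustration, a minimal PySpark ETL job with a simple data-quality gate might look like the sketch below; the paths, column names, and quality rules are hypothetical placeholders rather than a prescribed implementation.

```python
# Minimal ETL sketch in PySpark; all paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw records from a source system (CSV here for illustration).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: normalize types and derive typed columns.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
)

# Quality gate: enforce basic integrity rules, quarantining rejects
# so bad records are inspectable rather than silently dropped.
is_valid = F.col("order_id").isNotNull() & (F.col("amount") >= 0)
valid = orders.filter(is_valid)
rejects = orders.filter(~is_valid)
rejects.write.mode("append").parquet("s3://example-bucket/quarantine/orders/")

# Load: persist clean records in a columnar format for downstream use.
valid.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
```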
Apply your Python and PySpark expertise to develop efficient data processing and analysis scripts
Optimize code for performance and scalability, staying current with industry best practices
Integrate data from different systems and sources to provide a unified view for analytical purposes
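As a sketch of what such integration can look like, the example below joins a relational CRM source with files landed by an upstream pipeline and exposes the result to analysts; the connection details, table names, and join key are assumptions for illustration.

```python
# Sketch of unifying two source systems into one analytical view.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unified_view").getOrCreate()

# Source 1: customer records from a relational database over JDBC.
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com/crm")
    .option("dbtable", "customers")
    .option("user", "etl_user")
    .option("password", "***")  # in practice, fetch from a secrets manager
    .load()
)

# Source 2: curated order records landed as Parquet by an upstream job.
orders = spark.read.parquet("s3://example-bucket/curated/orders/")

# Join on the shared key and register a view analysts can query directly.
unified = orders.join(customers, on="customer_id", how="left")
unified.createOrReplaceTempView("orders_with_customers")
```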
Collaborate with data analysts to implement solutions that meet their data integration needs
Design and implement streaming workflows using Spark Structured Streaming (via PySpark) or other relevant technologies
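A minimal Structured Streaming sketch is shown below, assuming a Kafka source (which requires the spark-sql-kafka connector on the classpath) and hypothetical topic, schema, and checkpoint settings.

```python
# Structured Streaming sketch; broker, topic, schema, and paths are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read a stream of JSON events from Kafka and parse the payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")
    .option("subscribe", "order-events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Write incrementally; the checkpoint enables recovery after failure.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/stream/orders/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```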
Develop batch processing workflows for large-scale data processing and analysis
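For the batch side, a daily roll-up like the sketch below is representative; the dataset layout, grain, and grouping columns are assumptions, not requirements.

```python
# Batch aggregation sketch; dataset layout and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_rollup").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/curated/orders/")

# Aggregate to a daily grain; Spark distributes the shuffle across executors.
daily = (
    orders.withColumn("order_date", F.to_date("order_ts"))
          .groupBy("order_date", "customer_id")
          .agg(
              F.sum("amount").alias("total_amount"),
              F.count("*").alias("order_count"),
          )
)

# Partition output by date so downstream readers can prune irrelevant files.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/marts/daily_orders/"
)
```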
Bachelor's degree