Description

Design, develop, and maintain scalable data pipelines using Apache Spark and other ETL tools.
Implement data processing workflows using Apache Airflow to ensure efficient data ingestion and transformation (an illustrative orchestration sketch follows this list).
Collaborate with data scientists and analysts to understand data requirements and deliver high-quality datasets.
Write efficient data transformation scripts in Scala and Python to process large datasets (see the PySpark sketch after this list).
Manage data storage solutions on AWS, including S3, Redshift, and other AWS services.
Develop and implement CI/CD pipelines to automate data deployment and improve code quality.
Monitor data pipelines and troubleshoot issues to ensure optimal performance and reliability.
Document data architecture and workflows to maintain clear communication within the team.
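
As a rough illustration of the orchestration work described above, the sketch below shows a minimal Airflow DAG wiring three placeholder extract/transform/load steps in sequence. The DAG id, schedule, retry settings, and task names are hypothetical assumptions, not details taken from this posting.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Placeholder: pull raw records from the upstream source."""
    pass


def transform():
    """Placeholder: clean and reshape the extracted records."""
    pass


def load():
    """Placeholder: write the transformed records to the warehouse."""
    pass


with DAG(
    dag_id="example_daily_ingest",  # hypothetical name, for illustration only
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the steps strictly in sequence: extract -> transform -> load.
    extract_task >> transform_task >> load_task
```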

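A transformation script of the kind mentioned above might look like the following PySpark sketch, which aggregates daily revenue from an orders dataset. The dataset, column names, and file paths are illustrative assumptions; in practice the input and output would typically live in S3.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_revenue_aggregation").getOrCreate()

# Hypothetical input path; a real job would read from a location such as s3://<bucket>/raw/orders/
orders = spark.read.parquet("/data/raw/orders")

daily_revenue = (
    orders
    .filter(F.col("status") == "completed")             # keep only completed orders (assumed column)
    .withColumn("order_date", F.to_date("created_at"))  # derive a date column for grouping
    .groupBy("order_date")
    .agg(
        F.sum("amount").alias("total_revenue"),
        F.count("*").alias("order_count"),
    )
)

# Write the aggregate back out, partitioned by date for efficient downstream reads.
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/daily_revenue")

spark.stop()
```
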
Education

Bachelor's degree