Responsibilities
Data Lake and Warehouse Architecture: Design, implement, and maintain data lakes and data warehouses, ensuring high availability, performance, and scalability.
ETL Development: Lead the development of ETL pipelines for batch processing as well as real-time streaming data integration to support the organization's data needs.
Data Engineering Best Practices: Establish and enforce best practices for data engineering, data security, and data quality management.
AWS Cloud Expertise: Leverage AWS tools and services (e.g., Redshift, S3, Lambda, DynamoDB) for data storage, data transformation, and efficient data movement.
PySpark and Big Data Processing: Architect and optimize data processing workflows using PySpark to support data lake architecture, ensuring efficient data ingestion, transformation, and storage.
Cross-functional Collaboration: Work closely with data scientists, analysts, and stakeholders to understand data requirements and support business objectives.
Team Leadership: Mentor junior engineers and foster a collaborative, high-performance engineering culture.
Requirements
Experience: 5-7 years of experience in data engineering or related fields, with hands-on experience in data lake and data warehouse architecture.
Technical Skills
Proficiency with AWS services (Redshift, S3, Lambda, etc.).
Experience with other AWS big data tools (e.g., EMR, Kinesis, Firehose, Glue).
Strong experience with PySpark for data lake management and data transformation.
Hands-on expertise in developing ETL pipelines for both batch and real-time processing.
Experience managing data engineering tools such as Apache Superset, Apache Airflow, Apache Spark, JupyterHub, etc.
Soft Skills: Excellent problem-solving and communication skills, and the ability to work in a fast-paced, agile environment.
Education: Bachelor's degree in Computer Science.