Description

Responsibilities:
Design and implement reliable data pipelines to integrate disparate data sources into a single Data Lakehouse
Design and implement data quality pipelines to ensure data correctness and build trusted datasets (a minimal validation example follows this list)
Design and implement a Data Lakehouse solution that accurately reflects business operations
Assist with data platform performance tuning and physical data model support, including partitioning and compaction
Provide guidance on data visualization and reporting efforts to ensure solutions align with business objectives
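
To illustrate the data quality responsibility above, here is a minimal sketch using pandas and pytest (both named in the qualifications below). The "orders" dataset, its columns, and the rules are hypothetical examples, not part of the role; a real pipeline would apply rules agreed with the business.

# Minimal data quality check sketch (the "orders" schema and rules are hypothetical).
import pandas as pd
import pytest


def validate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows that pass basic correctness rules; raise if key constraints fail."""
    # Primary-key style constraint: order_id must be present and unique.
    if df["order_id"].isna().any():
        raise ValueError("order_id contains nulls")
    if df["order_id"].duplicated().any():
        raise ValueError("order_id contains duplicates")
    # Business rule: amounts must be non-negative; keep only passing rows.
    return df[df["amount"] >= 0].copy()


def test_validate_orders_rejects_duplicate_ids():
    # A duplicated key should fail validation rather than flow into trusted datasets.
    df = pd.DataFrame({"order_id": [1, 1], "amount": [10.0, 5.0]})
    with pytest.raises(ValueError):
        validate_orders(df)

Checks like these typically run as a pipeline step before data is published to trusted zones of the Lakehouse.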
Qualifications:
5+ years of experience as a Data Engineer designing and maintaining data pipeline architectures
5+ years of programming experience in Python, ANSI SQL, PL/SQL, and T-SQL
Experience with various data integration patterns, including ETL, ELT, Pub/Sub, and Change Data Capture (CDC)
Experience with common Python data engineering packages, including pandas, NumPy, PyArrow, pytest, scikit-learn, and boto3
Experience with software development practices such as design principles and patterns, testing, refactoring, CI/CD, and version control
Experience implementing a Data Lakehouse using Apache Iceberg or Delta Lake (a brief partitioning sketch follows this list)
Knowledge of modern data platform technologies, including Apache Airflow, Kubernetes, and S3 object storage
Experience with Dremio and Airbyte is preferred
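
As a brief sketch of the partitioning work mentioned above, the snippet below writes a Hive-partitioned Parquet dataset with PyArrow. The "events" table, its columns, and the output path are hypothetical; it uses plain PyArrow rather than Iceberg or Delta Lake itself, only to illustrate the partitioned file layout those table formats manage (compaction would later rewrite many small files per partition into fewer large ones).

# Illustrative partitioned Parquet write (hypothetical "events" table and path).
import pyarrow as pa
import pyarrow.dataset as ds

events = pa.table({
    "event_id": [1, 2, 3],
    "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "payload": ["a", "b", "c"],
})

# Hive-style partitioning by event_date lets queries prune scans to the dates they touch.
ds.write_dataset(
    events,
    base_dir="lakehouse/events",  # local stand-in for an S3 object-store prefix
    format="parquet",
    partitioning=ds.partitioning(pa.schema([("event_date", pa.string())]), flavor="hive"),
    existing_data_behavior="overwrite_or_ignore",
)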

Education

Any Graduate