Description

Responsibilities:
• Data Quality Champion:
o Design and implement a data quality component within the data pipelines.
o Develop and maintain data quality checks and alerts.
o Monitor data quality metrics and investigate discrepancies.

• Data Catalog Architect:
o Create a comprehensive data catalog (metadata) encompassing all data structures within the data pipelines.
o Design and implement a system for storing and managing the data catalog.

• GenAI Integration Specialist:
o Develop APIs to enable data exchange between the data scheduler, the observability system, and GenAI's Copilot functionality.
o Ensure efficient data access for code generation and workflow understanding within GenAI.

• Full-Stack Development:
o Develop and maintain code for data pipelines, APIs, and data quality components.
o Apply both front-end and back-end development skills for a holistic data pipeline solution.
o Work closely with data scientists, analysts, and engineers to understand data needs and requirements.
o Communicate effectively to document processes and ensure team alignment.
Skills and Technologies (Required):
• Programming Languages: Python (primary); familiarity with Java or Scala is a plus
• Data Pipeline Tools: Airflow, Luigi, or similar workflow orchestration tools
• Data Quality Tools: Great Expectations, Apache Spark (for data quality checks), or similar libraries
• Data Cataloging Tools: Apache Atlas, Collibra, or similar data governance platforms
• API Development: Experience with RESTful APIs and API design principles
• Observability Tools: Familiarity with Prometheus, Grafana, or similar monitoring systems
• Cloud Platforms: Experience with AWS, GCP, or Azure is a plus
Skills and Technologies (Nice to Have):
• Experience with GenAI or similar AI development platforms
• Machine learning fundamentals
• Data warehousing experience (e.g., Snowflake, Redshift)
Education

Any Graduate