Description

Required:
6+ years of experience with advanced SQL
6+ years of experience with advanced Python
Experience with a scheduling/orchestration tool: Apache Airflow or Kubeflow (either will work)
Advanced public cloud experience with Azure (preferred), GCP, or AWS
Proficiency in Python for data manipulation, automation, and scripting.
Advanced SQL skills for complex query writing, optimization, and database management.
Experience with Azure Kubernetes Service (AKS).
Experience with big data technologies (e.g., Spark, Hadoop) and data lake architectures.
Familiarity with CI/CD pipelines, version control (Git), and containerization (Docker) is a plus.
Key Responsibilities:
Data Pipeline Development: Design, develop, and maintain robust ETL/ELT, data curation, and feature engineering processes using Python and SQL to extract, transform, and load data from various sources into our data platforms.
Database Management: Optimize, manage, and monitor SQL databases and data warehouses, ensuring high performance and efficient data retrieval.
Data Integration: Work with both structured and unstructured data sources to integrate diverse data sets into a unified, accessible format.
Data Quality and Governance: Implement data quality checks, validation procedures, and data governance standards to maintain data accuracy and consistency.
Collaboration: Collaborate with cross-functional teams, including data scientists, data analysts, software engineers, and product managers, to define data requirements and deliver solutions that meet business needs.
Performance Tuning: Optimize the performance of large-scale data processing systems and databases to ensure efficient data access and usage.
Documentation: Create and maintain comprehensive documentation for data engineering processes, architectures, and pipelines.
Innovation: Stay up to date with the latest trends and technologies in data engineering and cloud services, recommending and implementing new tools and techniques as appropriate.

Education

Any graduate (bachelor's degree in any discipline)