Job Description:
Responsibilities:
Data Pipeline Development: Design, build, and maintain scalable ETL pipelines to ensure efficient data flow from diverse sources into data warehouses.
Workflow Automation: Utilize Apache Airflow for scheduling and managing complex data workflows (a brief illustrative sketch follows this list).
Collaboration: Partner with data scientists to deploy and operate machine learning models in production (MLOps).
Coding and Scripting: Write high-quality Python code for data extraction, transformation, and loading tasks.
Big Data Processing: Use Databricks for processing large datasets and optimizing data workflows.
Data Quality Management: Ensure data accuracy and integrity by implementing monitoring, logging, and alerting mechanisms.
Version Control: Use GitHub for code management and collaboration, maintaining clear documentation of data processes.
Database Management: Work with Snowflake and BigQuery for data storage and analytics, ensuring optimal performance and scalability.
Performance Optimization: Analyze and tune data storage solutions and SQL queries for efficiency.
Cross-Functional Support: Collaborate with business units to identify and fulfill data needs effectively.
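To illustrate the pipeline and workflow responsibilities above, here is a minimal sketch of an Airflow-scheduled ETL flow in Python. It is not taken from this team's codebase: the DAG name example_etl, the extract/transform/load placeholders, and the sample records are hypothetical, and the TaskFlow-style decorators assume Airflow 2.4+.

```python
# Minimal, hypothetical sketch of an Airflow-managed ETL flow (Airflow 2.4+ assumed).
from datetime import datetime

from airflow.decorators import dag, task


@dag(dag_id="example_etl", schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def extract():
        # Placeholder for pulling raw records from a source system.
        return [{"id": 1, "value": 42}, {"id": 2, "value": None}]

    @task
    def transform(records):
        # Placeholder for cleaning and reshaping records before loading.
        return [r for r in records if r["value"] is not None]

    @task
    def load(records):
        # Placeholder for writing transformed rows to a warehouse table.
        print(f"Loaded {len(records)} records")

    # TaskFlow wires extract -> transform -> load and passes data via XCom.
    load(transform(extract()))


example_etl()
```

In practice, the load step would typically target a warehouse such as Snowflake or BigQuery through the corresponding Airflow provider packages rather than printing a count.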
Basic Qualifications:
3+ years of experience in data engineering or a related field.
Proficient in Python for data processing and scripting.
Experience with Apache Airflow or similar workflow management tools.
Familiarity with machine learning operations (MLOps) and model deployment.
Solid understanding of cloud platforms (AWS, Azure, GCP).
Strong SQL skills and experience with data modeling.
Practical experience with Snowflake and BigQuery.
Preferred Skills:
Knowledge of Databricks for data processing and analytics.
Experience with data visualization tools like Tableau or Power BI.
Familiarity with containerization technologies such as Docker and Kubernetes.
Bachelor's degree in Computer Science.