Design, develop, and maintain robust and scalable data pipelines to ingest, process, and store structured and unstructured data from various sources.
Collaborate with data engineers, analysts, and business stakeholders to understand data requirements and translate them into technical solutions.
Implement ETL processes to transform raw data into formats suitable for analysis and reporting.
Optimize data pipelines for performance, efficiency, and reliability, considering factors such as data volume, velocity, and variety.
Monitor and troubleshoot data pipeline issues, ensuring timely resolution and minimal disruption to data workflows.
Implement data governance best practices to ensure data quality, integrity, and security across the organization.
Evaluate and implement new technologies, tools, and frameworks to enhance data processing capabilities and drive innovation.
Document data pipelines, processes, and workflows to facilitate knowledge sharing and collaboration within the team.
Stay current with industry trends and best practices in data engineering and analytics, applying new knowledge to improve existing systems and processes.
Qualifications:
11+ years of IT experience, with hands-on experience in Azure Databricks and Azure Data Factory.
Hands-on experience in PySpark and Python.
Able to deliver an end-to-end solution covering data ingestion, data cleaning, and data aggregation (the full ETL process); see the sketch after this list.
Experience handling different types of data sources (structured and semi-structured, such as JSON and XML).
Good understanding of Snowflake; experience in Snowflake data engineering is an added advantage.
Basic knowledge of Java is an added advantage.
Familiarity with data modeling, schema design, and data governance principles.
Excellent problem-solving skills and attention to detail.
Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
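
For illustration of the end-to-end ETL expectation above, a minimal PySpark sketch is shown below. It assumes a hypothetical orders dataset; the paths and column names (order_id, amount, order_ts, region) are illustrative only, not part of any actual pipeline for this role.

    # Hypothetical end-to-end ETL sketch: ingest semi-structured JSON,
    # clean it, aggregate it, and store the curated result.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Ingest: read raw semi-structured JSON from an assumed landing zone.
    raw = spark.read.json("/landing/orders/*.json")

    # Clean: drop records missing required keys and normalize types.
    cleaned = (
        raw.dropna(subset=["order_id", "amount"])
           .withColumn("amount", F.col("amount").cast("double"))
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Aggregate: daily revenue and order counts per region for reporting.
    daily_revenue = (
        cleaned.groupBy("region", "order_date")
               .agg(F.sum("amount").alias("total_revenue"),
                    F.count("order_id").alias("order_count"))
    )

    # Store: write the curated output as partitioned Parquet.
    (daily_revenue.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("/curated/daily_revenue"))

    spark.stop()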