Responsible for creating and maintaining optimal data pipeline architecture.
Assemble large, complex data sets that meet functional and non-functional business requirements.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and Azure ‘big data’ technologies.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.
Responsible for the development and modification of Azure data pipelines, Spark Python/SQL jobs, and ETL processes for the Enterprise Data Warehouse; must be an expert in ETL and data warehousing (DW) concepts.
Proven track record of troubleshooting Azure cloud services and resolving production issues, including performance tuning and enhancements.
Requirements
Strong experience building and managing pipelines using Azure Data Factory, Databricks, and Azure Data Lake.
Strong experience with Spark/PySpark, Python, and SQL.
Good understanding of Azure cloud services, particularly performance tuning and error handling.
Hands-on experience managing MLOps pipelines.
Experience with data governance, master data management, and data catalog solutions.
Exposure to Trino/Presto, Docker, and Kubernetes would be a plus.
Strong analytical skills and a good understanding of data structures and algorithms.
Bachelor's degree.