Job Description:

1. Purpose of the Job – A simple statement to clearly identify the job's objective.

• The Tech Lead / Senior Data Engineer will lead and contribute to the design, development, maintenance, and evolution of AI and Data Products.

• The position is highly technical and focused on delivering business value, as you will be working within an AI and Data Product team.

• From day one, you will participate in the design, development, and evolution of data pipelines that serve data to the consumers within the product team.

• The Tech Lead / Senior Data Engineer reports to the Engineering Manager, who reports to the Global VP of Engineering.

2. Key Responsibilities and Expected Deliverables – This details what actually needs to be done: the duties and expected outcomes.

Tech Lead

• Primary technical point of contact for the Product Owner and business sponsors

• Ability to take on data engineering responsibility for a data project

• Strong listening and communication skills

• Ability to manage technical debt (which varies by use case)

• Act as technical architect: lead design meetings and technical breakdowns

• Ability to mentor and guide the work of more junior Data Engineers

Data Ingestion & Processing:

• Design and implement batch and real-time data ingestion pipelines.

• Use Databricks (with PySpark) for big data processing tasks (see the sketch after this list).

• Use DBT to transform data.

• Cleanse, transform, and enrich raw data to make it analytics-ready.

• Optimize queries and data processing for speed and cost efficiency.
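
A minimal sketch of what such a batch ingestion step could look like on Databricks with PySpark. The storage paths, container names, and column names below are hypothetical examples, not references to an actual environment:

```python
# Minimal batch ingestion sketch (PySpark on Databricks).
# All paths and column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # supplied by the Databricks runtime

# Read raw CSV files landed by an upstream source (illustrative path).
raw = (
    spark.read.option("header", "true")
    .csv("abfss://landing@example.dfs.core.windows.net/orders/")
)

# Cleanse and enrich: drop rows missing the key, fix types, add load metadata.
clean = (
    raw.dropna(subset=["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("ingested_at", F.current_timestamp())
)

# Persist as Delta so downstream consumers get analytics-ready data.
clean.write.format("delta").mode("overwrite").save(
    "abfss://curated@example.dfs.core.windows.net/orders/"
)
```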

Data Storage & Management:

• Design and implement database schemas, tables, and views.

• Optimize storage formats for querying, such as Parquet or Delta Lake.

• Enforce data quality checks and data lineage documentation.

• Implement partitioning, bucketing, and indexing strategies for efficient data retrieval (illustrated in the sketch below).
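
As an illustration of the partitioning and quality-check points above, a small sketch assuming Delta tables on Databricks; the table, path, and column names are hypothetical:

```python
# Sketch: partition a Delta table and apply a simple quality gate.
# Table, path, and column names are hypothetical examples.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.read.format("delta").load("/mnt/curated/events")

# Basic data quality check: fail fast when required keys are missing.
null_keys = events.filter(F.col("event_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"{null_keys} rows have a null event_id")

# Partition by date so queries filtering on event_date skip unrelated files.
(
    events.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.events")
)
```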

Collaboration with data experts (data analysts, data scientists):

• Work closely with data scientists to provide data in appropriate formats for machine learning and advanced analytics.

• Collaborate with Platform Engineering teams to comply with platform best practices and escalate shared needs.

• Assist data analysts with SQL queries, views, and report generation.

• Collaborate on the deployment of machine learning models to production.

Security & Compliance (with Azure security experts):

• Implement role-based access controls and data encryption (at-rest and in-transit).

• Comply with industry and organizational data standards, privacy regulations, and best practices.

• Regularly audit data access and usage.

Infrastructure & Configuration (with Azure infrastructure experts):

• Set up and maintain Azure cloud infrastructure.

• Configure and optimize Azure Data Lake Storage, Blob Storage, and other Azure storage solutions.

• Deploy and manage Databricks clusters for processing tasks.

• Implement and maintain data pipelines using Azure Data Factory.

• Monitor and troubleshoot infrastructure-related issues.

Documentation & Training:

• Onboard new team members, providing access and initial training on tools.

• Create documentation and knowledge bases for data pipelines, best practices, and tooling.

Continuous Improvement:

• Stay updated with the latest advancements in data engineering technologies.

• Propose and implement optimizations for current workflows and systems.

• Proactively identify areas of improvement and automation.

• Regularly update the team on new features or changes in Azure, Databricks, or related technologies.

EDUCATION & EXPERIENCE

• Engineering Master’s degree or PhD

• 5 years of experience in a data engineering role within large corporate organizations

• Experience in a Data/AI environment within a cloud ecosystem

 

HARD SKILLS

Software Engineering & SQL:

• Python: Strong coding skills in Python, especially with libraries related to data manipulation (e.g., Pandas) and interfacing with databases.

• Python ecosystem: Strong knowledge of tooling in the Python ecosystem, such as dependency and environment management tools (e.g., Poetry, venv).

• Software engineering practices: Strong experience putting in place good software engineering practices such as design patterns, testing (unit, integration, e2e; see the test sketch after this list), clean code, code formatting, etc.

• SQL: Advanced knowledge of SQL for data querying, transformation, and aggregation.
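
To make the testing expectation concrete, a minimal sketch of a unit test for a data transformation, assuming pandas and pytest; normalize_amounts is a hypothetical example function, not an existing API:

```python
# Unit-test sketch for a data transformation (pandas + pytest).
# normalize_amounts is a hypothetical example function.
import pandas as pd


def normalize_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Convert amounts from cents to euros and drop negative values."""
    out = df.copy()
    out["amount_eur"] = out["amount_cents"] / 100
    return out[out["amount_eur"] >= 0]


def test_normalize_amounts_converts_and_filters():
    df = pd.DataFrame({"amount_cents": [150, -200, 0]})
    result = normalize_amounts(df)
    assert result["amount_eur"].tolist() == [1.5, 0.0]
```

Run with `pytest`; the same pattern extends to integration and end-to-end tests over real pipeline stages.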

Data Architecture:

• Design: Ability to design scalable and robust data pipelines considering functional and non-functional requirements.

• Integration: Knowledge of data integration architectures that ensure reliable and efficient data flow.

Cloud Platforms:

• Azure Services: Proficiency in Azure Data Lake, Azure Data Factory, Azure Blob Storage, Azure SQL Database, and other related Azure services.

• Azure Databricks: Proficiency in using Databricks for deploying and running Spark jobs, including advanced usage (such as managing clusters, secret scopes, SQL warehouses, Unity Catalog, etc.).

• Cloud Infrastructure: Familiarity with setting up, configuring, and managing virtual networks, VMs, security groups, and related components on Azure is preferable.

Data Processing with Big Data Frameworks:

• Spark: Mastery of PySpark for data processing, particularly the DataFrame API.

• Delta Lake: Understanding of Delta Lake for reliable data lakes.

Data Storage & Data Management:

• Database Management: Knowledge of both relational databases (e.g., SQL Server) and NoSQL databases (e.g., Cosmos DB).

• Data Formats: Familiarity with different data storage formats such as Parquet, JSON, CSV, and Delta.

Data Integration Tools:

• Azure Data Factory: Skill in designing, deploying, and managing data integration solutions with ADF.

Data Modeling:

• Schema Design: Ability to design efficient and scalable database schemas for both operational and analytical use cases. Should know dimensional modeling, Third Normal Form (3NF) modeling, and Data Vault modeling (a simple star-schema sketch follows this list).

• ETL Design: Knowledge in designing Extract, Transform, Load (ETL) processes.

• DBT: Experience using DBT to model, transform, and test data on specific data products.
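
A hedged sketch of the dimensional-modeling idea in PySpark: deriving a customer dimension and an order fact table from a single curated source. Source paths, table names, and columns are hypothetical:

```python
# Star-schema sketch in PySpark: one dimension, one fact table.
# Paths, table names, and columns are hypothetical examples.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.read.format("delta").load("/mnt/curated/orders")

# Dimension: one row per customer, with a surrogate key.
dim_customer = (
    orders.select("customer_id", "customer_name", "country")
    .dropDuplicates(["customer_id"])
    .withColumn("customer_sk", F.monotonically_increasing_id())
)

# Fact: one row per order, keyed to the dimension by the surrogate key.
fact_orders = (
    orders.join(dim_customer.select("customer_id", "customer_sk"), "customer_id")
    .select("order_id", "customer_sk", "order_date", "amount")
)

dim_customer.write.format("delta").mode("overwrite").saveAsTable("dw.dim_customer")
fact_orders.write.format("delta").mode("overwrite").saveAsTable("dw.fact_orders")
```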

Performance Optimization:

• Query Tuning: Skills in optimizing complex SQL queries.

• Pipeline Optimization: Knowledge of optimizing data processing pipelines, particularly in Databricks/Spark (see the sketch below).
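
For illustration, a small sketch of two common Spark optimizations: an early filter on a partition column so files are pruned, and broadcasting a small dimension to avoid shuffling the fact table. Paths and columns are hypothetical:

```python
# Spark optimization sketch: partition pruning + broadcast join.
# Paths and column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = SparkSession.builder.getOrCreate()

facts = spark.read.format("delta").load("/mnt/dw/fact_orders")
dims = spark.read.format("delta").load("/mnt/dw/dim_customer")

# Filter on the partition column first so Spark skips unrelated files.
recent = facts.filter(col("order_date") >= "2024-01-01")

# Broadcast the small dimension so the join avoids shuffling the fact table.
joined = recent.join(broadcast(dims), "customer_sk")

joined.explain()  # inspect the physical plan to confirm a broadcast join
```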

Security & Compliance:

• Data Security: Knowledge of encryption techniques, both at-rest and in-transit.

• Access Control: Understanding of role-based access controls and integration with Azure Active Directory.

DevOps & Automation:

• CI/CD: Experience with continuous integration and continuous deployment tools like Azure DevOps.

• Infrastructure as Code: Familiarity with tools like Azure Resource Manager (ARM) templates or Terraform is a plus.

• Containerization: Basic understanding of Docker and Kubernetes, especially as they integrate with Azure services, is a plus.

Monitoring & Troubleshooting:

• Logging & Monitoring: Familiarity with tools like Azure Monitor, Log Analytics, or other monitoring solutions.

• Debugging: Ability to troubleshoot and debug issues within data pipelines and storage systems.
