Job Description:
1. Purpose of the Job – A simple statement to clearly identify the job's objective.
• The Tech Lead / Senior Data Engineer will lead and contribute to the design, development, maintenance, and evolution of AI and Data Products.
• The position is highly technical and directly tied to delivering business value, as the role sits within an AI and Data Product team.
• From day one, the Tech Lead / Senior Data Engineer will participate in the design, development, and evolution of data pipelines that serve data to the data consumers within the product team.
• The Tech Lead / Senior Data Engineer reports to the Engineering Manager, who reports to the Global VP of Engineering.
2. Key Responsibilities and Expected Deliverables – This details what actually needs to be done: the duties and expected outcomes.
Tech Lead
• Primary technical point of contact for the Product Owner and business sponsors
• Ability to take end-to-end data engineering responsibility for a data project
• Strong listening and communication skills
• Ability to manage technical debt (which varies by use case)
• Act as technical architect: lead design meetings and technical breakdowns
• Ability to mentor and guide the work of more junior Data Engineers
Data Ingestion & Processing:
• Design and implement batch and real-time data ingestion pipelines (a minimal sketch follows this list).
• Use Databricks (with PySpark) for big data processing tasks.
• Use DBT to transform data.
• Cleanse, transform, and enrich raw data to make it analytics-ready.
• Optimize queries and data processing for speed and cost efficiency.
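For illustration, a minimal sketch of the kind of batch ingestion and cleansing step this role owns, written in PySpark on Databricks; the storage path, column names, and table name are hypothetical, not taken from an actual project:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided by the Databricks runtime

    # Ingest raw JSON events landed in the data lake (hypothetical path)
    raw = spark.read.json("abfss://landing@examplelake.dfs.core.windows.net/events/")

    # Cleanse and enrich: deduplicate, normalize types, add a load timestamp
    clean = (
        raw.dropDuplicates(["event_id"])
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .withColumn("ingested_at", F.current_timestamp())
        .filter(F.col("event_id").isNotNull())
    )

    # Persist as a Delta table for downstream consumers
    clean.write.format("delta").mode("append").saveAsTable("bronze.events")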
Data Storage & Management:
• Design and implement database schemas, tables, and views.
• Optimize storage formats for querying, such as Parquet or Delta Lake.
• Enforce data quality checks and maintain data lineage documentation.
• Implement partitioning, bucketing, and indexing strategies for efficient data retrieval (see the sketch after this list).
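A hedged sketch of the storage side: writing a partitioned Delta table and then running Databricks-specific maintenance; the tables, partition column, and Z-order column are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Rewrite the cleansed table partitioned on a low-cardinality column
    (
        spark.table("bronze.events")
        .write.format("delta")
        .partitionBy("event_date")
        .mode("overwrite")
        .saveAsTable("silver.events")
    )

    # Compact small files and co-locate rows on a frequent filter column
    # (the Z-order column must not be a partition column)
    spark.sql("OPTIMIZE silver.events ZORDER BY (customer_id)")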
Collaboration with data experts (data analysts, data scientists):
• Work closely with data scientists to provide data in appropriate formats for machine learning and advanced analytics.
• Collaborate with Platform Engineering teams to comply with platform good practices and escalate shared needs.
• Assist data analysts with SQL queries, views, and report generation.
• Collaborate on the deployment of machine learning models to production.
Security & Compliance (with Azure security experts):
• Implement role-based access controls and data encryption (at rest and in transit), as sketched after this list.
• Comply with industry and organizational data standards, privacy regulations, and best practices.
• Regularly audit data access and usage.
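As one illustration of role-based access control, a sketch of SQL grants on Unity Catalog objects, assuming Unity Catalog is enabled on the workspace; the catalog, schema, table, and group names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Give an analyst group read-only access at schema and table level
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.silver TO `data_analysts`")
    spark.sql("GRANT SELECT ON TABLE main.silver.events TO `data_analysts`")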
Infrastructure & Configuration (with Azure infrastructure experts):
• Set up and maintain Azure cloud infrastructure.
• Configure and optimize Azure Data Lake Storage, Blob Storage, and other Azure storage solutions.
• Deploy and manage Databricks clusters for processing tasks (see the sketch after this list).
• Implement and maintain data pipelines using Azure Data Factory.
• Monitor and troubleshoot infrastructure-related issues.
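A sketch of programmatic cluster management with the Databricks SDK for Python (databricks-sdk); the cluster name, runtime version, node type, and sizing are hypothetical, and in practice this is often expressed as Terraform or ARM templates instead:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # reads host and token from the environment or a config profile

    # Create a small auto-terminating cluster for processing jobs
    cluster = w.clusters.create(
        cluster_name="etl-nightly",
        spark_version="14.3.x-scala2.12",
        node_type_id="Standard_DS3_v2",
        num_workers=4,
        autotermination_minutes=30,
    ).result()  # blocks until the cluster is running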
Documentation & Training:
• Onboard new team members, providing access and initial training on tools.
• Create documentation and knowledge bases for data pipelines, best practices, and tooling.
Continuous Improvement:
• Stay updated with the latest advancements in data engineering technologies.
• Propose and implement optimizations for current workflows and systems.
• Proactively identify areas of improvement and automation.
• Regularly update team on new features or changes in Azure, Databricks, or related technologies.
EDUCATION
• Engineering Master’s degree or PhD
• 5 years of experience in a data engineering role within large corporate organizations
• Experience in a Data/AI environment within a cloud ecosystem
HARD SKILLS
Software Engineering & SQL:
• Python: Strong coding skills in Python, especially with libraries related to data manipulation (e.g., Pandas) and interfacing with databases.
• Python ecosystem: Strong knowledge of tooling in the Python ecosystem, such as dependency and environment management tools (Poetry, venv).
• Software engineering practices: Strong experience putting good software engineering practices in place, such as design patterns, testing (unit, integration, end-to-end), clean code, and code formatting (a sample unit test follows this list).
• SQL: Advanced knowledge of SQL for data querying, transformation, and aggregation.
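As a small example of the testing practice mentioned above, a sketch of a pytest unit test around a pandas transformation; the function under test is hypothetical:

    import pandas as pd

    def add_full_name(df: pd.DataFrame) -> pd.DataFrame:
        # Hypothetical transformation: derive a trimmed full-name column
        out = df.copy()
        out["full_name"] = out["first_name"].str.strip() + " " + out["last_name"].str.strip()
        return out

    def test_add_full_name():
        df = pd.DataFrame({"first_name": [" Ada "], "last_name": ["Lovelace"]})
        result = add_full_name(df)
        assert result.loc[0, "full_name"] == "Ada Lovelace"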
Data Architecture:
• Design: Ability to design scalable and robust data pipelines considering functional and non-functional requirements.
• Integration: Knowledge of data architectures to ensure reliable and efficient data flow.
Cloud Platforms:
• Azure Services: Proficiency in Azure Data Lake, Azure Data Factory, Azure Blob Storage, Azure SQL Database, and other related Azure services.
• Azure Databricks: Proficiency in using Databricks for deploying and running Spark jobs, including advanced usage (such as managing clusters, secret scopes, SQL warehouses, Unity Catalog, etc.)
• Cloud Infrastructure: Familiarity with setting up, configuring, and managing virtual networks, VMs, security groups, and related components on Azure is preferable.
Data Processing with Big Data Frameworks:
• Spark: Mastery of PySpark for data processing, particularly the DataFrame API.
• Delta Lake: Understanding of Delta Lake for reliable data lakes (an upsert sketch follows this list).
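A sketch of the kind of idempotent upsert Delta Lake enables, using the delta-spark Python API; the staging table, target table, and key column are hypothetical:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    updates_df = spark.table("staging.customer_updates")  # hypothetical staging table
    target = DeltaTable.forName(spark, "silver.customers")

    # Upsert: update matching rows, insert new ones
    (
        target.alias("t")
        .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )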
Data Storage & Data Management:
• Database Management: Knowledge of both relational databases (e.g., SQL Server) and NoSQL databases (e.g., Cosmos DB).
• Data Formats: Familiarity with different data storage formats such as Parquet, JSON, CSV, and Delta.
Data Integration Tools:
• Azure Data Factory: Skill in designing, deploying, and managing data integration solutions with ADF.
Data Modeling:
• Schema Design: Ability to design efficient and scalable database schemas for both operational and analytical use cases. Should know Dimensional Modeling, 3NF (Third Normal Form) Data Modeling, and Data Vault Modeling.
• ETL Design: Knowledge in designing Extract, Transform, Load (ETL) processes.
• DBT: Experience using DBT to model, transform, and test data on specific data products (a sample model follows this list).
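For illustration, a sketch of a dbt Python model (dbt also supports SQL models, which remain the common case; on Databricks, dbt.ref() returns a PySpark DataFrame). The model name, upstream references, and join key are hypothetical:

    # models/orders_enriched.py -- a dbt Python model (hypothetical names)
    def model(dbt, session):
        dbt.config(materialized="table")
        orders = dbt.ref("stg_orders")        # PySpark DataFrame on Databricks
        customers = dbt.ref("stg_customers")
        # Enrich orders with customer attributes
        return orders.join(customers, on="customer_id", how="left")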
Performance Optimization:
• Query Tuning: Skills in optimizing complex SQL queries.
• Pipeline Optimization: Knowledge of optimizing data processing pipelines, particularly in Databricks/Spark (see the sketch after this list).
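Two common Spark pipeline optimizations as a brief sketch; the tables and columns are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Filter early so Spark can prune Delta partitions on event_date
    facts = spark.table("silver.events").where(F.col("event_date") >= "2024-01-01")

    # Broadcast the small dimension table to avoid shuffling the large fact table
    dims = spark.table("silver.countries")
    joined = facts.join(F.broadcast(dims), on="country", how="left")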
Security & Compliance:
• Data Security: Knowledge of encryption techniques, both at-rest and in-transit.
• Access Control: Understanding of role-based access controls and integration with Azure Active Directory.
DevOps & Automation:
• CI/CD: Experience with continuous integration and continuous deployment tools like Azure DevOps.
• Infrastructure as Code: Familiarity with tools like Azure Resource Manager (ARM) templates or Terraform is a plus.
• Containerization: Basic understanding of Docker and Kubernetes, especially as they integrate with Azure services, is a plus.
Monitoring & Troubleshooting:
• Logging & Monitoring: Familiarity with tools like Azure Monitor, Log Analytics, or other monitoring solutions.
• Debugging: Ability to troubleshoot and debug issues within data pipelines and storage systems (a brief sketch follows this list).
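Finally, a brief troubleshooting sketch: inspecting a query plan and logging failures around a pipeline write; the tables and logger name are hypothetical:

    import logging

    from pyspark.sql import SparkSession

    logger = logging.getLogger("pipeline.events")
    spark = SparkSession.builder.getOrCreate()

    df = spark.table("bronze.events")
    df.explain(mode="formatted")  # inspect the physical plan for unexpected shuffles

    try:
        df.write.format("delta").mode("append").saveAsTable("silver.events_history")
    except Exception:
        logger.exception("Write to silver.events_history failed")
        raise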