Job Description:

1. Purpose of the Job – A simple statement to clearly identify the job's objective.

• The Tech Lead / Senior Data Engineer will lead and contribute to the design, development, maintenance, and evolution of AI and Data Products.

• The position is highly technical and focused on delivering business value, as you will be working within an AI and Data Product team.

• From day one, you will participate in the design, development, and evolution of data pipelines that serve data to the consumers within the product team.

• The Tech Lead / Senior Data Engineer reports to the Engineering Manager, who reports to the Global VP of Engineering.

2. Key Responsibilities and Expected Deliverables – This details what actually needs to be done: the duties and expected outcomes.

Tech Lead

• Primary technical point of contact for the Product Owner and business sponsors

• Ability to take on data engineering responsibility for a data project

• Strong listening and communication skills

• Ability to manage technical debt (which varies by use case)

• Act as technical architect: lead design meetings and technical breakdowns

• Ability to mentor and guide the work of more junior Data Engineers

Data Ingestion & Processing:

• Design and implement batch and real-time data ingestion pipelines.

• Use Databricks (with PySpark) for big data processing tasks (see the sketch after this list).

• Use DBT to transform data.

• Cleanse, transform, and enrich raw data to make it analytics-ready.

• Optimize queries and data processing for speed and cost efficiency.
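
A minimal sketch of what such a batch ingestion step could look like on Databricks with PySpark. The storage paths, container names, and column names below are hypothetical examples, not references to an actual environment:

```python
# Minimal batch ingestion sketch (PySpark on Databricks).
# All paths and column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # supplied by the Databricks runtime

# Read raw CSV files landed by an upstream source (illustrative path).
raw = (
    spark.read.option("header", "true")
    .csv("abfss://landing@example.dfs.core.windows.net/orders/")
)

# Cleanse and enrich: drop rows missing the key, fix types, add load metadata.
clean = (
    raw.dropna(subset=["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("ingested_at", F.current_timestamp())
)

# Persist as Delta so downstream consumers get analytics-ready data.
clean.write.format("delta").mode("overwrite").save(
    "abfss://curated@example.dfs.core.windows.net/orders/"
)
```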

Data Storage & Management:

• Design and implement database schemas, tables, and views.

• Optimize storage formats for querying, such as Parquet or Delta Lake.

• Enforce data quality checks and data lineage documentation.

• Implement partitioning, bucketing, and indexing strategies for efficient data retrieval (illustrated in the sketch below).
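
As an illustration of the partitioning and quality-check points above, a small sketch assuming Delta tables on Databricks; the table, path, and column names are hypothetical:

```python
# Sketch: partition a Delta table and apply a simple quality gate.
# Table, path, and column names are hypothetical examples.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.read.format("delta").load("/mnt/curated/events")

# Basic data quality check: fail fast when required keys are missing.
null_keys = events.filter(F.col("event_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"{null_keys} rows have a null event_id")

# Partition by date so queries filtering on event_date skip unrelated files.
(
    events.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.events")
)
```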

Collaboration with data experts (data analysts, data scientists):

• Work closely with data scientists to provide data in appropriate formats for machine learning and advanced analytics.

• Collaborate with Platform Engineering teams to comply with platform best practices and escalate shared needs.

• Assist data analysts with SQL queries, views, and report generation.

• Collaborate on the deployment of machine learning models to production.

Security & Compliance (with Azure security experts):

• Implement role-based access controls and data encryption (at-rest and in-transit).

• Comply with industry and organizational data standards, privacy regulations, and best practices.

• Regularly audit data access and usage.

Infrastructure & Configuration (with Azure infrastructure experts):

• Set up and maintain Azure cloud infrastructure.

• Configure and optimize Azure Data Lake Storage, Blob Storage, and other Azure storage solutions.

• Deploy and manage Databricks clusters for processing tasks.

• Implement and maintain data pipelines using Azure Data Factory.

• Monitor and troubleshoot infrastructure-related issues.

Documentation & Training:

• Onboard new team members, providing access and initial training on tools.

• Create documentation and knowledge bases for data pipelines, best practices, and tooling.

Continuous Improvement:

• Stay updated with the latest advancements in data engineering technologies.

• Propose and implement optimizations for current workflows and systems.

• Proactively identify areas of improvement and automation.

• Regularly update the team on new features or changes in Azure, Databricks, or related technologies.

EDUCATION & EXPERIENCE

• Engineering Master’s degree or PhD

• 5 years of experience in a data engineering role within large corporate organizations

• Experience in a Data/AI environment within a cloud ecosystem

 

HARD SKILLS

Software Engineering & SQL:

• Python: Strong coding skills in Python, especially with libraries related to data manipulation (e.g., Pandas) and interfacing with databases.

• Python ecosystem: Strong knowledge of tooling in the Python ecosystem, such as dependency and environment management tools (e.g., Poetry, venv).

• Software engineering practices: Strong experience putting in place good software engineering practices such as design patterns, testing (unit, integration, e2e; see the test sketch after this list), clean code, code formatting, etc.

• SQL: Advanced knowledge of SQL for data querying, transformation, and aggregation.
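
To make the testing expectation concrete, a minimal sketch of a unit test for a data transformation, assuming pandas and pytest; normalize_amounts is a hypothetical example function, not an existing API:

```python
# Unit-test sketch for a data transformation (pandas + pytest).
# normalize_amounts is a hypothetical example function.
import pandas as pd


def normalize_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Convert amounts from cents to euros and drop negative values."""
    out = df.copy()
    out["amount_eur"] = out["amount_cents"] / 100
    return out[out["amount_eur"] >= 0]


def test_normalize_amounts_converts_and_filters():
    df = pd.DataFrame({"amount_cents": [150, -200, 0]})
    result = normalize_amounts(df)
    assert result["amount_eur"].tolist() == [1.5, 0.0]
```

Run with `pytest`; the same pattern extends to integration and end-to-end tests over real pipeline stages.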

Data Architecture:

• Design: Ability to design scalable and robust data pipelines considering functional and non-functional requirements.

• Integration: Knowledge of data integration architectures that ensure reliable and efficient data flow.

Cloud Platforms:

• Azure Services: Proficiency in Azure Data Lake, Azure Data Factory, Azure Blob Storage, Azure SQL Database, and other related Azure services.

• Azure Databricks: Proficiency in using Databricks for deploying and running Spark jobs, including advanced usage (such as managing clusters, secret scopes, SQL warehouses, Unity Catalog, etc.).

• Cloud Infrastructure: Familiarity with setting up, configuring, and managing virtual networks, VMs, security groups, and related components on Azure is preferable.

Data Processing with Big Data Frameworks:

• Spark: Mastery of PySpark for data processing, particularly the DataFrame API.

• Delta Lake: Understanding of Delta Lake for reliable data lakes.

Data Storage & Data Management:

• Database Management: Knowledge of both relational databases (e.g., SQL Server) and NoSQL databases (e.g., Cosmos DB).

• Data Formats: Familiarity with different data storage formats such as Parquet, JSON, CSV, and Delta.

Data Integration Tools:

• Azure Data Factory: Skill in designing, deploying, and managing data integration solutions with ADF.

Data Modeling:

• Schema Design: Ability to design efficient and scalable database schemas for both operational and analytical use cases. Should know dimensional modeling, Third Normal Form (3NF) modeling, and Data Vault modeling (a simple star-schema sketch follows this list).

• ETL Design: Knowledge in designing Extract, Transform, Load (ETL) processes.

• DBT: Experience using DBT to model, transform, and test data on specific data products.
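
A hedged sketch of the dimensional-modeling idea in PySpark: deriving a customer dimension and an order fact table from a single curated source. Source paths, table names, and columns are hypothetical:

```python
# Star-schema sketch in PySpark: one dimension, one fact table.
# Paths, table names, and columns are hypothetical examples.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.read.format("delta").load("/mnt/curated/orders")

# Dimension: one row per customer, with a surrogate key.
dim_customer = (
    orders.select("customer_id", "customer_name", "country")
    .dropDuplicates(["customer_id"])
    .withColumn("customer_sk", F.monotonically_increasing_id())
)

# Fact: one row per order, keyed to the dimension by the surrogate key.
fact_orders = (
    orders.join(dim_customer.select("customer_id", "customer_sk"), "customer_id")
    .select("order_id", "customer_sk", "order_date", "amount")
)

dim_customer.write.format("delta").mode("overwrite").saveAsTable("dw.dim_customer")
fact_orders.write.format("delta").mode("overwrite").saveAsTable("dw.fact_orders")
```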

Performance Optimization:

• Query Tuning: Skills in optimizing complex SQL queries.

• Pipeline Optimization: Knowledge of optimizing data processing pipelines, particularly in Databricks/Spark (see the sketch below).
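
For illustration, a small sketch of two common Spark optimizations: an early filter on a partition column so files are pruned, and broadcasting a small dimension to avoid shuffling the fact table. Paths and columns are hypothetical:

```python
# Spark optimization sketch: partition pruning + broadcast join.
# Paths and column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = SparkSession.builder.getOrCreate()

facts = spark.read.format("delta").load("/mnt/dw/fact_orders")
dims = spark.read.format("delta").load("/mnt/dw/dim_customer")

# Filter on the partition column first so Spark skips unrelated files.
recent = facts.filter(col("order_date") >= "2024-01-01")

# Broadcast the small dimension so the join avoids shuffling the fact table.
joined = recent.join(broadcast(dims), "customer_sk")

joined.explain()  # inspect the physical plan to confirm a broadcast join
```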

Security & Compliance:

• Data Security: Knowledge of encryption techniques, both at-rest and in-transit.

• Access Control: Understanding of role-based access controls and integration with Azure Active Directory.

DevOps & Automation:

• CI/CD: Experience with continuous integration and continuous deployment tools like Azure DevOps.

• Infrastructure as Code: Familiarity with tools like Azure Resource Manager (ARM) templates or Terraform is a plus.

• Containerization: Basic understanding of Docker and Kubernetes, especially as they integrate with Azure services, is a plus.

Monitoring & Troubleshooting:

• Logging & Monitoring: Familiarity with tools like Azure Monitor, Log Analytics, or other monitoring solutions.

• Debugging: Ability to troubleshoot and debug issues within data pipelines and storage systems.
