Data Engineer (Azure Databricks):
Responsibilities:
- Design, develop, and maintain data pipelines using Azure Databricks for ingesting, transforming, and loading data into various data stores (e.g., Azure Data Lake Storage, Azure SQL Database, Azure Synapse Analytics); see the pipeline sketch after this list.
- Implement data processing solutions using Apache Spark on Azure Databricks, leveraging its capabilities for distributed computing, machine learning, and real-time data processing.
- Develop and optimize data models and schemas for efficient data storage and retrieval.
- Collaborate with data scientists, data analysts, and other stakeholders to understand their data needs and translate them into technical solutions.
- Work with the team to define and implement data governance policies and best practices for data quality and security.
- Monitor and troubleshoot data pipelines and applications on Azure Databricks, ensuring optimal performance and reliability.
- Stay up to date with the latest advancements in Azure Databricks and related technologies.
- Contribute to building a strong data culture within the organization.
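As a rough illustration of the pipeline work described above, the sketch below reads raw JSON from Azure Data Lake Storage, applies basic cleansing, and writes a partitioned Delta table. It is a minimal sketch only; the storage account, container, and column names are hypothetical placeholders, not part of this role's actual environment.

# Minimal PySpark sketch of an ingest-transform-load pipeline on Azure Databricks.
# Storage account, container, and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks notebooks

# Ingest: read raw JSON events from Azure Data Lake Storage (ABFS path).
raw = spark.read.json("abfss://raw@examplestorage.dfs.core.windows.net/events/")

# Transform: deduplicate, derive a date column, and drop records without an id.
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_timestamp"))
       .filter(F.col("event_id").isNotNull())
)

# Load: append to a Delta table partitioned by date for downstream consumers.
(cleaned.write
        .format("delta")
        .mode("append")
        .partitionBy("event_date")
        .save("abfss://curated@examplestorage.dfs.core.windows.net/events_clean/"))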
Required Skills:
- 10+ years of experience in data engineering or a closely related field.
- Strong proficiency in Python or Scala for data processing and manipulation in Azure Databricks.
- Experience working with Apache Spark, including its core concepts, data structures (RDDs, DataFrames), and APIs; a short DataFrame example follows this list.
- Familiarity with data ingestion and transformation techniques (e.g., ETL, ELT).
- Experience with data warehousing and data lake concepts.
- Hands-on experience with Azure Databricks, including workspace management, cluster configuration, and job scheduling.
- Knowledge of Azure storage and database services such as Azure Data Lake Storage, Azure Blob Storage, and Azure SQL Database.
- Understanding of data security principles and best practices.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
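To give a concrete sense of the Python and Spark DataFrame skills listed above, the short example below joins two registered tables and aggregates revenue by region. The table, column, and schema names are illustrative assumptions only.

# Illustrative PySpark DataFrame work: join a fact table to a dimension and aggregate.
# The sales.orders / sales.customers tables and their columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.table("sales.orders")
customers = spark.table("sales.customers")

revenue_by_region = (
    orders.join(customers, on="customer_id", how="inner")
          .groupBy("region")
          .agg(
              F.sum("order_amount").alias("total_revenue"),
              F.countDistinct("customer_id").alias("unique_customers"),
          )
          .orderBy(F.desc("total_revenue"))
)

revenue_by_region.show()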