Job Description

Key Responsibilities:

  • Architect and Design Solutions:
    Lead the architecture and design of Databricks-based data solutions that support data engineering, machine learning, and real-time analytics.
  • Data Pipeline Design:
    Design and implement ETL (Extract, Transform, Load) pipelines using Databricks, Apache Spark, and other big data tools to process and integrate large-scale data from multiple sources.
  • Collaborate with Stakeholders:
    Work with business and data teams to understand requirements, identify opportunities for automation, and design solutions that improve data workflows.
  • Optimize Data Architecture:
    Create highly optimized, scalable, and cost-effective architectures for processing large data sets and managing big data workloads using Databricks, Delta Lake, and Apache Spark.
  • Implement Best Practices:
    Define and promote best practices for Databricks implementation, including data governance, security, performance optimization, and monitoring.
  • Manage Databricks Clusters:
    Manage and optimize Databricks clusters for performance, cost, and reliability. Troubleshoot performance issues and optimize the use of cloud resources.
  • Data Governance and Security:
    Implement best practices for data governance, security, and compliance on the Databricks platform to ensure that data processing and storage meet organizational and regulatory standards.
  • Automation and Optimization:
    Automate repetitive tasks, streamline data processes, and optimize data workflows to improve efficiency and reduce operational costs.
  • Mentorship and Training:
    Mentor and provide guidance to junior engineers, ensuring the team follows best practices in the development of data pipelines and analytics solutions.
  • Keep Up-to-Date with Trends:
    Stay current with emerging technologies in the big data and cloud space, and recommend new solutions or improvements to existing processes.

Required Skills & Qualifications:

  • Technical Expertise:
    • Extensive experience with Databricks, Apache Spark, and cloud platforms (AWS, Azure, or GCP).
    • Proficiency in programming languages such as Python, Scala, or SQL.
    • Strong understanding of distributed computing, data modeling, and data storage technologies.
    • Hands-on experience with Delta Lake, Spark SQL, and MLlib.
  • Experience with Cloud Services:
    • Expertise in deploying and managing data platforms and workloads in cloud environments such as AWS, Azure, or GCP.
    • Familiarity with cloud-native services such as Amazon S3, Amazon Redshift, Azure Blob Storage, and BigQuery.
  • Data Engineering Skills:
    • Experience designing, building, and optimizing ETL data pipelines.
    • Familiarity with data warehousing concepts, OLAP, and OLTP systems.
  • Machine Learning (ML) Knowledge:
    • Experience in integrating machine learning workflows with Databricks, building models, and automating model deployment.
  • Leadership and Collaboration:
    • Strong leadership and communication skills to interact with both technical and non-technical stakeholders.
    • Experience in leading cross-functional teams and mentoring junior team members.

Preferred Skills:

  • Advanced Databricks Knowledge:
    In-depth experience with Databricks components, such as notebooks, jobs, and collaboration features.
  • DevOps & CI/CD:
    Experience with DevOps practices, automation, and CI/CD pipelines in data engineering.
  • Data Governance:
    Strong knowledge of data governance principles, such as metadata management, data lineage, and data quality.
  • Certifications:
    • Databricks Certified Associate Developer for Apache Spark.
    • Cloud certifications (e.g., AWS Certified Solutions Architect, Azure Solutions Architect Expert).

Education & Experience:

  • Education:
    Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related field (or equivalent work experience).
  • Experience:
    5+ years of experience in data architecture, data engineering, and cloud platforms (preferably with Databricks and Apache Spark).
