Job Description:
Design end-to-end data processing pipelines using Databricks for batch and streaming data ingestion, transformation, and analysis.
Architect and implement scalable data lake solutions on Databricks to store and manage structured and unstructured data.
Develop and optimize Spark jobs and notebooks for data manipulation, feature engineering, and model training.
Collaborate with data scientists and analysts to deploy machine learning models on Databricks for predictive analytics and decision support.
Implement data governance and security controls to ensure compliance with regulatory requirements and protect sensitive information.
Optimize Databricks clusters for performance, reliability, and cost-efficiency.
Monitor Databricks workloads and troubleshoot performance issues and bottlenecks.
Stay current with Databricks best practices, new features, and emerging technologies.
Requirements:
Bachelor's degree in Computer Science, Information Technology, or related field.
Proven experience as a Databricks Architect or similar role in designing and implementing data analytics solutions.
Strong proficiency in Apache Spark and the Databricks platform.
Hands-on experience with Databricks notebooks, Spark SQL, DataFrame API, and MLflow.
Experience with cloud-based data platforms (e.g., AWS, Azure, Google Cloud Platform).
Knowledge of data lake architecture principles and best practices.
Understanding of data governance, security, and compliance requirements.
Excellent problem-solving and communication skills.
Ability to work independently and collaboratively in a fast-paced environment.
Preferred Qualifications:
Databricks certification (e.g., Databricks Certified Developer, Databricks Certified Professional Data Scientist).
Experience with containerization technologies (e.g., Docker, Kubernetes).
Familiarity with DevOps practices and tools for CI/CD automation.