Job Description:
Technical/Functional Skills:
- Proficiency in working with the Databricks Unified Analytics Platform, including notebooks, clusters, jobs, and libraries.
- Strong programming skills in languages commonly used with Databricks, such as Python, Scala, and SQL.
- Experience in designing and implementing ETL processes within Databricks using Spark SQL, DataFrame API, and structured streaming.
- Mastery of Spark DataFrames and Datasets for efficient, structured data manipulation.
- Knowledge of configuring and optimizing Databricks runtime settings for performance and resource utilization.
- Proficiency in creating and managing clusters in Databricks, optimizing configurations based on workload requirements.
- Integration skills to connect Databricks with various data sources and sinks, including cloud storage, databases, and streaming platforms.
- Knowledge of Databricks security features, including access controls, encryption, and integration with identity providers.
- Experience with structured streaming in Databricks for real-time data processing and analytics.
Roles & Responsibilities:
- Use version control systems such as Git to manage codebase changes and collaborate with a development team.
- Collaborate in Databricks notebooks, including sharing, versioning, and commenting on code.
- Monitor Databricks workloads, interpret logs, and optimize performance using built-in and external monitoring tools.
- Create scripts to automate tasks within Databricks, leveraging the REST APIs and the Databricks CLI.
- Implement cluster autoscaling in Databricks to optimize resource utilization.
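The last two responsibilities can be sketched together: a small Python helper that builds an autoscaling cluster specification for the Databricks Clusters API (`POST /api/2.0/clusters/create`). The helper only constructs the request payload; the node type and runtime version defaults below are placeholders, and actually submitting it would go through the Databricks CLI (`databricks clusters create --json ...`) or an authenticated HTTP call.

```python
import json

def build_autoscaling_cluster_spec(name, min_workers, max_workers,
                                   spark_version="13.3.x-scala2.12",
                                   node_type_id="i3.xlarge"):
    """Build a clusters/create payload using an `autoscale` range instead of a
    fixed `num_workers`, so Databricks resizes the cluster with load.
    spark_version and node_type_id are illustrative placeholders."""
    if not (0 < min_workers <= max_workers):
        raise ValueError("require 0 < min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }

spec = build_autoscaling_cluster_spec("etl-cluster", 2, 8)
payload = json.dumps(spec)  # JSON body for the CLI or REST call
```

Validating the min/max range in code keeps a bad autoscale configuration from ever reaching the API.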