Overview:
The Databricks + Spark Developer develops and maintains efficient data pipelines with Databricks and Apache Spark, and is responsible for implementing scalable, reliable solutions that enable data-driven decision-making within the organization.
Key Responsibilities:
- Designing and implementing robust ETL processes using Databricks and Spark (a minimal sketch follows this list)
- Developing and optimizing data pipelines for large-scale data processing
- Collaborating with data engineers and data scientists to support their data infrastructure needs
- Building and maintaining data warehouse solutions to support business analytics
- Performing data modeling and optimization to ensure efficient data storage and retrieval
- Troubleshooting and resolving performance issues with data infrastructure and pipelines
- Implementing security and data governance best practices within the data platform
- Automating data quality checks and ensuring data consistency and accuracy
- Collaborating with cross-functional teams to understand data requirements and deliver solutions
- Monitoring and maintaining the health of data pipelines and infrastructure
- Documenting technical design and architecture of data solutions
- Participating in code reviews and providing constructive feedback to peers
- Staying updated with the latest advancements in Databricks and Spark technologies
- Providing technical guidance and mentorship to junior team members
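For concreteness, here is a minimal sketch of the kind of ETL job and automated data-quality check described above, written in PySpark. It is illustrative only: the table and column names (raw_events, analytics.daily_events, event_ts, user_id) are hypothetical placeholders, not a prescribed schema.

    # Minimal PySpark ETL sketch: extract a raw table, clean and aggregate it,
    # run a basic data-quality check, then load the result into a managed table.
    # All table/column names below are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

    # Extract: read the raw table registered in the metastore.
    raw = spark.table("raw_events")

    # Transform: parse event timestamps, drop malformed rows, aggregate per user/day.
    clean = (
        raw.withColumn("event_date", F.to_date("event_ts"))
           .where(F.col("user_id").isNotNull() & F.col("event_date").isNotNull())
    )
    daily = clean.groupBy("event_date", "user_id").agg(F.count("*").alias("events"))

    # Automated data-quality check: fail fast if cleaning removed everything.
    if daily.count() == 0:
        raise ValueError("Data quality check failed: no rows after cleaning")

    # Load: write the aggregate to a target table for analytics.
    daily.write.mode("overwrite").saveAsTable("analytics.daily_events")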
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field
- Proven experience in developing data pipelines using Databricks and Spark
- Proficiency in ETL processes and data warehousing concepts
- Strong SQL skills with the ability to write complex queries for data manipulation and analysis (illustrated in the sketch after this list)
- Advanced programming skills in Python for data processing and manipulation
- Experience in data modeling and optimizing data storage for performance
- Deep understanding of big data technologies and distributed computing concepts
- Ability to troubleshoot and optimize data pipeline performance for efficiency and reliability
- Knowledge of data governance, security, and compliance best practices
- Excellent communication and collaboration skills to work effectively in a team environment
- Strong analytical and problem-solving abilities to tackle complex data engineering challenges
- Ability to multitask and prioritize tasks in a fast-paced and dynamic work environment
- Experience with cloud platforms such as AWS, Azure, or GCP is a plus
- Certifications in Databricks and Spark-related technologies are desirable
- Experience in Agile development methodologies and version control systems
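As a brief illustration of the SQL and Python proficiency listed above, the following sketch runs a window-function query through the PySpark SQL interface. The analytics.daily_events table and its columns are hypothetical, carried over from the earlier example.

    # Illustrative only: a ranking query combining Spark SQL with Python.
    # The analytics.daily_events table and its columns are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-skills-demo").getOrCreate()

    # Top 10 users by event count per day, using a window function.
    top_users = spark.sql("""
        SELECT user_id, event_date, events, rnk
        FROM (
            SELECT user_id, event_date, events,
                   RANK() OVER (PARTITION BY event_date ORDER BY events DESC) AS rnk
            FROM analytics.daily_events
        ) AS ranked
        WHERE rnk <= 10
    """)

    top_users.show()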
Experience Required:
A data engineering professional with hands-on Databricks experience (Databricks provides a web-based platform for working with Apache Spark), a strong Python coding background, and solid SQL skills.