Experience with Big Data Technologies
- Spark (Java or PySpark) for large-scale data processing.
- Proficiency in SQL and databases: querying, managing, and manipulating data sets (a SQL example follows this list).
- Knowledge of cloud platforms (Azure): data storage, processing, and deployment in a scalable environment.
- Design and implement scalable data processing pipelines using Apache Spark.
- Develop and optimize Spark jobs for data transformation, aggregation, and analysis, as shown in the first PySpark sketch after this list.
- Work with large datasets to extract, process, and analyze data from various sources.
- Collaborate with data scientists, analysts, and other engineers to understand data requirements and deliver solutions.
- Implement data integration solutions to connect disparate data sources.
- Ensure data quality, integrity, and consistency throughout the data processing pipeline (see the data-quality sketch below).
- Monitor and troubleshoot performance issues in Spark jobs and cluster environments (the final sketch below illustrates common first steps).
- Stay current with the latest developments in big data technologies and best practices.
- Document technical designs, processes, and procedures.
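
To make the Spark pipeline work above concrete, here is a minimal PySpark sketch of a transformation-and-aggregation job. The input path (/data/orders.csv) and the column names (order_id, amount, region) are illustrative assumptions, not details of any specific system.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session; in production this would point at a cluster.
spark = SparkSession.builder.appName("order-aggregation").getOrCreate()

# Hypothetical input: the path and schema are assumptions for this sketch.
orders = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("/data/orders.csv")
)

# Transformation: drop rows missing key fields and normalize the region label.
cleaned = (
    orders
    .dropna(subset=["order_id", "amount", "region"])
    .withColumn("region", F.upper(F.trim(F.col("region"))))
)

# Aggregation: total, average, and count of order value per region.
summary = (
    cleaned
    .groupBy("region")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.avg("amount").alias("avg_amount"),
        F.count("*").alias("order_count"),
    )
)

summary.show()
```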
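The same aggregation can also be expressed in SQL by registering the cleaned data as a temporary view, which is one way the SQL proficiency above is exercised inside Spark. This sketch reuses the spark session and cleaned DataFrame from the previous example; the query itself is illustrative.

```python
# Register the cleaned data as a temporary view so it can be queried with SQL.
cleaned.createOrReplaceTempView("orders")

# The same per-region aggregation expressed in SQL rather than the DataFrame API.
per_region = spark.sql("""
    SELECT region,
           SUM(amount) AS total_amount,
           AVG(amount) AS avg_amount,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY region
""")
per_region.show()
```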
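For the data-quality responsibility, one common pattern is to assert simple invariants between pipeline stages and fail fast when they break. The specific checks and the check_quality helper below are illustrative assumptions, not a prescribed standard.

```python
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def check_quality(df: DataFrame, key_column: str) -> None:
    """Fail fast if basic integrity invariants do not hold.

    The checks shown (non-empty, no null keys, unique keys) are
    illustrative; real pipelines tailor these to their data contracts.
    """
    total = df.count()
    if total == 0:
        raise ValueError("DataFrame is empty")

    null_keys = df.filter(F.col(key_column).isNull()).count()
    if null_keys > 0:
        raise ValueError(f"{null_keys} rows have a null {key_column}")

    distinct = df.select(key_column).distinct().count()
    if distinct != total:
        raise ValueError(f"duplicate values found in {key_column}")

# Example usage between pipeline stages:
# check_quality(cleaned, "order_id")
```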
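Finally, for monitoring and troubleshooting, a typical first step is inspecting the physical plan and tuning caching and shuffle behavior. This sketch continues with the spark, cleaned, and summary names from the first example; the partition count shown is an assumption that would be tuned per cluster.

```python
from pyspark.storagelevel import StorageLevel

# Inspect the physical plan to spot expensive shuffles or full scans.
summary.explain(mode="formatted")

# Cache a dataset that is reused by several downstream jobs.
cleaned.persist(StorageLevel.MEMORY_AND_DISK)

# Reduce shuffle partitions for modest data volumes (the value 64 is an
# illustrative assumption; the right number depends on the cluster).
spark.conf.set("spark.sql.shuffle.partitions", "64")
```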