Job description
• Designing and implementing high-performance data ingestion pipelines from multiple sources using Apache Spark and/or Azure Databricks
• Delivering and presenting proofs of concept for key technology components to project stakeholders
• Developing scalable, re-usable frameworks for ingesting geospatial data sets
• Integrating end-to-end data pipelines to take data from source systems to target data repositories, ensuring the quality and consistency of data is maintained at all times
• Working with event-based / streaming technologies to ingest and process data
• Working with other members of the project team to support delivery of additional project components (API interfaces, Search)
• Evaluating the performance and applicability of multiple tools against customer requirements
• Working within an Agile delivery / DevOps methodology to deliver proof-of-concept and production implementations in iterative sprints.
Qualifications
• Strong knowledge of Data Management principles
• Experience in building ETL / data warehouse transformation processes
• Direct experience of building data pipelines using Azure Data Factory and Apache Spark (preferably Databricks).
• Experience using geospatial frameworks on Apache Spark and associated design and development patterns
• Microsoft Azure Big Data Architecture certification.
• Hands-on experience designing and delivering solutions using the Azure Data Analytics platform (Cortana Intelligence Platform), including Azure Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Cosmos DB and Azure Stream Analytics
• Experience with open-source non-relational / NoSQL data repositories (incl. Snowflake, MongoDB, Cassandra, Neo4j)
• Experience working with structured and unstructured data, including imaging and geospatial data.
• Experience working in a DevOps environment with tools such as Jenkins

Education

Bachelor's degree in Computer Science