PRIMARY OBJECTIVE
• Subject matter expert on Data Lake / Hadoop platform management and Data Ingestion for Analytics use case implementation
• Utilize big and small data to generate features, derive insights, visualize information, and support impactful decision-making
• Design, maintain, and improve data pipelines for the customization of integration & visualization tools, databases, data warehouses, and analytical systems.
• Data Management for structured and unstructured data for the Analytics Department
• We are scaling the Capabilities & Data Build Squad, a team that provides internal teams with secure, reliable, performant, and user-friendly access to data to support analytics and other downstream critical processes.
• This role requires technical expertise and a willingness to learn a wide variety of technologies to develop real-time data pipelines, data product APIs, and modern, scalable data infrastructure.
• The Big Data Engineer will be a key part of our team, helping to bring structure to vast amounts of data, making it digestible, and building scalable data platforms that enable data products, business analytics, and data science.
KEY RESPONSIBILITIES
• Gather and process raw data at scale.
• Design and develop data applications using selected tools and frameworks as required.
• Read, extract, transform, stage, and load data using selected tools and frameworks as required (a batch ingestion sketch follows this list).
• Simplify internal stakeholders' access to real-time data.
• Design and implement our real-time data pipelines (a streaming sketch follows this list).
• Collaborate with data stakeholders and stewards to verify the accuracy of the information collected.
• Provide technical leadership to Data Owners and Stewards on data definitions and data lineage changes by supporting the intake process, performing impact analysis, and conducting domain-specific profiling.
• Monitor data performance and modify infrastructure as needed.
• Build custom integrations with various banking systems, data warehouses, and analytics systems.
• Set up data-access and visualization tools for data scientists, such as a data science workbench.
• Develop and implement analytics use cases that yield business value from data and insights.
• Extract and transform structured and unstructured big and small data to generate features, derive insights, visualize information, and support impactful decision-making.
• Design data pipelines and implement end-to-end data ingestion into Big Data repositories.
• Demonstrate efficiency by contributing to and applying bank-wide best practices in analytics and data.
• Apply human-centered design (together with the digital team).
• Data Management & Data Governance for the Big Data Platform
• Stay abreast of emerging technologies and projects in the modern Data Engineering/Data Lake/Big Data space
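
To give a flavor of the batch ingestion work above, here is a minimal PySpark extract-transform-load sketch; the paths, column names, and app name are hypothetical placeholders, not an actual bank pipeline:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-ingestion").getOrCreate()

# Extract: read raw CSV files landed by an upstream system (hypothetical path).
raw = spark.read.option("header", True).csv("hdfs:///landing/transactions/")

# Transform: basic cleansing and feature derivation (hypothetical columns).
clean = (raw.dropDuplicates(["txn_id"])
            .withColumn("txn_date", F.to_date("txn_ts"))
            .filter(F.col("amount").isNotNull()))

# Load: stage into the data lake as date-partitioned Parquet.
clean.write.mode("overwrite").partitionBy("txn_date").parquet("hdfs:///lake/transactions/")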
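
And a minimal sketch of a real-time pipeline, assuming Spark Structured Streaming with the Kafka connector available on the classpath; the broker, topic, and path names are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("realtime-pipeline").getOrCreate()

# Subscribe to a (hypothetical) Kafka topic of payment events.
events = (spark.readStream.format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")
               .option("subscribe", "payments")
               .load())

# Kafka delivers key/value as binary; cast the value for downstream parsing.
parsed = events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

# Continuously append to the lake; the checkpoint makes the query restartable.
query = (parsed.writeStream.format("parquet")
               .option("path", "hdfs:///lake/payments/")
               .option("checkpointLocation", "hdfs:///checkpoints/payments/")
               .start())
query.awaitTermination()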
REQUIREMENTS
• Bachelor's or Master's degree in IT, Computer Science, or a related discipline
• 5+ years of relevant industry experience.
• Excellent knowledge of Python, Scala, or Java, and SQL.
• Demonstrated ability to build high-volume data ingestion and streaming pipelines (e.g. Kafka, AWS Kinesis, Spark Streaming); a producer sketch follows this list.
• Working experience with big data technologies (e.g. Spark, Kafka, Hive).
• Experience designing, building, and scaling a production-ready event streaming system.
• Experience in optimizing SQL queries (e.g. data partitioning, bucketing, indexing); a partitioning and bucketing sketch follows this list.
• Experience in building data lake/warehouse solutions consisting of structured and unstructured data.
• Big Data technology and programming skills, such as Python, Hive, Hue, Spark, PySpark, YARN, HBase, Cloudera, NiFi, MapReduce, Kafka, and Linux
• Knowledge of advanced data technologies such as Kafka, H2O, and Elasticsearch would be an added advantage
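
For the event streaming requirement, a minimal producer sketch, assuming the confluent-kafka Python client; the broker address, topic, and payload are hypothetical:

import json
from confluent_kafka import Producer

# Hypothetical broker and topic names.
producer = Producer({"bootstrap.servers": "broker1:9092"})

def on_delivery(err, msg):
    # Delivery callback: reports per-message success or failure from the broker.
    if err is not None:
        print(f"delivery failed: {err}")

event = {"txn_id": "t-001", "amount": 250.0}
producer.produce("payments", key="t-001", value=json.dumps(event), callback=on_delivery)
producer.flush()  # block until all queued messages are acknowledged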
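
For the SQL optimization requirement, a sketch of partitioning and bucketing with PySpark; the database, table, and column names are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-optimization").enableHiveSupport().getOrCreate()

# Hypothetical dataset; assume txn_date, account, and amount columns exist.
df = spark.read.parquet("hdfs:///lake/transactions/")

# Partitioning enables directory-level pruning at query time; bucketing
# pre-clusters rows by join key. bucketBy requires saving as a managed table.
(df.write
   .partitionBy("txn_date")
   .bucketBy(32, "account")
   .sortBy("account")
   .mode("overwrite")
   .saveAsTable("analytics.transactions"))

# A filter on the partition column scans only the matching partition.
spark.sql("SELECT account, SUM(amount) FROM analytics.transactions "
          "WHERE txn_date = DATE '2024-01-01' GROUP BY account").show()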