Role: Hadoop PySpark Technical Lead
Understand requirements/use cases and build efficient ETL solutions using Apache Spark, Python, Kafka, and Hive, targeting Cloudera Data Platform.
Analyze requirements/use cases, convert functional requirements into concrete technical tasks, and provide reasonable effort estimates.
Work closely with data analysts/modelers and business users to understand data requirements.
Convert requirements into high-level design, low-level design, and source-to-target documents.
Design, develop, and schedule data pipelines that handle large volumes of data within SLA.
Work with solution architects, technical managers, and admins to understand SLAs and system limitations, and provide efficient solutions.
Expertise in processing and aggregating large volumes of data using Spark; must know various performance improvement techniques and be able to lead teams on optimization.
Develop efficient data ingestion and data governance frameworks per specification.
Improve the performance of existing Spark-based data ingestion and aggregation pipelines to meet SLAs.
Work proactively and independently with global teams to address project requirements; articulate issues/challenges with enough lead time to mitigate project delivery risks.
Plan production implementation activities, execute change requests, and resolve issues arising during production implementation.
Plan and execute large data migrations and historical data rebuild activities.
Conduct code reviews/optimization and test case reviews. Demonstrate troubleshooting skill in resolving technical issues and bugs.
Demonstrate ownership and initiative. Bring in best practices/solutions that best fit the client's problem and environment.
Education: Any Graduate