Description

Responsibilities:

  • Design, develop, and implement big data engineering projects in the Hadoop ecosystem.
  • Engineer high-quality solutions on Cloudera, MapR, or HDP for both batch and streaming data, with a sense of urgency.
  • Develop applications and custom integration solutions using Spark Streaming and Hive (a minimal streaming sketch follows this list).
  • Understand specifications, then plan, design, and develop software solutions, adhering to established processes, either individually or as part of a project team.
  • Work in state-of-the-art programming languages and use object-oriented approaches when designing, coding, testing, and debugging programs.
  • Work with support teams to resolve operational and performance issues.
  • Select and integrate the big data tools and frameworks required to provide requested capabilities.
  • Integrate data from multiple sources and implement ETL processes using Apache NiFi.
  • Monitor performance and advise on any necessary infrastructure changes.
  • Manage the Hadoop cluster and all included services, such as Hive, HBase, MapReduce, and Sqoop.
  • Clean data per business requirements using streaming APIs or user-defined functions (a cleaning sketch follows this list).
  • Build distributed, reliable, and scalable data pipelines to ingest and process data in real time, defining Hadoop job flows.
  • Manage Hadoop jobs using a scheduler.
  • Apply different HDFS file formats and structures, such as Parquet and Avro, to speed up analytics (a file-format sketch follows this list).
  • Work with various Hadoop ecosystem tools such as Hive, Pig, HBase, and Spark.
  • Review and manage Hadoop log files.
  • Assess the quality of datasets in a Hadoop data lake.
  • Fine-tune Hadoop applications for high performance and throughput.
  • Troubleshoot and debug Hadoop ecosystem runtime issues.
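
To illustrate the Spark Streaming and Hive responsibility above, the following is a minimal PySpark Structured Streaming sketch; the Kafka broker, topic, schema, and HDFS paths are illustrative assumptions rather than details from this posting, and the output path would be mapped to a Hive external table.

  # Minimal sketch: stream Kafka events into Parquet files that back a Hive
  # external table. Broker, topic, schema, and paths are placeholders.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col, from_json
  from pyspark.sql.types import StructType, StructField, StringType, TimestampType

  spark = (
      SparkSession.builder
      .appName("events-to-hive")   # hypothetical application name
      .enableHiveSupport()         # requires a Hive-enabled Spark build
      .getOrCreate()
  )

  schema = StructType([
      StructField("event_id", StringType()),
      StructField("event_type", StringType()),
      StructField("event_ts", TimestampType()),
  ])

  events = (
      spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
      .option("subscribe", "events")                      # assumed topic
      .load()
      .select(from_json(col("value").cast("string"), schema).alias("e"))
      .select("e.*")
  )

  query = (
      events.writeStream
      .format("parquet")
      .option("path", "/warehouse/events")                 # assumed HDFS path
      .option("checkpointLocation", "/checkpoints/events") # assumed path
      .trigger(processingTime="1 minute")
      .start()
  )
  query.awaitTermination()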
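
The cleaning responsibility above can be sketched with a PySpark user-defined function; the column names and cleaning rules here are assumptions made up for illustration.

  # Minimal sketch: clean records with a user-defined function and simple
  # null-handling rules. Columns and rules are illustrative only.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col, udf
  from pyspark.sql.types import StringType

  spark = SparkSession.builder.appName("clean-customers").getOrCreate()

  raw = spark.createDataFrame(
      [(" ALICE ", "US"), ("bob", None), ("  ", "DE")],
      ["name", "country"],
  )

  @udf(returnType=StringType())
  def normalize_name(name):
      # Trim whitespace and standardize casing; blank values become NULL.
      if name is None:
          return None
      cleaned = name.strip().title()
      return cleaned or None

  cleaned = (
      raw.withColumn("name", normalize_name(col("name")))
         .na.drop(subset=["name"])          # drop rows with no usable name
         .na.fill({"country": "UNKNOWN"})   # assumed business default
  )
  cleaned.show()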
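
For the file-format responsibility above, here is a short sketch of writing the same dataset as Parquet and Avro; the paths and sample data are placeholders, and the Avro write assumes the spark-avro package is on the classpath.

  # Minimal sketch: persist one dataset in two HDFS formats. Paths and the
  # sample data are placeholders.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("format-comparison").getOrCreate()

  df = spark.createDataFrame(
      [(1, "click", "2024-01-01"), (2, "view", "2024-01-02")],
      ["id", "event_type", "event_date"],
  )

  # Columnar Parquet, partitioned by date, suits analytical scans that read
  # a few columns over many rows.
  df.write.mode("overwrite").partitionBy("event_date").parquet("/data/events_parquet")

  # Row-oriented Avro keeps whole records together, which suits ingestion
  # and schema-evolution-heavy pipelines (requires the spark-avro package).
  df.write.mode("overwrite").format("avro").save("/data/events_avro")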

Education

Bachelor's Degree