Design, develop, and implement big data engineering projects in the Hadoop ecosystem.
Engineer solutions on Cloudera, MapR, or HDP for both batch and streaming data, delivering high quality with a sense of urgency.
Develop applications and custom integration solutions using Spark Streaming and Hive; a streaming sketch follows this list.
Understand specifications, then plan, design, and develop software solutions adhering to process, either individually or within a project team.
Work in state-of-the-art programming languages and use object-oriented approaches when designing, coding, testing, and debugging programs.
Work with support teams to resolve operational and performance issues.
Select and integrate the big data tools and frameworks required to provide requested capabilities.
Integrate data from multiple sources, implementing ETL processes with Apache NiFi.
Monitor performance and advise on any necessary infrastructure changes.
Manage the Hadoop cluster and its services, including Hive, HBase, MapReduce, and Sqoop.
Clean data per business requirements using streaming APIs or user-defined functions (UDFs); see the UDF sketch after this list.
Build distributed, reliable, and scalable data pipelines to ingest and process data in real time, defining Hadoop job flows.
Manage Hadoop jobs using a scheduler.
Apply HDFS file formats and structures such as Parquet and Avro to speed up analytics; a format sketch follows this list.
Work with Hadoop ecosystem tools such as Hive, Pig, HBase, and Spark.
Review and manage Hadoop log files.
Assess the quality of datasets in a Hadoop data lake; a data-quality sketch follows this list.
Fine-tune Hadoop applications for high performance and throughput; a tuning sketch follows this list.
Troubleshoot and debug Hadoop ecosystem runtime issues.
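
The sketch below illustrates the Spark Streaming and Hive work above: a minimal Structured Streaming job that reads JSON events from Kafka and appends them as Parquet files that a Hive external table can sit on. The broker address, topic name, schema, and paths are illustrative assumptions, not project specifics.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder
    .appName("events-stream")
    .enableHiveSupport()  # lets the job see the Hive metastore
    .getOrCreate()
)

# Schema of the incoming JSON payload (assumed for illustration).
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "events")                      # placeholder topic
    .load()
)

# Kafka values arrive as bytes; parse them into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

# Append each micro-batch as Parquet under a path a Hive external table points at.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/warehouse/events_clean")          # placeholder path
    .option("checkpointLocation", "/checkpoints/events")
    .start()
)
query.awaitTermination()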
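
For the UDF-based cleaning responsibility, here is a minimal PySpark sketch; the column names and the normalization rule are assumptions, not the actual business logic.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-cleaning").getOrCreate()

# Toy input standing in for a real source table.
df = spark.createDataFrame(
    [(" ALICE ", "NY"), ("bob", None), (None, "ca")],
    ["name", "state"],
)

@udf(returnType=StringType())
def normalize(s):
    # Trim whitespace and lower-case; map missing values to a sentinel.
    return s.strip().lower() if s is not None else "unknown"

cleaned = df.select(
    normalize(col("name")).alias("name"),
    normalize(col("state")).alias("state"),
)
cleaned.show()

Built-in functions outperform Python UDFs because they avoid serialization to the Python worker, so a UDF like this is usually reserved for logic the built-ins cannot express.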
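
The format sketch for the Parquet/Avro item: columnar Parquet suits analytical scans of a few columns, while row-oriented Avro with its embedded schema suits ingestion and schema evolution. Paths are placeholders, and the Avro writer assumes the external spark-avro module is on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("formats").getOrCreate()

df = spark.range(1_000_000).withColumnRenamed("id", "event_id")

# Parquet: columnar and splittable, good for wide tables queried narrowly.
df.write.mode("overwrite").parquet("/data/events_parquet")

# Avro: row-oriented with embedded schema (requires the spark-avro package).
df.write.mode("overwrite").format("avro").save("/data/events_avro")

# Column pruning on read is where Parquet's layout pays off.
spark.read.parquet("/data/events_parquet").select("event_id").count()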
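
The data-quality sketch for the data-lake assessment item profiles completeness and duplication; the dataset path, the business key, and the choice of checks are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("/datalake/raw/orders")  # placeholder location
total = df.count()

# Null ratio per column: a quick completeness signal.
null_counts = df.select(
    *[count(when(col(c).isNull(), c)).alias(c) for c in df.columns]
).first().asDict()
for column, nulls in null_counts.items():
    print(f"{column}: {nulls / total:.1%} null")

# Duplicate check on an assumed business key.
dupes = total - df.dropDuplicates(["order_id"]).count()
print(f"duplicate order_id rows: {dupes}")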
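
Finally, the tuning sketch shows common Spark-on-YARN knobs behind the fine-tuning item; the values are starting-point assumptions to be validated against the actual workload, not recommendations.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-job")
    .config("spark.sql.shuffle.partitions", "400")  # sized to shuffle volume
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .config("spark.serializer",
            "org.apache.spark.serializer.KryoSerializer")  # faster serde
    .getOrCreate()
)

df = spark.read.parquet("/datalake/raw/orders")  # placeholder input

# Repartition on the join key before a wide join to avoid skewed tasks.
df = df.repartition(400, "customer_id")

# Cache only datasets that are reused; persisting everything wastes memory.
df.cache()
print(df.count())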