Job Duties:
- Design, develop, configure, and implement high-performance data processing applications using the Big Data technology stack: Apache Hive, Sqoop, HDFS, MapReduce, Oozie, Apache Spark, Scala, Kafka, and Kerberos.
- Develop Hadoop MapReduce jobs in Java to process web log data on HDFS.
- Develop batch processing applications on the Hadoop platform using Apache Hive and Apache Spark in Scala.
- Develop ETL jobs using data integration tools such as Informatica to build Data Warehouse applications on Oracle.
- Performance-tune Data Warehouse SQL queries using database indexes and by tuning joins.
- Develop Apache Sqoop scripts to pull data from Relational Database Management Systems and ingest it into the Hortonworks Hadoop cluster.
- Develop UNIX shell scripts to automate file transmission jobs in distributed environments.
- Develop jobs in Apache Spark using high-level Application Programming Interfaces such as DataFrames/Datasets and Spark SQL to process and aggregate data in the Hadoop cluster (see the first sketch after this list).
- Optimize Hive table design by applying partitioning and bucketing techniques to reduce Hive query execution times to sub-second levels (see the second sketch after this list).
- Develop Business Intelligence SQL queries and test them against low-latency analytical processing engines: Druid, Apache Phoenix, Kinetica, Brytlyt, Hive LLAP, Jethro, and Presto.
- Develop job orchestration scripts in Oozie to automate Hive, Sqoop, and Spark jobs on the Hortonworks Hadoop cluster.
- Perform System Integration Testing and resolve defects arising from the testing.
- Work on production install activities and perform code and data validation.
- Develop back-out plans in case of data or code issues during production installs.
- Provide post-production install support.
Will work in Glastonbury, CT and/or at various client sites throughout the U.S. Must be willing to travel and/or relocate.
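As an illustration of the Spark duties above, the following is a minimal Scala sketch, not part of the posting itself. The HDFS path (hdfs:///data/raw/web_logs/), the column names (log_date, user_id), and the output table name are assumptions chosen for the example; it shows the same daily aggregation expressed both through the DataFrame API and through Spark SQL.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, count, countDistinct}

    object WebLogDailySummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WebLogDailySummary")
          .enableHiveSupport()
          .getOrCreate()

        // Assumed input: delimited web log records already landed on HDFS.
        val logs = spark.read
          .option("header", "true")
          .csv("hdfs:///data/raw/web_logs/")

        // DataFrame API: requests and distinct users per day.
        val daily = logs
          .groupBy(col("log_date"))
          .agg(
            count("*").as("request_count"),
            countDistinct(col("user_id")).as("distinct_users"))

        // The same aggregation expressed in Spark SQL over a temporary view.
        logs.createOrReplaceTempView("web_logs")
        val dailyViaSql = spark.sql(
          """SELECT log_date,
            |       COUNT(*) AS request_count,
            |       COUNT(DISTINCT user_id) AS distinct_users
            |FROM web_logs
            |GROUP BY log_date""".stripMargin)

        // Persist one result as a Hive table for downstream BI queries.
        daily.write.mode("overwrite").saveAsTable("analytics.web_log_daily_summary")

        spark.stop()
      }
    }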
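The partitioning and bucketing duty can be sketched the same way. The snippet below is illustrative only: it uses Spark's DataFrameWriter bucketing (which differs from native Hive bucketing), and the table and column names (staging.web_logs_raw, analytics.web_logs_curated, log_date, user_id) are assumptions, not details from the posting.

    import org.apache.spark.sql.SparkSession

    object CuratedWebLogsTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CuratedWebLogsTable")
          .enableHiveSupport()
          .getOrCreate()

        // Assumed staging table holding raw web log rows.
        val logs = spark.table("staging.web_logs_raw")

        // Partition by date so date-filtered queries prune whole partitions,
        // and bucket by user_id so joins and aggregations on user_id scan fewer files.
        logs.write
          .mode("overwrite")
          .partitionBy("log_date")
          .bucketBy(32, "user_id")
          .sortBy("user_id")
          .saveAsTable("analytics.web_logs_curated")

        spark.stop()
      }
    }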
Education: Any graduate