Description

Technical Skills:

  • Understanding the client’s requirements and drafting high-level design documents for development purposes
  • Importing raw or structured data from databases such as Teradata and Oracle into Hadoop using Sqoop, via AWS S3, or over direct SFTP (a sample Sqoop import command follows this list)
  • Ingesting the data into the DataLake through tools like ICT or through automation scripts written in Java, Scala, or Python
  • Converting raw data into a structured format using Apache Spark (PySpark) and feeding it into Hive tables (see the PySpark ingestion sketch after this list)
  • Using Spark DataFrames in applications where iteration over the data was necessary
  • Improving application performance using techniques such as partitioning and bucketing in Hive or Apache Spark (see the partitioning example after this list)
  • Working with Parquet, ORC, JSON, Avro, and delimited file formats during data ingestion
  • Using Spark HQL (Hive Query Language) for querying and for data ingestion into the DataLake (see the Spark SQL example after this list)
  • Enhancing existing Java code using Hadoop, Spark, and Hive
  • Performing application dependency checks by developing Scala-based frameworks
  • Designing and analyzing Quality Data Management for the data ingestion and publishing processes
  • Using encryption and decryption methods to enhance data security
  • Performing unit testing for the applications developed for automation
  • Coordinating with and assisting the testing, operations, and sustainment teams during the deployment process
  • Receiving data through multiple channels, such as a data router and an S3 bucket; once files land on the cluster, ingesting them into Hive tables and publishing them downstream, while improving application performance and adjusting dates to match downstream requirements (the PySpark ingestion sketch below illustrates this flow)
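
A minimal sketch of the kind of Sqoop import referred to above; the connection string, credentials file, source table, and target directory are hypothetical placeholders, not values tied to this role.

    sqoop import \
      --connect jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL \
      --username etl_user \
      --password-file /user/etl_user/.oracle.password \
      --table SALES.TRANSACTIONS \
      --target-dir /data/raw/transactions \
      --as-parquetfile \
      --num-mappers 4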
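
A minimal PySpark sketch of the raw-to-Hive ingestion step described above, assuming a Spark session with Hive support; the S3 path, column names, and table name are illustrative assumptions only.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical paths, columns, and table names for illustration only.
    spark = (SparkSession.builder
             .appName("raw-to-hive-ingestion")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw delimited files landed on the cluster (via data router, S3, or SFTP).
    raw_df = (spark.read
              .option("header", "true")
              .option("delimiter", "|")
              .csv("s3a://example-bucket/landing/transactions/"))

    # Convert the raw feed into a structured form: typed columns, bad rows dropped.
    clean_df = (raw_df
                .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
                .withColumn("amount", F.col("amount").cast("double"))
                .dropna(subset=["txn_id"]))

    # Feed the structured data into a Hive table for downstream consumers.
    clean_df.write.mode("append").format("parquet").saveAsTable("datalake.transactions")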
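
A sketch of the partitioning and bucketing approach mentioned above, continuing from the same hypothetical clean_df; the partition and bucket columns are assumptions chosen for illustration.

    # Partition by date so downstream queries prune whole directories,
    # and bucket by a join key to reduce shuffle during joins.
    (clean_df.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .bucketBy(16, "customer_id")
     .sortBy("customer_id")
     .saveAsTable("datalake.transactions_partitioned"))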
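
A short Spark SQL (HQL-style) query sketch against the hypothetical Hive table above, of the kind used to query the DataLake before publishing results downstream.

    # Aggregate the ingested data with Spark SQL before sending it downstream.
    result = spark.sql("""
        SELECT txn_date,
               COUNT(*)    AS txn_count,
               SUM(amount) AS total_amount
        FROM datalake.transactions
        WHERE txn_date >= DATE '2024-01-01'
        GROUP BY txn_date
        ORDER BY txn_date
    """)
    result.show()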

Experience:

2-3 years of experience in a related field

Education:

Bachelor’s in Computer Science or related field


 
