· Development experience with Java, Scala, Flume, Python
· Development background with Spark or MapReduce/YARN is a must-have
· Significant application development experience in Java or Python is expected. We are looking for someone who worked as a Software Engineer, transitioned into Data Engineering, and has built end-to-end data pipelines from the ground up
· Knowledge of and experience with Teradata physical design and implementation, and Teradata SQL performance optimization
· Advanced SQL (preferably Teradata)
· Experience working with large data sets and with distributed computing frameworks (MapReduce, Hadoop, Hive, Pig, Apache Spark, etc.)
· Strong Hadoop scripting skills for processing petabytes of data
· Experience with Unix/Linux shell scripting or comparable programming/scripting skills
· Experience with ETL processes
· Real-time data ingestion (Kafka)
· Bachelor’s degree