Description

- Gather requirements and participate in Agile planning meetings to finalize scope; actively participate in Story Time, Sprint Planning, and Sprint Retrospective meetings.
- Develop Hive queries using joins and partitions on large data sets per business requirements; load the filtered data from source into edge-node Hive tables and validate it (see the first sketch below).
- Develop shell scripts to schedule full and incremental loads and to check data quality.
- Implement optimization techniques in Hive.
- Assist the team in migrating data from GCS (Google Cloud Storage)-backed Hive tables to BigQuery (a hedged migration sketch follows the list).
- Use HiveQL for data analysis and troubleshoot performance issues.
- Participate in performance tuning of Spark applications, setting the right level of parallelism and tuning memory (an illustrative configuration follows the list).
- Help build GCP native tables, validate data between Teradata and GCP Hive, and resolve production issues.
- Troubleshoot connectivity issues.
- Automate the metadata sync between GCS and GCP Hive.
- Import and export data into HDFS and Hive.
- Develop Spark jobs using the Scala and Python (PySpark) APIs.
- Use Spark SQL to create structured data.
- Flatten on-prem data and ingest it into Druid (a flattening sketch follows the list).
- Participate in job management and develop job-processing scripts using an automated scheduler.
- Support and monitor production-critical jobs.
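As a sketch of the Hive load responsibility above: the following minimal PySpark job joins two source tables, writes the filtered rows into a partitioned edge-node Hive table, and runs a row-count validation. All table, column, and partition names (src.orders, src.customers, edge.orders_filtered, load_dt) are hypothetical placeholders; the posting does not name a schema.

```python
# Minimal PySpark sketch of a partitioned Hive load with validation.
# All table/column names below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("edge-node-hive-load")
    .enableHiveSupport()  # needed to read and write Hive tables
    .getOrCreate()
)

# Allow writing to a dynamically chosen partition (load_dt).
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

# Join the source tables and keep only the rows the business rule needs.
spark.sql("""
    INSERT OVERWRITE TABLE edge.orders_filtered PARTITION (load_dt)
    SELECT o.order_id, c.customer_name, o.amount, o.load_dt
    FROM src.orders o
    JOIN src.customers c ON o.customer_id = c.customer_id
    WHERE o.amount > 0
""")

# Basic validation: compare row counts for today's partition.
src_cnt = spark.sql(
    "SELECT COUNT(*) FROM src.orders WHERE load_dt = current_date() AND amount > 0"
).first()[0]
tgt_cnt = spark.sql(
    "SELECT COUNT(*) FROM edge.orders_filtered WHERE load_dt = current_date()"
).first()[0]
print(f"source={src_cnt} target={tgt_cnt} match={src_cnt == tgt_cnt}")
```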
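For the Hive-to-BigQuery migration item, one common approach (an assumption here, since the posting does not say which mechanism the team used) is the open-source spark-bigquery connector. The project, dataset, and bucket names below are placeholders.

```python
# Hedged sketch: copy a Hive table into BigQuery via the spark-bigquery
# connector (the connector jar must be on the job's classpath). The
# table names and staging bucket are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-bigquery")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.table("edge.orders_filtered")  # Hive source (placeholder name)

(
    df.write.format("bigquery")
    .option("table", "my_project.my_dataset.orders_filtered")  # placeholder
    .option("temporaryGcsBucket", "my-staging-bucket")         # placeholder
    .mode("overwrite")
    .save()
)
```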
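For the Spark performance-tuning item, the knobs usually involved are shuffle parallelism and executor memory. The values below are purely illustrative; the right settings depend on cluster size and data volume, which the posting does not specify.

```python
# Illustrative parallelism and memory settings only; tune per workload.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-spark-job")
    .config("spark.sql.shuffle.partitions", "400")  # shuffle parallelism for DataFrame/SQL ops
    .config("spark.default.parallelism", "400")     # parallelism for RDD operations
    .config("spark.executor.memory", "8g")          # per-executor heap
    .config("spark.executor.memoryOverhead", "2g")  # off-heap headroom per executor
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)
```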
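For the Druid ingestion item: Druid works best with flat, columnar records, so nested input has to be unrolled first. The sketch below flattens a hypothetical nested events schema with PySpark; the field names and paths are invented for illustration.

```python
# Hypothetical sketch: flatten nested events (one row per array element)
# before staging them for Druid batch ingestion. Schema and paths are
# placeholders, not from the posting.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten-for-druid").getOrCreate()

events = spark.read.json("hdfs:///data/events/")  # placeholder path

flat = (
    events
    .withColumn("item", explode(col("items")))  # one row per nested item
    .select(
        col("event_id"),
        col("event_ts"),
        col("item.sku").alias("sku"),
        col("item.qty").alias("qty"),
    )
)

# Stage flat records for Druid's batch ingestion to pick up.
flat.write.mode("overwrite").json("hdfs:///data/events_flat/")
```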

This position requires a Master's degree in Computer Science, Computer Engineering, or Data Analytics, or an MBA.

Key Skills

Hive, HiveQL, Spark (Scala and PySpark), Spark SQL, shell scripting, GCP, Google Cloud Storage, BigQuery, Teradata, HDFS, Druid

Education

Master's degree in Computer Science, Computer Engineering, or Data Analytics, or an MBA