Description:
• 9+ years of overall IT experience, including hands-on experience in Big Data technologies.
• Mandatory: hands-on experience in Python and PySpark.
• Experience building PySpark applications with Spark DataFrames in Python, using Jupyter Notebook and PyCharm (IDE); a minimal DataFrame ETL sketch appears after this list.
• Experience optimizing Spark jobs that process large volumes of data.
• Hands-on experience with version control tools such as Git.
• Experience with AWS analytics services such as Amazon EMR, Amazon Athena, and AWS Glue.
• Experience with AWS compute services such as AWS Lambda and Amazon EC2, storage services such as Amazon S3, and a few other services such as Amazon SNS.
• Experience with or knowledge of Bash/shell scripting is a plus.
• Has built ETL processes that ingest, copy, and structurally transform data in a wide variety of formats such as CSV, TSV, XML, and JSON.
• Experience working with fixed-width, delimited, and multi-record file formats (see the fixed-width parsing sketch after this list).
• Good to have: knowledge of data warehousing concepts such as dimensions, facts, and star and snowflake schemas.
• Has worked with columnar storage formats such as Parquet and ORC, as well as Avro; well versed in compression techniques such as Snappy and Gzip.
• Good to have: knowledge of at least one AWS database service, such as Aurora, RDS, Redshift, ElastiCache, or DynamoDB.
• Hands-on experience with tools such as Jenkins to build, test, and deploy applications.
• Awareness of DevOps concepts and the ability to work in an automated release pipeline environment.
• Excellent debugging skills.
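
The DataFrame ETL sketch referenced above is a minimal, illustrative example only: the S3 paths, column names, and schema are hypothetical, and it assumes a Spark environment (such as EMR or Glue) that already provides S3 connectivity. It shows reading a delimited file into a Spark DataFrame, applying a structural transformation, and writing Snappy-compressed Parquet.

```python
# Minimal PySpark ETL sketch: CSV in, structural transform, Snappy Parquet out.
# All paths, column names, and the schema are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("csv-to-parquet-etl").getOrCreate()

# Read a headered CSV (a TSV would add sep="\t"); an explicit schema is
# preferable to inferSchema for production jobs.
orders = spark.read.csv(
    "s3://example-bucket/raw/orders.csv", header=True, inferSchema=True
)

# Structural transform: normalize a column name, derive a date column,
# and keep only the fields needed downstream.
cleaned = (
    orders
    .withColumnRenamed("Order ID", "order_id")
    .withColumn("order_date", F.to_date("order_ts"))
    .select("order_id", "customer_id", "order_date", "amount")
)

# Write columnar output; Snappy is Spark's default Parquet codec but is set
# explicitly here to illustrate the option.
(
    cleaned.write
    .mode("overwrite")
    .option("compression", "snappy")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/orders/")
)
```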
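The fixed-width parsing sketch referenced above shows one common approach: read the file as raw text lines and slice fields with substr(). The field positions and widths below are invented for illustration; a real layout would come from the file's record specification.

```python
# Sketch: parsing a fixed-width file with PySpark by slicing raw lines.
# Field positions/widths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fixed-width-parse").getOrCreate()

# spark.read.text yields a single string column named "value".
lines = spark.read.text("s3://example-bucket/raw/accounts.txt")

# substr(pos, len) is 1-based; trim padding and cast as needed.
accounts = lines.select(
    F.trim(F.col("value").substr(1, 10)).alias("account_id"),
    F.trim(F.col("value").substr(11, 30)).alias("account_name"),
    F.col("value").substr(41, 8).cast("int").alias("balance_cents"),
)

accounts.show(5, truncate=False)
```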