Description:
• 9+ years of overall IT experience, including hands-on experience in Big Data technologies.
• Mandatory: hands-on experience in Python and PySpark.
• Experience building PySpark applications with Spark DataFrames in Python, using Jupyter Notebook and PyCharm (IDE); a minimal DataFrame ETL sketch appears after this list.
• Experience optimizing Spark jobs that process large volumes of data.
• Hands-on experience with version control tools such as Git.
• Experience with AWS analytics services such as Amazon EMR, Amazon Athena, and AWS Glue.
• Experience with AWS compute services such as AWS Lambda and Amazon EC2, storage services such as Amazon S3, and a few other services such as Amazon SNS.
• Experience with or knowledge of Bash/shell scripting is a plus.
• Has built ETL processes that ingest, copy, and structurally transform data in a wide variety of formats such as CSV, TSV, XML, and JSON.
• Experience working with fixed-width, delimited, and multi-record file formats (see the fixed-width parsing sketch after this list).
• Good to have: knowledge of data warehousing concepts such as dimensions, facts, and star and snowflake schemas.
• Has worked with columnar storage formats such as Parquet and ORC, as well as Avro; well versed in compression techniques such as Snappy and Gzip.
• Good to have: knowledge of at least one AWS database service, such as Aurora, RDS, Redshift, ElastiCache, or DynamoDB.
• Hands-on experience with tools such as Jenkins to build, test, and deploy applications.
• Awareness of DevOps concepts and the ability to work in an automated release pipeline environment.
• Excellent debugging skills.
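
The DataFrame ETL sketch referenced above is a minimal, illustrative example only: the S3 paths, column names, and schema are hypothetical, and it assumes a Spark environment (such as EMR or Glue) that already provides S3 connectivity. It shows reading a delimited file into a Spark DataFrame, applying a structural transformation, and writing Snappy-compressed Parquet.

```python
# Minimal PySpark ETL sketch: CSV in, structural transform, Snappy Parquet out.
# All paths, column names, and the schema are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("csv-to-parquet-etl").getOrCreate()

# Read a headered CSV (a TSV would add sep="\t"); an explicit schema is
# preferable to inferSchema for production jobs.
orders = spark.read.csv(
    "s3://example-bucket/raw/orders.csv", header=True, inferSchema=True
)

# Structural transform: normalize a column name, derive a date column,
# and keep only the fields needed downstream.
cleaned = (
    orders
    .withColumnRenamed("Order ID", "order_id")
    .withColumn("order_date", F.to_date("order_ts"))
    .select("order_id", "customer_id", "order_date", "amount")
)

# Write columnar output; Snappy is Spark's default Parquet codec but is set
# explicitly here to illustrate the option.
(
    cleaned.write
    .mode("overwrite")
    .option("compression", "snappy")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/orders/")
)
```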
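The fixed-width parsing sketch referenced above shows one common approach: read the file as raw text lines and slice fields with substr(). The field positions and widths below are invented for illustration; a real layout would come from the file's record specification.

```python
# Sketch: parsing a fixed-width file with PySpark by slicing raw lines.
# Field positions/widths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fixed-width-parse").getOrCreate()

# spark.read.text yields a single string column named "value".
lines = spark.read.text("s3://example-bucket/raw/accounts.txt")

# substr(pos, len) is 1-based; trim padding and cast as needed.
accounts = lines.select(
    F.trim(F.col("value").substr(1, 10)).alias("account_id"),
    F.trim(F.col("value").substr(11, 30)).alias("account_name"),
    F.col("value").substr(41, 8).cast("int").alias("balance_cents"),
)

accounts.show(5, truncate=False)
```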