About the job
5 to 7 years of hands-on experience in data integration and pipeline development with data warehousing.
Experience with data integration on AWS using Databricks, Apache Spark, EMR, Glue, Kafka, Kinesis, and Lambda across S3, Redshift, RDS, and MongoDB/DynamoDB ecosystems.
Strong hands-on experience in Python development, especially PySpark, in the AWS Cloud environment.
Design, develop, test, deploy, maintain, and improve data integration pipelines.
Experience with Python and common Python libraries.
Strong experience with Perl and Unix shell scripting.
Strong analytical experience with databases, including writing complex queries, query optimization, debugging, user-defined functions, views, indexes, etc.
Strong experience with source control systems such as Git and Bitbucket, and with build and continuous integration tools such as Jenkins.
Experience with continuous integration and continuous deployment (CI/CD).
Experience with Databricks, Airflow, and Apache Spark.
Experience with databases (Oracle, SQL Server, PostgreSQL, Redshift, MySQL, or similar).
Strong experience with performance tuning, and an analytical understanding of business processes and programs.
Exposure to ETL tools, including Informatica or similar.
BS/MS degree in CS, CE, or EE.