Key Responsibilities
Work on client projects to deliver data engineering and analytics solutions based on AWS, PySpark, and Databricks.
Build and operate very large data warehouses or data lakes.
Design, code, tune, and optimize ETL and big data processes using Apache Spark.
Build data pipelines & applications to stream and process datasets at low latency.
Handle data efficiently: track data lineage, ensure data quality, and improve data discoverability (a minimal PySpark sketch follows this list).
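The responsibilities above center on PySpark and Delta Lake work on Databricks. The sketch below is only an illustration of that kind of batch ETL step, not part of the role definition; the table names, columns, and partitioning choices are hypothetical placeholders.

# A minimal PySpark/Delta Lake ETL sketch; all table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Read a raw source table and apply basic data-quality rules and transformations.
raw = spark.read.table("raw.events")
curated = (
    raw.filter(F.col("event_id").isNotNull())            # simple data-quality filter
       .withColumn("event_date", F.to_date("event_ts"))  # derive a partition column
       .dropDuplicates(["event_id"])
)

# Write the result as a partitioned Delta table for downstream analytics.
(curated.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .saveAsTable("curated.events"))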
Technical Experience
Minimum of 2 years of experience building Databricks engineering solutions on the AWS Cloud platform using PySpark, Databricks SQL, and Delta Lake data pipelines.
Minimum of 5 years of experience in ETL, Big Data/Hadoop, and data warehouse architecture & delivery.
Minimum of 3 years of experience in real-time streaming using Kafka/Kinesis.
Minimum of 4 years of experience in one or more programming languages: Python, Java, or Scala.
Experience using Apache Airflow for data pipelines in at least one project (a minimal DAG sketch follows this list).
2 years of experience developing CI/CD pipelines using Git, Jenkins, Docker, Kubernetes, shell scripting, and Terraform.
Good knowledge of data warehousing.
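To illustrate the Airflow requirement above, here is a minimal DAG sketch assuming an Airflow 2.x environment; the DAG id, schedule, and task callables are hypothetical placeholders, not details taken from this posting.

# A minimal Apache Airflow 2.x DAG sketch; all names and schedules are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    """Placeholder extract step, e.g. land raw files from S3."""
    pass

def transform():
    """Placeholder transform step, e.g. trigger a PySpark/Databricks job."""
    pass

with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # run transform only after extract succeeds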
Professional Attributes
Ready to work in C Shift (1 PM - 11 PM)
Client-facing skills: solid experience working in client-facing environments and building trusted relationships with client stakeholders.
Good critical thinking and problem-solving abilities
Healthcare domain knowledge
Good Communication Skills
Educational Qualification
Bachelor of Engineering / Bachelor of Technology
Additional Information
Data Engineering, PySpark, AWS, Python, Apache Spark, Databricks, Hadoop; certifications in Databricks, Python, or AWS.
Any Graduate