Designing, building and deploying data systems, pipelines, and applications
Playing a key part in defining and establishing data pipelines to produce reliable feature sets for Data Analytics and Reporting
Ingesting data from a variety of sources, ranging from relational databases to unstructured data such as text and CSV documents
Setting up database connections to various cloud and on-premises databases, using the connection methods defined by the relevant tools and the client's technical requirement guidelines
Working closely with customers on everything from problem scoping and infrastructure provisioning to execution, deployment, and maintenance
Skills
Excellent hands-on experience with PySpark, Python, and Scala
Excellent hands-on experience with Data Lake, Databricks, and EMR
Experience with AWS data orchestration tools and technologies
Ability to understand the challenges business stakeholders face and to find solutions for them
Proven success in communicating with users, other technical teams, and senior management to collect requirements, describe data modelling decisions and data engineering strategy
Passion for working with new technologies
PySpark, Python, Scala, AWS, Data Lake, Healthcare Domain
Any Graduate