Roles and Responsibilities:
- Hands-on experience with PySpark, Redshift (SQL), and Airflow at a minimum
- Strong hands-on command of the required technical skills, flexibility, and the right attitude to take on a lead role
- Should be able to design and document data models at various levels
- Working closely with stakeholders
- Building highly scalable, robust, and fault-tolerant systems
- Knowledge of the Hadoop ecosystem and the frameworks within it: HDFS, YARN, MapReduce, Apache Pig, Hive, Flume, Sqoop, ZooKeeper, Oozie, Impala, and Kafka
- Must have experience with SQL-based technologies (e.g. MySQL, Oracle DB) and NoSQL technologies (e.g. Cassandra, MongoDB)
- Should have programming skills in Python, Scala, or Java
- Discovering data acquisition opportunities
- Finding ways and methods to extract value from existing data
- Improving the data quality, reliability, and efficiency of individual components and of the complete system
- Problem-solving mindset and experience working in an agile environment