• 3–5 years of experience as a data scientist or engineer.
• 3+ years of proven experience with distributed data technologies, e.g., Spark, YARN, Hadoop, Hive.
• Best-in-class SQL skills, with the ability to write sophisticated SQL queries across platforms.
• Proven hands-on experience in Python/PySpark/Scala and ability to manipulate data using Pandas, NumPy, Koalas, etc.
• Experience working as an architect to design distributed data platforms.
• Working experience with open-source orchestration tools, e.g., Apache Airflow, Azkaban.
• Experience working with unstructured clinical data is a plus.
• A team player with excellent communication and collaboration skills, able to work closely with data scientists and machine learning engineers on a daily basis.
Any Graduate