Job Description: Develop, modify, test, and deploy data pipelines/ETL pipelines to handle large volumes of structured, semi-structured, and unstructured data for both batch and real-time/stream processing. Design and develop data pipelines/ETL pipelines using Python, SQL, PySpark, Scala, Kafka, Java, Docker, and Airflow. Develop and deploy data programs/pipelines using cloud storage, cloud data warehouse, cloud ETL, and cloud orchestration utilities on any of the cloud platforms such as AWS, Azure, or GCP. Build CI/CD pipelines using Jenkins, CircleCI, Git, or any other tool. Implement data versioning using tools like DVC or Pachyderm. Utilize modern data lakehouse architectures with tools such as Delta Lake or Apache Iceberg. Implement data quality checks using Great Expectations or Deequ. Document high-level design, low-level design, unit test cases, and migration steps for ease of deploying the code to production, and maintain this documentation. Perform feasibility studies; may require traveling to set up systems across the nation. May require traveling to various unanticipated client locations within the United States for short- and long-term durations.
Bachelor's degree in Computer Science