Description


Data Engineer - Module Lead

  • 8+ years of experience in Data Engineering
  • Build ETL pipelines using technologies such as Python and Spark
  • Implement new ETL pipelines on top of a variety of architectures (e.g. file-based, streaming)
  • Store large datasets efficiently using a variety of file formats (e.g. Parquet, JSON) and table types (e.g. Iceberg, Hive)
  • Work with Data Analysts and Data Scientists to understand the data that matters for their analysis and make it available to them
  • Work with our Data Platform, Architecture, and Governance sibling teams to make data scalable, consumable, and discoverable
  • Leverage Cloud-based technologies (mostly AWS, some GCP) to build and deploy data pipelines


We’re looking for someone with:

  • Strong knowledge of Oracle and MongoDB
  • 2+ years of experience building ETL pipelines for a Data Lake/Warehouse
  • 2+ years of Python experience
  • 2+ years of Spark experience
  • Experience with Hive, Iceberg, Glue, or other technologies that expose big data as tables
  • Familiarity with different big data file types such as Parquet, Avro, and JSON
  • Background in building data platforms in the Cloud (e.g. AWS, GCP, Azure)
  • Experience building RAG (retrieval-augmented generation) tools is a plus

Education

Any Graduate