Data Engineer - Module Lead
- 8+ years of experience in Data Engineering
- Build ETL pipelines using technologies such as Python and Spark (a sketch of one such pipeline follows this list)
- Implement new ETL pipelines on top of a variety of architectures (e.g. file-based, streaming)
- Store large datasets efficiently using a variety of file formats (e.g. Parquet, JSON) and table formats (e.g. Iceberg, Hive)
- Work with Data Analysts and Data Scientists to understand the data that matters for their analysis and make it available to them
- Work with our Data Platform, Architecture, and Governance sibling teams to make data scalable, consumable, and discoverable
- Leverage Cloud-based technologies (mostly AWS, some GCP) to build and deploy data pipelines
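As an illustration of the kind of work described above, here is a minimal PySpark ETL sketch: it reads raw JSON, applies a simple cleanup transform, and writes partitioned Parquet. The bucket paths, column names, and app name are hypothetical placeholders, not details from this posting; loading into an Iceberg table instead would typically swap the final step for Spark's `writeTo(...).using("iceberg")` API against a configured catalog.

```python
# Minimal ETL sketch: JSON in, partitioned Parquet out.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: load raw JSON events from a landing area
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: deduplicate, derive a partition column, drop bad rows
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("created_at"))
       .filter(F.col("amount") > 0)
)

# Load: store as columnar Parquet, partitioned for efficient scans
(orders.write
       .mode("overwrite")
       .partitionBy("order_date")
       .parquet("s3://example-bucket/curated/orders/"))
```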
We’re looking for someone with:
- Strong knowledge of Oracle / MongoDB
- 2+ years of building ETL pipelines for a Data Lake/Warehouse
- 2+ years Python experience
- 2+ years Spark experience
- Experience with Hive, Iceberg, Glue, or other technologies that expose big data as tables
- Familiarity with different big data file formats such as Parquet, Avro, and JSON
- Background in building data platforms in the Cloud (e.g. AWS, GCP, Azure)
- Experience building RAG-based tools is a plus