Must-Have Skills for Leads (2 positions):
- Python & PySpark: Strong proficiency in both Python and PySpark (the lead workload is roughly 70% PySpark and 30% Java).
- Java: Solid understanding and experience in Java development.
Must-Have Skills for Other Engineers (3 positions):
- Python & PySpark: Strong proficiency in Python and PySpark.
Additional Essential Skills (for all positions):
- SQL: Strong SQL skills for data querying and manipulation.
- AWS: Extensive experience with AWS, particularly its data-related services.
- Data Lake: Experience working with and building data lakes (a brief sketch combining SQL, AWS, and data-lake work follows this list).
- Databricks (Plus): Candidates who list Databricks on their resume must be able to articulate that experience in detail.
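For a concrete sense of how these skills combine, the minimal sketch below shows PySpark reading Parquet from an S3-backed data lake and answering a question in Spark SQL. The bucket, paths, and schema are illustrative assumptions, not details of the actual platform.

```python
# A minimal sketch of these skills in combination: PySpark reading
# Parquet from an S3-backed data lake and answering a question in SQL.
# The bucket, path, and schema below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-sketch").getOrCreate()

# The s3:// scheme assumes EMR/EMRFS; open-source Spark typically uses s3a://.
orders = spark.read.parquet("s3://example-data-lake/orders/")
orders.createOrReplaceTempView("orders")

# Strong SQL in practice: filter and aggregate with Spark SQL.
daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY order_date
    ORDER BY order_date
""")
daily_totals.show()
```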
Responsibilities:
- Design, develop, and maintain data pipelines using PySpark and Java.
- Migrate data from legacy platforms to a new AWS-based platform (an illustrative sketch follows this list).
- Write and optimize complex Spark transformations and SQL queries.
- Work collaboratively with data scientists and other stakeholders.
- Ensure data quality, integrity, and security.
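The sketch below illustrates the migration responsibility under stated assumptions: a legacy relational source read over JDBC, a light transformation, and a partitioned Parquet write to the new S3-based platform. The JDBC URL, table, credentials, and bucket names are all hypothetical.

```python
# An illustrative migration-style pipeline: read from a legacy source,
# apply a transformation, and land the result on the new AWS platform.
# Assumes the legacy database's JDBC driver is on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("legacy-migration-sketch").getOrCreate()

# Extract: pull a table from a hypothetical legacy database over JDBC.
legacy = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://legacy-host:5432/warehouse")
    .option("dbtable", "public.transactions")
    .option("user", "etl_user")
    .option("password", "placeholder")  # supply via a secrets manager in practice
    .load()
)

# Transform: normalize types and derive a partition column.
cleaned = (
    legacy
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("ingest_date", F.current_date())
)

# Load: write partitioned Parquet to the new S3-based data lake.
(
    cleaned.write.mode("append")
    .partitionBy("ingest_date")
    .parquet("s3://example-new-platform/transactions/")
)
```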
Qualifications:
- Strong understanding of Big Data concepts and technologies.
- Hands-on experience with data processing and transformation using PySpark (one common optimization is sketched after this list).
- Proficiency in Python or Java development.
- Experience with AWS cloud services, especially those related to data storage and processing (e.g., S3, EMR, Redshift).
- Excellent problem-solving and communication skills.
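As a hedged example of the transformation and optimization experience expected, the sketch below applies one common PySpark technique: broadcasting a small dimension table so a join avoids shuffling the large side. The table locations and join key are hypothetical.

```python
# A common Spark optimization: broadcast a small dimension table so the
# join avoids shuffling the large fact table. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-optimization-sketch").getOrCreate()

facts = spark.read.parquet("s3://example-data-lake/orders/")     # large table
dims = spark.read.parquet("s3://example-data-lake/customers/")   # small table

# Broadcasting keeps the large side in place; only the small dimension
# table is shipped to every executor.
enriched = facts.join(broadcast(dims), on="customer_id", how="left")
enriched.explain()  # verify a BroadcastHashJoin appears in the plan
```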