Must-Have Skills For Leads (2 Positions)
Python & PySpark: Strong proficiency in both Python and PySpark; the workload for this role is roughly 70% PySpark and 30% Java.
Java: Solid understanding and experience in Java development.
Must-Have Skills For Other Engineers (3 Positions)
Python & PySpark: Strong proficiency in Python and PySpark.
Additional Essential Skills (for All Positions)
SQL: Strong SQL skills for data querying and manipulation.
AWS: Extensive experience with AWS, particularly its data-related services.
Data Lake: Experience working with and building data lakes.
Databricks (Plus): Candidates who list Databricks on their resume must be able to articulate that experience.
Responsibilities
Design, develop, and maintain data pipelines using PySpark and Java.
Migrate data from legacy platforms to a new AWS-based platform.
Write and optimize complex Spark transformations and SQL queries.
Work collaboratively with data scientists and other stakeholders.
Ensure data quality, integrity, and security.
Qualifications
Strong understanding of Big Data concepts and technologies.
Hands-on experience with data processing and transformation using PySpark.
Proficiency in Python or Java development.
Experience with AWS cloud services, especially those related to data storage and processing (e.g., S3, EMR, Redshift).
Excellent problem-solving and communication skills.
Any graduate (bachelor's degree in any discipline).