Responsibilities:
∙ Design and implement data pipelines for ingesting, preprocessing, and transforming data for Generative AI model training and inference.
∙ Build and maintain efficient data storage solutions, including data lakes, warehouses, and databases, appropriate for large-scale generative AI datasets.
∙ Implement data security and governance policies to ensure the privacy and integrity of sensitive data used in Generative AI projects.
∙ Collaborate with data scientists and engineers to understand data requirements for Generative AI models and translate them into efficient data pipelines.
∙ Monitor and optimize data pipelines for performance, scalability, and cost-effectiveness.
∙ Stay up to date on the latest advancements in data engineering tools and technologies (e.g., Apache Spark, Airflow, Snowflake, Databricks) and apply them to our Generative AI platform.
∙ Document data pipelines and processes for clarity and transparency.
∙ Communicate effectively with technical and non-technical stakeholders about data quality and availability for Generative AI projects.
Qualifications:
∙ Bachelor’s degree in Computer Science, Data Science, Statistics, or a related field, or equivalent experience.
∙ 6+ years of experience in data engineering or related roles, such as data pipeline development, data storage, or ETL/ELT processes.
∙ Proven experience building and maintaining data pipelines for machine learning projects.
∙ Strong understanding of data modeling principles, data quality measures, and data security best practices.
∙ Proficiency in programming languages such as Python and SQL, as well as shell scripting (e.g., Bash).
∙ Familiarity with cloud platforms (e.g., AWS, GCP, Azure) for data storage and processing.
∙ Excellent communication, collaboration, and problem-solving skills.
∙ Ability to work independently and as part of a team.
∙ Passion for Generative AI and its potential to solve real-world challenges.