Responsibilities:
∙ Design and implement data pipelines for ingesting, preprocessing, and transforming data for Generative AI model training and inference.
∙ Build and maintain efficient data storage solutions, including data lakes, warehouses, and databases, appropriate for large-scale generative AI datasets.
∙ Implement data security and governance policies to ensure the privacy and integrity of sensitive data used in Generative AI projects.
∙ Collaborate with data scientists and engineers to understand data requirements for Generative AI models and translate them into efficient data pipelines.
∙ Monitor and optimize data pipelines for performance, scalability, and cost-effectiveness.
∙ Stay up to date on the latest advancements in data engineering tools and technologies (e.g., Apache Spark, Airflow, Snowflake, Databricks) and apply them to our Generative AI platform.
∙ Document data pipelines and processes for clarity and transparency.
∙ Communicate effectively with technical and non-technical stakeholders about data quality and availability for Generative AI projects.
Qualifications:
∙ Bachelor’s degree in Computer Science, Data Science, Statistics, or a related field, or equivalent experience.
∙ 6+ years of experience in data engineering or related roles, such as data pipeline development, data storage, or ETL/ELT processes.
∙ Proven experience building and maintaining data pipelines for machine learning projects.
∙ Strong understanding of data modeling principles, data quality measures, and data security best practices.
∙ Proficiency in programming languages such as Python and SQL, as well as shell scripting (e.g., Bash).
∙ Familiarity with cloud platforms (e.g., AWS, GCP, Azure) for data storage and processing.
∙ Excellent communication, collaboration, and problem-solving skills.
∙ Ability to work independently and as part of a team.
∙ Passion for Generative AI and its potential to solve real-world challenges.