Responsibilities:
Cloud Data Architecture:
- Design and build robust data pipelines using Spark SQL and Python, in both batch and incremental processing modes, orchestrated via Azure Data Factory.
Feature Engineering:
- Collaborate with data scientists to understand the features needed for ML models.
- Implement feature extraction and transformation logic in the data pipelines.
FeatureOps:
- Implement FeatureOps to manage the lifecycle of features, including their discovery, validation, and serving for training and inference purposes.
Training Dataset Support:
- Work with data scientists to understand their requirements for training datasets.
- Ensure that these datasets are accurately prepared, cleaned, and made available in a timely manner.
Data Pipeline Automation:
- Automate the data pipelines using CI/CD approaches to ensure seamless deployment and updates.
- This includes automating tests, deployments, and monitoring of these pipelines.
Data Quality:
- Implement data quality frameworks and monitoring to ensure high data accuracy and reliability.
- Identify and resolve any data inconsistencies or anomalies.
Collaboration:
- Work closely with data scientists and ML engineers to understand their data needs.
- Provide them with the necessary data in the right format to facilitate their work.
Optimization:
- Continually optimize pipelines and databases for improved performance and efficiency.
- This includes implementing real-time processing where necessary.
Data Governance:
- Ensure compliance with data privacy regulations and best practices.
- Implement appropriate access controls and security measures.
- Data APIs.
Qualifications:
- Experience supporting machine learning projects.
- Familiarity with ML platforms (e.g., TensorFlow, PyTorch).
- Experience with cloud platforms (e.g., Azure, AWS).
- Proven experience as a Data Engineer or in a similar role.
- Experience with big data tools (e.g., Hadoop, Spark) and databases (e.g., SQL, NoSQL).
- Knowledge of machine learning concepts and workflows.
- Strong programming skills (e.g., Python, Java).
- Excellent problem-solving abilities and attention to detail.
- Strong communication skills to effectively collaborate with other teams.