Description

Responsibilities:

Cloud Data Architecture: 

  • Design and build robust data pipelines using Spark SQL and Python for both batch and incremental processing, orchestrated via Azure Data Factory.

Feature Engineering:

  • Collaborate with data scientists to understand the features needed for ML models.
  • Implement feature extraction and transformation logic in the data pipelines.

FeatureOps:

  • Implement FeatureOps to manage the lifecycle of features, including their discovery, validation, and serving for training and inference purposes.

Training Dataset Support:

  • Work with data scientists to understand their requirements for training datasets.
  • Ensure that these datasets are accurately prepared, cleaned, and made available in a timely manner.

Data Pipeline Automation:

  • Automate the data pipelines using CI/CD approaches to ensure seamless deployment and updates.
  • This includes automating tests, deployments, and monitoring of these pipelines.

Data Quality:

  • Implement data quality frameworks and monitoring to ensure high data accuracy and reliability.
  • Identify and resolve any data inconsistencies or anomalies.

Collaboration:

  • Work closely with data scientists and ML engineers to understand their data needs.
  • Provide them with the necessary data in the right format to facilitate their work.

Optimization:

  • Continually optimize pipelines and databases for improved performance and efficiency.
  • This includes implementing real-time processing where necessary.

Data Governance:

  • Ensure compliance with data privacy regulations and best practices.
  • Implement appropriate access controls and security measures.
  • Data APIs.

Qualifications:

  • Experience supporting machine learning projects.
  • Familiarity with ML frameworks (e.g., TensorFlow, PyTorch).
  • Experience with cloud platforms (e.g., Azure, AWS).
  • Proven experience as a Data Engineer or in a similar role.
  • Experience with big data tools (e.g., Hadoop, Spark) and databases (e.g., SQL, NoSQL).
  • Knowledge of machine learning concepts and workflows.
  • Strong programming skills (e.g., Python, Java).
  • Excellent problem-solving abilities and attention to detail.
  • Strong communication skills to effectively collaborate with other teams.

Education

Any Graduate