Job Description

Design, develop, and maintain scalable data pipelines for ingesting, processing, and transforming large volumes of structured and unstructured data

Implement efficient data processing workflows to support the training and evaluation of solutions using large language models, ensuring reliability, scalability, and performance

Address issues related to data quality, pipeline failures, or resource contention, ensuring minimal disruption to systems

Integrate large language models into data pipelines for natural language processing tasks

Deploy, scale, and monitor AI solutions on cloud platforms such as Snowflake, Azure, and AWS

Communicate with technical and non-technical stakeholders and collaborate with cross-functional teams

Apply cloud cost management best practices to optimize cloud resource usage and minimize costs

Preferred Qualifications

Experience working within the Azure ecosystem, including Azure AI Search, Azure Blob Storage, and Azure Database for PostgreSQL, and an understanding of how to leverage them for data processing, storage, and analytics tasks

Experience with techniques such as data normalization, feature engineering, and data augmentation

Ability to preprocess and clean large datasets efficiently using Azure tools, Python, and other data manipulation tools

Expertise in working with healthcare data standards (e.g., HIPAA and FHIR), sensitive data, and data masking techniques for personally identifiable information (PII) and protected health information (PHI) is essential

In-depth knowledge of search algorithms, indexing techniques, and retrieval models for effective information retrieval tasks. Familiarity with search platforms like Elasticsearch or Azure AI Search is a must

Familiarity with chunking techniques and working with vectors and vector databases like Pinecone

Experience working within the Snowflake ecosystem

Ability to implement efficient data processing workflows to support the training and evaluation of solutions using large language models, ensuring reliability, scalability, and performance

Ability to proactively identify and address issues related to data quality, pipeline failures, or resource contention, ensuring minimal disruption to systems

Education

Any Graduate