Description

Responsibilities:

Model Training and Development

o Design, train, and fine-tune machine learning models using frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.

o Use GPU resources to optimize model training for high efficiency and short training times.

o Develop OCR models using Tesseract and other tools to extract and process textual information from various types of documents and images.

Data Processing and Preparation

o Conduct thorough data processing and cleaning to ensure high-quality input for training.

o Develop and maintain pipelines for data preprocessing, augmentation, and transformation tailored to specific ML model requirements.

o Collaborate with data engineering teams to ensure seamless integration of data for ML workflows.

API Development and Deployment

o Develop and deploy RESTful APIs with Flask to expose trained ML models and features to downstream applications.

o Build scalable, robust API endpoints to enable real-time and batch model inference.

o Collaborate with DevOps teams to ensure APIs are effectively monitored, scaled, and maintained.

Performance Optimization

o Optimize model training and inference speed, leveraging GPU acceleration and efficient data processing methods.

o Monitor model performance post-deployment and retrain or update models as needed to maintain accuracy and relevance.

Collaboration and Documentation

o Work closely with cross-functional teams, including data scientists, engineers, and product managers, to align ML initiatives with business objectives.

o Document processes, models, and workflows, ensuring reproducibility and knowledge sharing within the team.

Qualifications:

Education: Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field.

Experience:

o 3+ years of hands-on experience in machine learning model development and training.

o Proven experience training large models on GPUs, optimizing for both speed and accuracy.

Technical Skills:

o Proficiency in machine learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.

o Experience with OCR tools, particularly Tesseract, for text extraction from images.

o Strong experience with Flask for building and deploying RESTful APIs.

o Proficient in Python for data processing, model training, and API development.

o Experience with data manipulation and preprocessing libraries (Pandas, NumPy, etc.).

o Familiarity with Docker for containerizing applications and knowledge of cloud-based solutions (AWS, GCP, or Azure) are a plus.

Other Skills:

o Excellent problem-solving abilities and attention to detail.

o Strong written and verbal communication skills, with the ability to explain complex ML concepts to non-technical stakeholders.
