Description

Key Responsibilities

Act as a key liaison between R&D and stakeholders, ensuring technical alignment with business objectives.
Provide technical direction on scaling and optimizing ML/AI solutions in line with company goals.
Shape the R&D roadmap based on industry insights and business needs.
Develop robust APIs and microservices for seamless ML model integration into production systems.
Design and develop a common feature store and build feature pipelines for models.
Ensure effective model integration with front-end applications, databases, and back-end services.
Publish findings at top conferences to advance knowledge in the field.
Guide machine learning engineers, fostering team growth through training and collaboration.
Build and manage end-to-end MLOps pipelines for data collection, model training, validation, and monitoring.
Ensure adherence to version control, testing, and model governance best practices.
Identify and address bottlenecks in ML models and services.
Implement model compression, quantization, and distributed training techniques.
Track key metrics and optimize models post-deployment.
Work with cloud architects and DevOps to design scalable ML infrastructure.
Oversee deployment and management of compute and storage resources for model training and inference.
Collaborate with applied scientists and analysts to convert model requirements into production-ready solutions.
Establish monitoring and alerting systems for deployed models to ensure prompt issue resolution.
Create and maintain documentation for ML architecture and best practices.
Stay current with ML technologies and contribute to ongoing enhancement efforts.

Required Qualifications

Bachelor's, Master's, or PhD in Computer Science or a related field.
12+ years of hands-on experience as a Machine Learning Engineer or Architect, with a strong portfolio of deployed ML models for batch, streaming, and real-time use cases.
Excellent problem-solving skills and a strong analytical mindset.
Proficient in Python for model development and data manipulation.
Experience with Java or Scala for building production systems and microservices.
Experience with message queues such as Kafka and SQS.
Expertise in MLOps tools and frameworks (e.g., MLflow, Kubeflow, Airflow).
Strong programming skills in languages such as Python, Java, or Scala.
Experience with cloud platforms (AWS, Google Cloud, Azure) and containerization technologies (Docker, Kubernetes).
Knowledge of machine learning frameworks (TensorFlow, PyTorch, Scikit-learn) and GPU acceleration tools (TensorRT, CUDA).
Experience working with datastores such as Elasticsearch, MongoDB, PostgreSQL, DynamoDB, Redis, vector databases, and graph databases.
Experience with data processing and ETL tools (e.g., Apache Spark, Kafka).
Experience with monitoring tools such as Grafana and Prometheus.

Preferred Qualifications

Experience in large-scale production systems and distributed computing.
Familiarity with data engineering practices and data warehousing solutions.
Contributions to open-source projects or active participation in the ML community.
Enjoys discovering and solving problems, proactively seeks clarification of requirements and direction, and is a self-starter who takes responsibility when required.
Strong interpersonal, verbal, visual, and presentation skills, with the ability to communicate complex findings simply to executives.
Ability to work collaboratively across multiple products and application teams.
A willingness to learn, share and improve.

Education

Bachelor's degree in Computer Science