Description

Job Code : EWC - 1210

Job Description Summary:

·Machine Learning Ops Engineer to build and support a scalable, highly available, and robust Machine Learning (ML)/Deep Learning (DL) platform using ML/DL frameworks, High-Performance Computing (HPC) machines, and Data Science tools, products, and services in the cloud and on-premises for the client’s data and analytics organization.

·The role will expose you to cutting-edge ML/DL technologies. The ideal candidate will be driven, focused, and enthusiastic about learning new technologies and implementing them.

 

Responsibilities:

·Build, install, configure, manage, and scale a state-of-the-art machine learning platform in the cloud (Azure preferred) and on-premises that powers the client’s Data & Analytics products and solutions.

·Work with data scientists, architects, DevOps engineers, and vendors to implement scalable ML/DL solutions in the cloud and on-premises to solve complex problems.

·Create and maintain ML/DL pipelines and overall ML/DL workflow orchestration, including but not limited to data collection, preparation, transformation, analysis, experimentation, training, validation, serving, and monitoring.

·Implement ML/DL solutions that address the performance, scalability, and governance/traceability of machine learning models.

·Iterate quickly through the latest technologies, products, and frameworks, and research the latest developments in ML/DL frameworks, tools, and services.

 

Qualifications:

·4+ years of experience delivering DevOps and MLOps in a production/enterprise setting.

·Bachelor’s degree required; Master’s degree in Computer Science or Data Science preferred.

·Excellent written and oral communication and presentation skills.

·Experience in a technical role involving platform and infrastructure operations.

·System administration experience with Unix or Linux systems.

·Container-based deployment experience using Docker and Kubernetes. 

·Proficient with the machine learning modelling lifecycle and comfortable addressing both functional and technical aspects of model delivery.

·Experience managing and deploying large distributed systems such as Spark, Dask, and H2O, as well as heterogeneous platform components.

·Experienced with programming languages like Python or R and comfortable with the statistical foundations of the most commonly used ML algorithms.

·Experienced with Machine Learning frameworks: scikit-learn, Keras, Theano, TensorFlow, Spark MLlib, etc.

·Preferred hands-on experience with IBM Watson Machine Learning or related systems.

·Preferred hands-on experience with HPC – NVIDIA, CUDA.

·Preferred experience with configuration management tools like Ansible and Puppet.

·Preferred experience in monitoring and performance analysis of Machine Learning platforms using tools like Grafana and Zabbix.

Education

ANY GRADUATE