Build and maintain scalable infrastructure for deploying machine learning models and pipelines, including containerization and orchestration.
Develop and maintain scalable and secure REST APIs for serving multiple machine learning models to various users.
Collaborate with data scientists and software engineers to ensure seamless integration of ML models into our systems.
Design and optimize data pipelines, data storage, and data processing systems to support the training and inference processes of machine learning models.
Build and maintain data and model dashboards to monitor model performance and health in production environments.
Collaborate with cross-functional teams to identify and address data quality, data governance, and security considerations in the context of ML operations.
Requirements:
Required
Bachelor's degree in Computer Science, Data Science, or a related field. A Master's or Ph.D. degree is a plus.
5+ years of hands-on experience in ML operations, ML engineering, or related roles.
Experience with AWS or Azure cloud platforms, specifically AWS SageMaker.
Experience with REST API development and AWS networking protocols.
Solid understanding of infrastructure components and technologies, including containerization (e.g., Docker) and CI/CD pipelines.
Strong knowledge of software engineering principles and best practices, including version control, code review, and testing.
Excellent problem-solving skills, with the ability to analyze complex issues and provide innovative solutions in a fast-paced environment.
Strong communication and collaboration skills, with the ability to work effectively with cross-functional teams and stakeholders.