Technically sound in AWS, Kubernetes, and Python, with basic SQL; MLOps knowledge such as MLflow is a plus.
Answering/fixing support issues for DatalaLab
Implementing and maintaining Infrastructure as Code and the build pipeline
Taking measures to minimize on-call incidents
Conducting post-incident reviews
Documenting issue resolutions and previously undocumented knowledge
Working with dev teams to ensure that new features meet reliability and performance goals
Ability to work with geographically distributed teams in India and SCV
The successful candidate will have several years of experience supporting a large enterprise system with at least 10 different upstream and downstream systems, and identifying issues from Splunk logs.
Excellent problem-solving skills and sound judgment about when to engage other team members.
Key Skills: DevOps Support SRE, AWS with SageMaker ML
Any Graduate