Responsibilities:
• Lead the architecture, design, and implementation of highly scalable and resilient AI infrastructure solutions utilizing Seldon Core AI software on OpenShift Enterprise Kubernetes in GPU-based environments.
• Provide technical leadership and mentorship to a team of engineers, guiding them in best practices for AI infrastructure design, deployment, and optimization.
• Collaborate closely with cross-functional teams including data scientists, software engineers, and DevOps engineers to understand requirements and drive innovative solutions.
• Develop and implement automation strategies for deployment, monitoring, and management of AI workloads, ensuring efficiency and reliability at scale.
• Drive performance optimization efforts to maximize resource utilization and throughput of GPU-based infrastructure for AI model training and inference.
• Establish and enforce security best practices to protect AI infrastructure and data assets against potential threats and vulnerabilities.
• Stay abreast of emerging technologies and industry trends in AI infrastructure, evaluating their potential impact and driving adoption where appropriate.
Qualifications:
• Extensive experience designing and implementing AI infrastructure solutions, with a focus on Seldon Core AI software and OpenShift Enterprise Kubernetes.
• Proven track record of technical leadership, including mentoring junior engineers and driving successful project outcomes.
• Expertise in scripting and automation using tools such as Ansible or Terraform, with a strong emphasis on infrastructure as code (IaC) principles.
• Deep understanding of GPU architecture and performance optimization techniques for AI workloads, with hands-on experience in tuning and scaling GPU-based infrastructure.
Education: Any Graduate