DevOps Engineer
Location: Remote (India)
Employment Type: Full-Time
About the Company:
Our company is a leading provider of innovative device-tracking solutions, specializing in real-time tracking, IoT integration, and advanced analytics.
Our mission is to enhance operational efficiency and security through cutting-edge technology. We are looking for a talented DevOps Engineer to join our team and drive the scalability, performance, and reliability of our systems.
Job Summary:
As a DevOps Engineer focused on deep learning, you will design, deploy, and maintain highly scalable, high-performance infrastructure for training and deploying machine learning models. Collaborating with AI researchers and engineers, you'll optimize workflows, manage large-scale data pipelines, and ensure seamless integration of the latest AI solutions.
Key Responsibilities:
- Develop and maintain CI/CD pipelines for deep learning model training, testing, and deployment.
- Design scalable, distributed infrastructure to support high-performance training on GPUs/TPUs.
- Automate provisioning of cloud-based and on-premises deep learning clusters using tools like Terraform or Ansible.
- Manage containerized environments (Docker) and orchestration systems (Kubernetes) for AI workloads.
- Optimize workflows for data preprocessing, model training, and inference using tools like Dask, Apache Spark, or Ray.
- Implement and maintain monitoring and logging systems for model performance and infrastructure health.
- Collaborate with AI teams to improve deployment pipelines for real-time and batch inference.
- Ensure the security and integrity of sensitive data used in AI workflows.
- Scale data storage and processing pipelines for large datasets used in model training.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 7+ years of experience as a DevOps Engineer, preferably in machine learning or data science environments.
- Strong experience with CI/CD tools (e.g., GitLab CI, Jenkins, CircleCI).
- Proficiency in cloud platforms (AWS, GCP, Azure) with a focus on GPU-based instances.
- Expertise in containerization (Docker) and orchestration (Kubernetes).
- Experience with infrastructure-as-code tools like Terraform, CloudFormation, or Ansible.
- Solid programming skills in Python, Bash, or Go.
- Familiarity with deep learning frameworks (e.g., TensorFlow, PyTorch).
- Strong understanding of GPU/TPU optimization for large-scale model training.
Nice-to-Have Skills:
- Experience with MLOps tools like MLflow, Kubeflow, or SageMaker.
- Familiarity with distributed training strategies (e.g., Horovod, DeepSpeed).
- Knowledge of high-performance storage systems for ML datasets.
- Exposure to monitoring AI models using Prometheus, Grafana, or similar tools.
- Certifications in cloud platforms or Kubernetes.
Why Join Us?
- Work on cutting-edge technologies at the intersection of AI and infrastructure.
- Collaborate with world-class researchers and engineers in deep learning.
- Access state-of-the-art hardware and tooling for AI development.
- Competitive salary, benefits, and career growth opportunities.
- A culture that values innovation, learning, and collaboration.