Description

Must have

8+ years of Linux administration experience.

Deep understanding of Linux distributions (e.g., Red Hat, CentOS, Ubuntu). Ubuntu experience is required.

Key Responsibilities

System Management: Oversee the installation, configuration, and maintenance of Linux-based servers and NVIDIA DGX-based HPC systems. Ensure high availability, performance, and security of all Linux-based environments.

Troubleshooting: Identify and resolve complex issues related to Linux systems, including hardware failures, network problems, and software conflicts.

Security: Implement and manage security policies, including user permissions, firewall configurations, and intrusion detection systems. Conduct regular security audits and vulnerability assessments.

Automation: Develop and maintain scripts and automation tools for system monitoring, backups, and deployments using Bash, Python, or tools such as Ansible.

Performance Tuning: Monitor system performance and tune systems to ensure optimal efficiency and response times.

Disaster Recovery: Design and implement disaster recovery plans and backup strategies to ensure data integrity and availability.

Documentation: Create and maintain comprehensive documentation for system configurations, processes, and procedures. Ensure documentation is up-to-date and accessible.

Collaboration: Work closely with other IT team members, developers, and stakeholders to support projects and provide technical guidance. Participate in on-call rotations as needed.

Upgrades and Patching: Manage system upgrades and patches, ensuring minimal disruption to operations. Evaluate new technologies and tools for potential integration.

Qualifications

Experience: Minimum of 5 years of experience as a Linux Administrator or in a similar role, with extensive hands-on experience managing large-scale Linux-based environments.

Education: Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.

Certifications: Relevant certifications such as RHCE (Red Hat Certified Engineer), CompTIA Linux+, or LPIC (Linux Professional Institute Certification) are highly desirable.

Technical Skills

Deep understanding of Linux distributions (e.g., Red Hat, CentOS, Ubuntu)

Experience with virtualization and container technologies (e.g., VMware, KVM, Kubernetes)

Proficiency in scripting and automation tools (e.g., Bash, Python, Ansible)

Knowledge of networking concepts and protocols (e.g., TCP/IP, DNS, DHCP)

Familiarity with configuration management tools (e.g., Puppet, Chef)

Experience with cloud platforms (e.g., AWS, Azure) is a plus

Soft Skills

Strong problem-solving and analytical skills


Education

Any graduate