Job Description

We are looking for an experienced Senior DevOps Engineer / Architect who is comfortable automating and maintaining cloud infrastructure as it relates to Azure Kubernetes Service and the applications running within it. We support a diverse set of needs across application hosting, data science, and AI workloads, and we want to increase our team's capacity to work on new projects, support existing workloads, and provide recommendations and automation for improvements to our hosting platform. You will support the platform, with a priority focus on certain highly available applications, working "second shift" hours (4 PM-12 AM) as your primary hours for coverage, and you will be expected to participate in an on-call rotation. Because our core hours are 9-6, with typical on-call coverage from 8-8 EST for most team members, this role requires a drive to understand our systems and become a relatively independent operator. Communicating well via documentation, tickets, and the pull request / review process will be vital to success.

Requirements

Proven experience as a Senior DevOps Engineer or similar software engineering role.

Strong experience with Kubernetes, Docker, and containerization.

Proficiency in scripting languages such as Python and Bash.

Experience with CI/CD tools; we use GitHub Actions.

Knowledge of cloud platforms such as AWS, GCP, or Azure (Azure preferred).

Strong problem-solving skills and attention to detail.

Excellent communication and teamwork skills.

Degree in Computer Science, Engineering, or a relevant field, or equivalent experience (success in this role will likely require more than 5 years of progressive experience in Kubernetes hosting).

Experience with Java and Python applications hosted in containers, including resource utilization and optimization, and build and deployment patterns.

Experience with monitoring and alerting tools; we are moving toward Datadog as our primary platform and have some legacy use of Dynatrace and Grafana.

Nice to Have

Certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).

Key Responsibilities

Approach all of these through the lens of "understand what we already have, and improve it or recommend changes to it."

Design, build, and maintain efficient, reusable, and reliable infrastructure code.

Implement and manage continuous delivery systems and methodologies using Kubernetes.

Manage and optimize Kubernetes clusters for maximum efficiency and scalability.

Work closely with development teams to identify and resolve system issues.

Implement automation tools and frameworks (CI/CD pipelines, IaC pipelines, and other cluster management software).

Collaborate with team members to improve the company's engineering tools, systems and procedures, and data security.

Conduct systems tests for security, performance, and availability.

Develop and maintain design and troubleshooting documentation.

Work within support channels as required to support applications and migration efforts.

Education

Bachelor's degree