Job Description
We are looking for an experienced Senior DevOps Engineer / Architect who is comfortable automating and maintaining cloud infrastructure, specifically Azure Kubernetes Service (AKS) and the applications running within it. We support a diverse set of needs across application hosting, data science, and AI workloads, and we want to increase our team's capacity to take on new projects, support existing workloads, and recommend and automate improvements to our hosting platform. You will support the platform, with priority on certain highly available applications, during "second shift" hours (4 PM to 12 AM) as your primary working hours, and you will be expected to participate in an on-call rotation. Because our core hours are 9-6, with typical on-call coverage from 8-8 EST for most team members, you will need the drive to understand our systems and become a relatively independent operator. Communicating well through documentation, tickets, and the pull request / review process will be vital to success.
Requirements
Proven experience as a Senior DevOps Engineer or similar software engineering role.
Strong experience with Kubernetes, Docker, and containerization.
Proficiency in scripting languages such as Python and Bash.
Experience with CI/CD tools (we use GitHub Actions).
Knowledge of cloud platforms like AWS, GCP, or Azure (Azure preferred).
Strong problem-solving skills and attention to detail.
Excellent communication and teamwork skills.
Degree in Computer Science, Engineering, or a relevant field, or equivalent experience (success in this role will likely require more than 5 years of progressive experience in Kubernetes hosting).
Experience with Java and Python applications hosted in containers, including resource utilization/optimization and build and deployment patterns.
Experience with monitoring / alerting tools (we are moving toward Datadog as our primary platform and have some legacy use of Dynatrace and Grafana).
Nice to Have
Certifications like Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).
Key Responsibilities
Approach all of these through the lens of "understand what we have already, and improve or recommend changes to it":
Design, build, and maintain efficient, reusable, and reliable infrastructure code.
Implement and manage continuous delivery systems and methodologies using Kubernetes.
Manage and optimize Kubernetes clusters for maximum efficiency and scalability.
Work closely with development teams to identify and resolve system issues.
Implement automation tools and frameworks (CI/CD pipelines, IaC pipelines, and other cluster management software).
Collaborate with team members to improve the company's engineering tools, systems and procedures, and data security.
Conduct systems tests for security, performance, and availability.
Develop and maintain design and troubleshooting documentation.
Work within support channels as required to support applications and migration efforts.
Bachelor's degree