Job Description:

  • Kubernetes-certified professional or expert administrator of Kubernetes and Helm.
  • A self-driven self-learner, able to operate with minimal supervision.
  • Able to demonstrate expertise in at least one public cloud platform (AWS, Azure, or OCI).
  • Proficient in Application Performance Monitoring (APM) tools such as Datadog APM, Dynatrace, and AppDynamics.
  • Able to communicate effectively with business partners, management, and technical team members.
  • Experienced SRE with a development or DevOps background and experience working on enterprise-scale applications.
  • Proficient with monitoring and alerting tools; proactive in raising problems and identifying solutions.
  • AWS Certified SysOps Administrator (Associate) or AWS Certified DevOps Engineer (Professional), or an equivalent certification from another cloud service provider.
  • Strong sense of customer service. Able to work in a highly collaborative team setting. Approaches work with a DevOps and continuous-improvement mindset.

Minimum Qualifications:

  • Bachelor's degree.

  • Minimum of 5 years of experience in an enterprise-level DevOps role, including at least 3 years with AWS or Azure cloud and 2 years of Kubernetes administration.
  • Expertise in Kubernetes administration and development, with hands-on experience using Helm.
  • Strong knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks).
  • Expertise in observability and monitoring tools such as Dynatrace, Datadog, AppDynamics, and Splunk is required.
  • A deep understanding of application performance monitoring (APM) and user monitoring is essential.
  • Sound knowledge of ITSM processes, SLI/SLO/SLA management, incident resolution, and automation techniques.
  • Strong IP networking fundamentals and experience with standard protocols, interfaces, and message formats (e.g., TCP/IP, HTTP, SOAP, RESTful APIs, XML/JSON, JDBC, JMS/MQ).
  • Knowledge of Infrastructure as Code (IaC) tools such as Ansible and AWS CloudFormation is preferred.
  • Ability to apply cloud compliance standards to application design to achieve reliability.
  • Able to analyze application and server logs and interpret errors.
  • Ability to code in at least one programming or scripting language (e.g., Java, Python, shell).
  • Experience in site reliability engineering for Java, Kubernetes, and database platforms (e.g., PostgreSQL).
  • The candidate should possess excellent written and verbal communication and collaboration skills.

Education

Bachelor's degree