Kubernetes-certified professional or expert administrator of Kubernetes and Helm.
A self-learner, self-driven, and able to operate with minimal supervision.
Able to demonstrate expertise in at least one public cloud infrastructure (AWS/Azure/OCI).
Be proficient in APM (Application Performance Monitoring) tools like Datadog APM, Dynatrace, AppDynamics, etc.
Able to successfully communicate with business partners, management, and technical team members.
Experienced SRE with a development or DevOps background who has worked on enterprise-scale applications.
Proficient user of monitoring and alerting tools. Proactive in raising problems and identifying solutions.
AWS Certified SysOps Administrator - Associate or AWS Certified DevOps Engineer - Professional (or an equivalent certification from another cloud service provider).
Strong sense of customer service. Able to work in a highly collaborative team setting. Approaches work with a DevOps and continuous improvement mindset.
Minimum Qualifications:
Bachelor's degree
Minimum of 5 years of experience in an enterprise-level DevOps role (minimum 3 years with AWS/Azure cloud and 2 years with Kubernetes administration).
Expertise in Kubernetes administration/development and hands-on experience with Helm.
Strong knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
Expertise is required in observability and monitoring tools like Dynatrace, Datadog, AppDynamics, Splunk, etc.
A deep understanding of application performance monitoring (APM) and user monitoring is essential.
Sound knowledge of ITSM processes, SLI/SLO/SLA management, incident resolution, and automation techniques.
Strong IP networking fundamentals and experience using standard application protocols and message formats (e.g., TCP/IP, HTTP, SOAP, RESTful APIs, XML/JSON, JDBC, JMS/MQ).
Knowledge of Infrastructure as Code (IaC) tools such as Ansible and AWS CloudFormation is preferable.
Able to apply cloud compliance standards to application design to achieve reliability.
Able to analyze application and server logs and interpret errors.
Ability to code in at least one programming language (Java, Python, shell, etc.).
Experience in site reliability engineering for Java, Kubernetes, and database platforms (such as Postgres).
The candidate should possess excellent written and verbal communication and collaboration skills.