Description

Role:

Setting up a world-class observability platform for multi-cloud infrastructure services. Reviewing and contributing to the observability setup for the infrastructure of new and existing cloud apps. Analyzing, troubleshooting, and designing vital services, platforms, and infrastructure while always thinking about reliability, scalability, resilience, automation, security, and performance.
 

Continue improving cloud product reliability, availability, maintainability, and cost/benefit, including developing fault-tolerant tools to ensure the general robustness of the cloud infrastructure

Responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services in the cloud landing zone

Manage capacity across public and private cloud resource pools, including automating the scaling up and down of environments

Ensuring that everything that goes to production meets a set of general requirements, such as diagrams, documentation, security compliance, dependencies on other services, monitoring and logging plans, backups, and possible high-availability setups

Ensuring the efficient functioning of cloud resources and functions in accordance with company security policies and best practices in cloud security

Employ exceptional problem-solving skills, with the ability to see and solve issues before they affect business productivity

Support developers in optimising and automating cloud engineering activities, e.g. real-time migration, provisioning, and deployment

Monitoring and acting on hardware degradation, networking problems, high resource usage, or slow responses in the cloud landing zone

Preparing and managing runbooks containing the procedures needed to get services up and running again quickly when issues occur

Enable automation for key functions such as CI/CD across SDLC phases, monitoring, alerting, incident response, infrastructure provisioning, and patching

 

Key Skills
Education

Any Graduate