Role:
Set up a world-class observability platform for multi-cloud infrastructure services. Review and contribute to observability setups for the infrastructure of new and existing cloud apps. Analyze, troubleshoot, and design vital services, platforms, and infrastructure with a constant focus on reliability, scalability, resilience, automation, security, and performance
Continue improving cloud product reliability, availability, maintainability, and cost/benefit, including developing fault-tolerant tools to ensure the general robustness of the cloud infrastructure
Responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services in the cloud landing zone
Manage capacity across public and private cloud resource pools, including automating the scale-up and scale-down of environments
Ensure that everything going to production meets a set of general requirements, such as diagrams, documentation, security compliance, dependencies on other services, monitoring and logging plans, backups, and possible high-availability setups
Ensure the efficient functioning of cloud resources and functions in accordance with company security policies and cloud security best practices
Apply exceptional problem-solving skills to identify and resolve issues before they affect business productivity
Support developers in optimising and automating cloud engineering activities, e.g. real-time migration, provisioning, and deployment
Monitor and act on hardware degradation, networking problems, high resource usage, and slow responses in the cloud landing zone
Prepare and maintain runbooks containing the procedures needed to get services up and running again quickly when issues occur
Enable automation for key functions such as CI/CD across SDLC phases, monitoring, alerting, incident response, infrastructure provisioning, and patching
Any Graduate