Responsibilities:
• Analyze previous alerts, incidents, and usage patterns to predict issues and take proactive action to prevent problems and incidents from occurring.
• Ensure timely resolution of events/alerts and incidents. Act as L2.5 for infrastructure issues and escalations, and work on incident and problem management activities. Coordinate problems, defects, and corrective actions with L3 teams.
• Perform quality reviews of the analysis and resolutions done by the L2 team. Participate in and review RCAs/8Ds. Ensure timely implementation and effectiveness of permanent corrective actions owned by L2.
• Work with cloud providers and drive implementation of cloud policies and best practices. Coordinate with technology vendors (such as MSFT and Red Hat) on vendor-related issues.
• Reduce toil by automating repetitive operational activities, and reduce the risk of manual errors by adopting automation/scripting.
• Advise on capacity planning and provide continuous assessments of system behavior and consumption, working toward optimization of resources while maintaining the reliability and resilience of infrastructure and products.
Skills:
1. Cloud engineering with experience in DevOps and CI/CD tools
2. Container Orchestration
3. Monitoring and Logging Tools (Ex: Datadog, log.io, Runscope and Dynatrace)
4. Ingress/Egress (Ex: Nginx and Sophos)
Good to have:
1. AWS and GCP
2. Networking Skills - Ingress and Egress
Any Graduate
INR 12,00,000 - 20,00,000