So, what’s the role all about?
As a Site Reliability Engineer (SRE) for our large, regionally distributed SaaS platform, your primary responsibility will be to improve the reliability and availability of our mission-critical cloud-based services.
How will you make an impact?
Essential Duties and Responsibilities:
Observability and Monitoring:
Create new dashboards and metrics to provide comprehensive observability into the health and performance of development teams' applications, including SLI/SLO metrics.
Work with development teams to ensure proper monitoring is set up and enabled for their services.
Identify incremental improvements to the observability and monitoring solutions.
Reliability Consulting and Automation:
Consult with development teams on SRE services and best practices to help them improve the reliability of their applications.
Create automation and tooling to reduce toil and manual intervention.
Incident and Problem Management:
Assist other teams in data and performance analysis to identify the root causes of issues and recommend automation actions.
Knowledge Sharing and Mentoring:
Review the work of other SREs and provide training and guidance to help them improve their skills.
Communicate effectively with both technical and non-technical peers and customers.
Process and Documentation:
Follow established processes when performing work, and help create and document new processes as necessary.
Document troubleshooting steps and results in appropriate locations for historical access.
Ensure compliance with policies, procedures, and standards.
Implement or coordinate remediation required by audits and assessments, and document it as necessary.
Time Estimation:
Estimate the time required to complete activities and projects.
Have you got what it takes?
4+ years of programming/scripting experience with any of the following: Go, Python, .NET (C#), Node.js
4+ years of experience working within public or private cloud environments
4+ years of SRE/DevOps/Observability or related experience
4+ years of AWS experience
Experience with Agile, Jira, GitHub, monitoring, automation, dashboarding
The role involves rotational shifts: 7:30 PM to 4:00 AM (IST) / 12:00 PM to 7:00 PM (IST)
You will have an advantage if you also have:
Kubernetes (with certification), Grafana, AWS, Azure, and DevOps experience.
Graduate in any discipline