Description

About The Role

In this opportunity, you will:

Promote site reliability engineering and DevOps best practices. Ensure that non-functional requirements are fed into the backlog, covering high availability, scalability, security, performance, self-healing and observability.
Implement non-functional requirements to ensure system availability, scalability, performance, monitoring, alerting and efficient incident troubleshooting
Build and maintain monitoring for all aspects of the infrastructure, microservices and platform, and implement alerting mechanisms using cloud-native solutions
Provide primary operational support and engineering for multiple large, distributed platforms
Act as the go-to person for any production issue: troubleshoot, escalate as needed and monitor until successful resolution. Communicate status to the team and stakeholders.
Participate in on-call/shift rotations (L2). When on-call, you are expected to drive troubleshooting and mitigation activities while working on an incident.
Provide technical leadership to the local SRE team, guide them as needed, and collaborate extensively with the parallel scrum teams, product owner and architects.
Drive automation (to reduce toil), continuous improvement and post-mortems
Maintain end-to-end security, ensuring that we meet best-practice standards
Keep up-to-date with emerging cloud technology trends, especially around DevOps, Service Reliability and Security.
Adopt pan-TR operation principles to ensure consistency and efficiency


About You

You’re a fit for the role of Lead SRE if you meet the following criteria:

Bachelor's degree required, preferably in Computer Science or a related field
Minimum of 8 years' experience as a DevOps engineer and/or Site Reliability Engineer, with hands-on experience in cloud technologies
Highly skilled in Unix/Linux, with exposure to RHEL
Proven experience in building and operating cloud-native infrastructure, applications and services on AWS
Must have experience in automation for CI/CD (e.g. Terraform, Git, Jenkins, CloudFormation …)
Experience or knowledge of container technologies such as Docker and Kubernetes, and of the Istio service mesh
Experience or knowledge of AWS services such as CloudFront, threat detection and other security controls
Experience or knowledge of distributed logging tools: Datadog, ELK, Sumo Logic, CloudWatch
Team player with a can-do attitude

Education

Any Graduate