Description

What You Will Do

Collaborate with software engineering to design, automate, and deploy tools for testing and release management. Maintain and optimize deployment, monitoring, and operations tools. Troubleshoot and resolve issues across all environments, and provide automated customer support. Embrace a culture of automation and continuous improvement in support and development. Act as a liaison between teams to keep operations running smoothly and to foster an inclusive work environment.

Emphasize SRE as an engineering discipline, driven by automation.
Own KPIs for site stability, performance, and root cause analysis in production.
Develop services for automatic incident and disaster recovery.
Participate in troubleshooting, capacity analysis, planning, and performance analysis.
Define and review infrastructure as code.
Ensure compliance by meeting the requirements of SecOps tools.
Develop, implement, maintain, and fine-tune monitoring and alerting systems.

What Skills & Experience You Should Bring

Bachelor's or Master's degree in Computer Engineering or a related field.
Minimum of 2 years' experience in technical and people management.
Proven track record of supporting applications and infrastructure in production environments.
Expertise in capacity planning and cost optimization.
Extensive experience with Amazon Web Services (Azure or GCP also considered).
In-depth knowledge of Linux/Unix operating systems.
Skilled in high-level scripting languages (Python preferred), IaC tools (Terraform, CloudFormation), and containerization.

Education

Any Graduate