Description

So, what’s the role all about?

The Site Reliability Engineer works as a software developer focused on the reliability of a specific software application, or suite of applications, and its accompanying infrastructure. This includes implementing new systems, providing mid-level and escalation support for other groups, and resolving production issues together with development, operations, and architecture teams.

How will you make an impact?

Develop production environment monitoring for availability by taking a holistic view of system health and uptime
Build software and systems to manage platform infrastructure and applications
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Provide primary operational support and engineering for multiple large, distributed software applications
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Balance feature development speed and reliability with well-defined service level objectives
Think strategically and learn new technologies quickly, understanding how they are used in order to define and refine monitoring requirements
Work well under high pressure and think outside the box

Have you got what it takes?

4+ years of programming experience (structured and OO) with one or more high-level languages, such as PowerShell, Shell/Bash, or Python, along with tooling such as Kickstart, Puppet, and Docker containers
3+ years of experience in software development with Java/.NET is preferred
Ability to work hand in hand with R&D, DevOps, and support teams to improve processes and ensure systems are always operational
1+ years of working with Git-based source control systems, preferably GitHub
Good understanding of code promotion techniques, build automation, and branching strategies
1+ years of working with AWS services such as EC2, IAM, DynamoDB, RDS, S3, EBS, CloudWatch, Lambda, and API Gateway, or with Azure native services
Experience with load balancers (layer 4/7 load balancing, HAProxy) and tcpdump is a plus
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

What’s in it for you?

Education

Any Graduate