Job Details
The Site Reliability Engineer will build strong relationships within and outside the organization, stay engaged with industry standards and advances relevant to the role, and work to introduce those to the organization. This role will be responsible for participating in and leading the resolution of service-impacting events, as well as helping to develop solutions that automate and remediate issues, both proactively and as a result of service-impacting events. They will lead and participate in Root Cause Analysis and lead user experience alerting and monitoring. The Site Reliability Engineer must exhibit communication and leadership skills to maintain positive business relations with employees, customers, business partners, peers and other IT&S personnel.
Some of those expectations are:
- Change Management/Orientation: Has the capacity to remain objective and can focus on constructive change when it makes sense for the enterprise.
- Hands-on approach: Acts as a role model for other leaders in maintaining an in-depth awareness of own area and others; demonstrates good working knowledge of current issues in areas with strategic links to own.
- Tough-Mindedness: Makes tough decisions that are in the best interests of the business without procrastinating or succumbing to undue pressure.
- Inspires: Wants to create value for our customers - someone who says, "let me see what we can do", yet maintains uniformity.
- Vision and Communication as one attribute: Skilled in persuasion and influential; can communicate complicated technical scenarios in more common terms.
- Works towards a culture of Innovation: Encourages brainstorming and idea generation locally and within the larger client's systems management community.
- Decision Making: Balanced risk taker in complex, ambiguous environments; is comfortable with ambiguity and with models. Professionally knowledgeable.
- Team-Oriented Mindset: We will work collectively to solve issues and do what’s right for the business, which may mean helping with other departments' and vendors' needs as necessary.
- As new team members join, the engineer will model best practices and help coach them on the client’s approach so that we can all support each other.
General Responsibilities
- Support one or more key areas including UNIX, AIX, and Red Hat Linux.
- Support OS environments running one or more databases; SQL / DB2 experience desirable.
- Support DataPower appliances and WebSphere Portal.
- Foundational network knowledge (CCENT / CCNA).
- Work with DBA, Network, UNIX, and Application Engineers for troubleshooting purposes.
- Provide support for build team strategic / tactical initiatives.
- Demonstrate written and verbal communication skills and work collaboratively across technical disciplines.
- Perform complex tasks within the operating system in a large-scale production environment.
- Working knowledge of and experience with key ITIL/Agile processes is required.
- Troubleshoot applications; experience with development (coding, debugging, testing and delivery) within a large enterprise production environment is highly desirable.
- Demonstrate a solid understanding of computer performance metrics and tools, including CPU utilization levels, CPU and disk queuing, I/O response times, and other key performance indicators and their impact on performance.
- Understand Splunk and other logging applications well enough to troubleshoot effectively.
Must Haves:
- Minimum 5 years of relevant work experience required.
- Bachelor's degree preferred.
- Linux Engineer (Mid to Sr level)
- Scripting in Ansible
- Agile
- High level of experience in installation, monitoring, maintenance and support of all system and application management hardware, software and communication links.
- High aptitude across server, network and monitoring application technologies, services, and components.
- SQL / DB2 desirable
- Willing to be on-call (at the most 1/mo.)