Job Description:

As a Site Reliability Engineer (SRE) for Controls, Operational Risk, Compliance and Practices Technology, you will work in a collaborative team of software professionals and be responsible for improving the health of our applications. The SRE is part of a horizontal function responsible for putting in place the practices, processes and tools that keep each application stable and functional. The team is accountable for high-quality support of technical issues, disaster recovery (DR) testing, and hardware and software updates. The SRE is expected to implement DevOps practices, automate the release process, and develop scripts that replace manual processes.

You will work directly with other SRE team members and development team members to develop and support innovative technology solutions, including user interfaces, middle-tier and server-side components, and you will ensure adherence to architecture standards, risk management, and security policies.

As a Site Reliability Engineer for our technology teams, you will have the opportunity to instrument, build and maintain complex applications, and to maintain vendor applications from a development and risk perspective.

Primary Responsibilities:

·         Troubleshoots incidents, conducts blameless post-mortems and ensures permanent closure of incidents.

·         Engages with the development team throughout the life cycle to help design and build software for reliability.

·         Applies analytics on historic data, such as incidents and usage patterns, to predict issues and take proactive action.

·         Drives adoption of self-healing and resiliency patterns such as circuit breaker and bulkhead.

·         Designs and conducts performance tests, identifies bottlenecks and opportunities for optimization.

·         Defines and drives adoption of best-in-class monitoring frameworks to achieve end-to-end flow monitoring and low-noise alerting.

·         Designs, develops, tests and delivers software to automate manual operational work.

·         Deploys software and product upgrades.

·         Adds value to team delivery, works with the team to complete tasks to a high standard, and actively learns new skills.

·         Facilitates maximum speed of delivery while objectively adhering to the error budgets of the service.

·         Manages the effort split between manual operational work and engineering work.

·         Coaches other team members and manages teams as needed.

Required Skills:

·         Excellent debugging and troubleshooting skills.

·         Expert in performance monitoring and capacity management of large systems using various tools.

·         Expert in at least one technology stack (Java/J2EE/Python), with experience designing, coding, testing, and delivering software.

·         Expert in at least one relational database (SQL Server, Oracle, DB2, etc.).

·         Hands-on experience with cloud technologies (Cloud Foundry, Kubernetes, AWS).

·         Hands-on experience with big data services (Hadoop, HDFS, Hive, YARN, HBase, Kafka, ZooKeeper).

·         Working knowledge of Groovy, batch scripting, PowerShell or shell scripting.

·         Experience developing, deploying and debugging distributed systems in Linux and Hadoop environments.

·         Experience with monitoring tools such as AppDynamics (AppD), Splunk, ELK, Geneos.

·         Experience analyzing SLI metrics and performance data, and interpreting and correlating them with SLOs and SLAs.

·         Experience with deployment automation, CI/CD, DevOps, Jenkins, Git, Bitbucket.

·         Experience with cloud/container environments, big data, and analytical tools (Tableau, Alteryx).

·         Expert practitioner in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm.

·         Working knowledge of infrastructure components like routers, load balancers and networks.

·         Comfortable working in an Agile environment and proficient in continuous integration and continuous delivery.

·         Solid understanding of microservice design methodologies.

·         Solid analytical and problem-solving skills.

·         A proven team lead with excellent communication skills.

·         Attention to detail and time-management skills.

·         Endless curiosity about applications and application stability.

Education:

Bachelor’s Degree