Job Description:
Senior Site Reliability Engineer
- Design and develop automated monitoring functions with key performance indicators and service level indicators
- Work to improve system resiliency, performance and efficiency
- Participate in incident response and postmortems
- Experience monitoring infrastructure and application uptime and availability to ensure functional performance objectives are met
- Cross-functional knowledge of systems, storage, networking, security and databases
- System administration skills, including orchestration of Linux and containers (Docker, Kubernetes)
- Apply knowledge of coding strategies and languages while working with customers on configuration management initiatives.
- Coordinate and assist teams in building infrastructure competencies using object-oriented programming and configuration management domain-specific languages.
- Research and develop competencies using service mesh architecture.
- Build platforms that teams can leverage to accelerate innovation in the areas of reliability, scalability and velocity.
- Provide operational support for the Continuous Feedback and Event Analytics platform.
Additionally, you’ll bring:
- 2+ years of experience in the field or in a related area.
- Bachelor’s degree in Computer Science, Information Systems Management, Engineering or related field or equivalent experience.
- Experience with software engineering, enterprise operations support, object-oriented programming, automation and consulting with internal customers.
- Ability to quickly learn technologies such as Docker, Kubernetes, OpenTelemetry, Grafana, AWS, Ansible, Nginx, Elasticsearch, Go and Python