Description

Job Description:

Senior Site Reliability Engineer

  • Design and develop automated monitoring functions built around key performance indicators and service level indicators
  • Work to improve system resiliency, performance and efficiency
  • Participate in incident response and postmortems
  • Experience monitoring infrastructure and application uptime and availability to ensure functional performance objectives are met
  • Cross-functional knowledge of systems, storage, networking, security and databases
  • System administration skills, including orchestration of Linux and containers (Docker, Kubernetes)
  • Apply knowledge of coding strategies and languages while working with customers on configuration management initiatives.
  • Coordinate with and assist teams in building infrastructure competencies using object-oriented programming and configuration management domain-specific languages.
  • Research and develop competencies in service mesh architecture.
  • Build platforms that teams can leverage to accelerate innovation in the areas of reliability, scalability and velocity.
  • Provide operational support for the Continuous Feedback and Event Analytics platform.

Additionally, you’ll bring:

  • 2+ years of experience in the field or in a related area.
  • Bachelor’s degree in Computer Science, Information Systems Management, Engineering or a related field, or equivalent experience.
  • Experience with software engineering, enterprise operations support, object-oriented programming, automation and consulting with internal customers.
  • Ability to quickly learn technologies such as Docker, Kubernetes, OpenTelemetry, Grafana, AWS, Ansible, Nginx, Elasticsearch, Go and Python.

Education

Bachelor's degree