Job Details
Skillset Requirements:
- Experience using monitoring tools such as Splunk, Dynatrace, Kibana, OpenSearch, and Grafana
- Understanding of OpenTelemetry for application logging
- Knowledge of GraphQL APIs
- Proficiency with Kubernetes, Istio, and Gloo
- Familiarity with Java and Linux servers
- Expertise with GitHub and release tools such as Jenkins or XLR
- Experience with ServiceNow and cloud technologies
- Understanding of network load balancing and SSL/TLS certificates
- Knowledge of Site Reliability Engineering (SRE) principles
Roles and Responsibilities:
- As a Site Reliability Engineer (SRE) providing application support, set up new dashboards and monitoring systems.
- Support channels and partners with issues in lower environments and production.
- Address production issues and participate in proactive bridge calls.
- Assist in platform and application migrations from start to finish.
- Perform annual certificate renewals and follow up with partners on their own certificate renewals.
- Collaborate with engineering teams on platform releases.
- Support the infrastructure team with patching.
- Handle client queries about our platform and assist clients with onboarding to our application.
- Provide support for Tier 0 applications and understand the end-to-end request flow.
- Work with the relevant teams to resolve data issues.
- Enhance application availability and performance.
- Share necessary statistics with clients and leadership.
- Manage incidents, problems, changes, and releases.
- Conduct proactive monitoring.