Description

Responsibilities


Technical Leadership and Subject Matter Expertise:

  • Provide technical and architectural guidance and support to other engineers on the team.
  • Demonstrated experience interacting with the client's technical leadership to understand their technical needs and devise a plan to deliver the product to their satisfaction.
  • Ensure adherence to timelines and client expectations; communicate progress in a timely manner, gather feedback, and adapt quickly to course-correct as needed.


Infrastructure Deployment and Maintenance:

  • Deploy and manage the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) in Kubernetes (K8s) using Helm charts.
  • Implement HPA/KEDA autoscaling for the LGTM stack; tune component performance and set appropriate resource limits.
  • Set up and maintain Azure DevOps pipelines.
  • Work with the Azure portal for resource management and monitoring.


Programming and Version Control:

  • Possess a strong programming background in Java and/or .NET.
  • Utilize Git and Azure Repos for version control and collaboration.


Monitoring and Dashboards:

  • Solid experience working with Helm, GitHub, Grafana, and Prometheus.
  • Work with engineering teams to onboard their applications to Grafana and help them set up the dashboards they need.
  • Build and maintain Grafana dashboards to monitor system health, performance, and metrics.
  • Familiarity with PromQL, LogQL, and TraceQL for querying and visualizing data.


Kubernetes Expertise:

  • Demonstrate a deep understanding of Kubernetes (K8s) architecture, components, and best practices.
  • Experience with Rancher Desktop or similar tools for local development and testing.


Dynatrace Knowledge:

  • Familiarity with Dynatrace for application performance monitoring.
  • Ability to create custom metrics and dashboards.


Azure AD Integration:

  • Integrate Grafana with Azure Active Directory (AD) for authentication and access control.


Onboarding and Alerting:

  • Manage production incidents and work with application teams to conduct root cause analysis and remediation.
  • Create runbooks for applications to resolve critical and recurring issues quickly.
  • Onboard new applications and infrastructure components to the Grafana stack.
  • Set up alerts and notifications for application performance, incidents, and issues.


Qualifications


  • Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
  • Proven experience as a DevOps Engineer, SRE, or similar role.
  • Strong problem-solving skills and attention to detail.
  • Excellent communication and collaboration abilities.
  • Certifications in Kubernetes, Azure, or related technologies are a plus.

Education

Bachelor's degree