Responsibilities
Technical Leadership and Subject Matter Expertise:
- Provide technical / architectural guidance and support to other engineers on the team.
- Demonstrated experience interacting with client-side technical leadership to understand their technical needs and devise a plan to deliver the product to their satisfaction.
- Ensure adherence to timelines and client expectations; communicate progress promptly, gather feedback, and adapt quickly to course-correct as needed.
Infrastructure Deployment and Maintenance:
- Deploy and manage Grafana LGTM in Kubernetes (K8s) using Helm charts.
- Implement HPA / KEDA autoscaling for the LGTM stack. Tune the performance of its components and set appropriate resource limits.
- Set up and maintain Azure DevOps pipelines.
- Work with the Azure portal for resource management and monitoring.
Programming and Version Control:
- Possess a strong programming background in Java and/or .NET.
- Utilize Git and Azure Repos for version control and collaboration.
Monitoring and Dashboards:
- Good experience working with Helm, GitHub, Grafana and Prometheus.
- Work with Engineering teams to onboard their applications to Grafana and help them set up required dashboards.
- Build and maintain Grafana dashboards to monitor system health, performance, and metrics.
- Familiarity with PromQL, LogQL, and TraceQL for querying and visualizing data.
Kubernetes Expertise:
- Demonstrate a deep understanding of Kubernetes (K8s) architecture, components, and best practices.
- Experience with Rancher Desktop or similar tools for local development and testing.
Dynatrace Knowledge:
- Familiarity with Dynatrace for application performance monitoring.
- Ability to create custom metrics and dashboards.
Azure AD Integration:
- Integrate Grafana with Azure Active Directory (AD) for authentication and access control.
Onboarding and Alerting:
- Manage production incidents and work with application teams to conduct root cause analysis and remediation.
- Create runbooks that help application teams resolve critical and recurring issues quickly.
- Onboard new applications and infrastructure components to the Grafana stack.
- Set up alerts and notifications for application performance, incidents, and issues.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
- Proven experience as a DevOps Engineer, SRE, or in a similar role.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration abilities.
- Certifications in Kubernetes, Azure, or related technologies are a plus.