Description

Job Description:

Dashboard Development: Design, develop, and maintain Grafana dashboards to visualize metrics, logs, and traces from various sources including Prometheus, Loki, and Tempo.

Integration and Configuration: Configure and integrate Grafana with Prometheus, Loki, Tempo, and other data sources to collect, store, and visualize monitoring data effectively.

Log and Metrics Analysis: Use the LogQL and PromQL query languages to analyze log and metric data, identify trends, and troubleshoot issues proactively.

Synthetic Monitoring: Implement and manage synthetic monitoring solutions to simulate user interactions and monitor the performance and availability of critical endpoints and workflows.

API Management (APIM) Integration: Collaborate with APIM teams to integrate monitoring solutions into APIM platforms, leveraging LogQL and PromQL for analyzing API logs and metrics.

Tempo Tracing Setup: Set up and configure Tempo for distributed tracing, enabling end-to-end visibility into application performance and latency across microservices architectures.

Alerting and Notification: Configure alerting rules and notifications within Grafana to notify relevant teams of potential issues or anomalies in real time.

Performance Optimization: Identify and implement optimizations to improve the performance, scalability, and efficiency of monitoring systems.

Automation and Scripting: Develop automation scripts and templates for deploying, managing, and scaling monitoring infrastructure and configurations.

Required Skills:
SRE

Education

BE/BTech/ME/MTech/MSc/MCA