Job Description:
Dashboard Development: Design, develop, and maintain Grafana dashboards to visualize metrics, logs, and traces from various sources including Prometheus, Loki, and Tempo.
Integration and Configuration: Configure and integrate Grafana with Prometheus, Loki, Tempo, and other data sources to collect, store, and visualize monitoring data effectively.
Log and Metrics Analysis: Use the LogQL and PromQL query languages to analyze log and metric data, identify trends, and troubleshoot issues proactively.
Synthetic Monitoring: Implement and manage synthetic monitoring solutions to simulate user interactions and monitor the performance and availability of critical endpoints and workflows.
API Management (APIM) Integration: Collaborate with APIM teams to integrate monitoring solutions into APIM platforms, using LogQL and PromQL to analyze API logs and metrics.
Tempo Tracing Setup: Set up and configure Tempo for distributed tracing, enabling end-to-end visibility into application performance and latency across microservices architectures.
Alerting and Notification: Configure alerting rules and notifications within Grafana to notify relevant teams of potential issues or anomalies in real time.
Performance Optimization: Identify and implement optimizations to improve the performance, scalability, and efficiency of monitoring systems.
Automation and Scripting: Develop automation scripts and templates for deploying, managing, and scaling monitoring infrastructure and configurations.
Required Skills:
SRE
Any Graduate