Duties and Responsibilities: The Monitoring and Observability Engineer will be responsible for designing, implementing, configuring, and maintaining our observability solutions, and for monitoring and troubleshooting IT systems and applications to ensure optimal performance and reliability. You will work closely with cross-functional teams to identify potential issues and provide insights that improve system performance, stability, and availability. You will also be responsible for automating alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
Mandatory Skills:
- 3+ years of experience working in the observability, operations, or DevOps domains.
- Proficiency in observability, monitoring, and logging tools such as Dynatrace and SolarWinds.
- Hands-on experience installing, setting up, and configuring monitoring tools such as Dynatrace and SolarWinds.
The responsibilities of Integrated Operations, Engineer II include the following:
- Configure and maintain monitoring and observability tools and systems (SolarWinds and Dynatrace).
- Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability.
- Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
- Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
- Conduct capacity planning and forecasting to ensure scalability and optimal performance of IT systems and applications.
- Collaborate with cross-functional teams to support incident management, change management, and problem management processes.
Skills required:
- Deep understanding of IT infrastructure monitoring and observability best practices.
- Strong analytical skills, with the ability to analyze large amounts of data and identify patterns and trends.
- Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex issues.
- Programming skills in languages such as Perl, Shell, or JavaScript.
- Experience with automation tools such as Ansible, Puppet, or Terraform.
- Experience with container orchestration tools like Kubernetes.
- Experience with cloud platforms such as AWS, GCP, or Azure.
- Experience with CI/CD tools like Jenkins.