Objectives of this role
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
Responsibilities
- Integrate monitoring, logging, and observability data to create observability and operations dashboards.
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
- Minimum of 7-8 years of hands-on experience with Dynatrace and its APIs, with solid experience in monitoring, analyzing, and troubleshooting application issues.
- Expertise in Dynatrace, including setting up and adjusting anomaly thresholds, alerting, and integration with multiple notification channels.
- Solid experience using and creating Dynatrace extensions for customized monitoring.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service-level objectives.
Required skills and qualifications
- Bachelor’s degree (or equivalent) in computer science or a related discipline.
- Ability to program (structured and OOP) using one or more high-level languages, such as Python or shell scripting.
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Preferred skills and qualifications
- Previous success in technical engineering.
- Coding experience beyond simple scripts.