Job Description:
Dynamic engineer with an understanding of application performance management and experience building monitoring and alerting solutions.
- Troubleshoot incidents, identify root causes, fix and document problems, and deploy preventative solutions.
Required Experience:
- 5+ years of recent experience building automation and monitoring for observability (Prometheus/Grafana/ELK).
- 5+ years of experience working on support projects, including participation in a rotating on-call schedule to address failures.
- 5+ years of recent experience with Kubernetes, Docker, and Helm, and with end-to-end support of applications in this environment.
- 5+ years of recent experience working in AWS and/or GCP.
- 3+ years of full-stack Python development.
- Strong communication skills, with the ability to communicate effectively with team members and management.
Skills Preferred:
- MLOps experience
- MLE experience