Job Description:
1) Development skills in Kubernetes, Java, Spark, Kafka, and related technologies
a) Ability to make code-level changes for minor bug fixes as needed, including code deployment
2) Create alerts and dashboards for production system and application health monitoring
a) Create alerts based on application logs (e.g., in Splunk), per defined SLAs
b) Create alerts for health checks of application-level processes that run in Kubernetes and cloud-based environments
c) Create dashboards for monitoring infrastructure-level performance, including database query performance, API call performance (process rate, throughput), and others
3) Domain skills
a) Experience in production support, SRE, or related roles
b) Both a support mindset and a developer mindset
4) Prior production support experience along with excellent communication skills.
5) Team leadership experience, with the ability to manage communication across multiple teams and resolve conflicts
Any Graduate