Senior Site Reliability Engineer

InnoMethods Corporation
Jersey City, NJ, USA

Description

Job Description:

Ability to work in a blameless, data-driven fashion and manage incident response.

Build monitoring solutions, including dashboards and alerting, using industry-standard tools such as Dynatrace and Grafana, as well as cloud-native tools (Azure, AWS)

Code and support infrastructure automation across the CI/CD pipeline using Python, Ansible, and Terraform

Demonstrate strong programming skills and thorough knowledge of systems, especially Azure, Databricks, OpenAI, and AWS

Enhance reliability through designing, building, and maintaining scalable core infrastructure.

Familiarity with Agile and/or SAFe (Scaled Agile Framework) processes

Improve operational processes and team practices.

Intimate knowledge of SRE principles and practices, including SLIs, SLOs, toil reduction, observability, and automation

On-call rotation for incident response and proactive incident measures

Strong analytical and documentation skills

Strong communication skills and an ability to collaborate.

Strong problem-solving skills and ability to think under pressure.

SRE, Databricks, Purview, Terraform, Python, Observability/Infrastructure/Config as Code

Dynatrace, Grafana, Ansible, Jenkins, Bitbucket, AW

Key Skills

Python, Azure, AWS, Databricks, OpenAI, Terraform, Dynatrace, Grafana, Ansible, Jenkins, Bitbucket

Education

Any Graduate

Posted On: 29-Sep-2024
Experience: 10+ years of experience
Openings: 1
Category: Senior Site Reliability Engineer
Tenure: Flexible Position