IT Site Reliability Engineer

RulesIQ
Indianapolis, IN, USA

Description

Key Responsibilities:

· Infrastructure Management:

o Design, implement, and manage scalable, reliable, and secure infrastructure using cloud services (e.g., AWS, Azure, GCP) and on-premises solutions.

o Automate infrastructure provisioning, monitoring, and maintenance using tools like Terraform, Ansible, or Puppet.

· Monitoring and Incident Response:

o Develop and maintain monitoring, logging, and alerting systems to detect and respond to issues proactively.

o Lead incident response efforts, perform root cause analysis, and implement corrective actions to prevent recurrence.

· Performance Optimization:

o Continuously monitor system performance and optimize infrastructure to meet service level objectives (SLOs) and service level agreements (SLAs).

o Collaborate with development teams to identify and resolve performance bottlenecks.

· Reliability Engineering:

o Implement and advocate for best practices in reliability engineering, including chaos engineering, fault injection, and resilience testing.

o Design and implement disaster recovery and business continuity plans.

· Collaboration and Communication:

o Work closely with cross-functional teams to ensure seamless deployment and integration of new features and updates.

o Communicate effectively with stakeholders, providing updates on system status, performance metrics, and improvement initiatives.

· Compliance and Security:

o Ensure all infrastructure and operations comply with healthcare industry regulations (e.g., HIPAA, HITECH) and security best practices.

o Conduct regular security assessments and audits to identify and mitigate risks.

Required Skills

· Education:

o Bachelor’s degree in Computer Science, Information Technology, or a related field. Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional Cloud DevOps Engineer) are a plus.

· Experience:

o 3-5+ years of experience in site reliability engineering, DevOps, or a related role.

o Proven experience managing cloud infrastructure and services.

o Strong background in scripting and automation (e.g., Python, Bash, or other shell scripting).

· Technical Skills:

o Proficiency with infrastructure as code (IaC) tools such as Terraform, Ansible, or Puppet.

o Experience with monitoring and logging tools such as Prometheus, Grafana, the ELK Stack, or Splunk.

o Solid understanding of networking, security principles, and compliance requirements in the healthcare industry.

Key Skills

Python, Bash, Shell, Terraform, Ansible, AWS, Azure, GCP, Splunk

Education

Bachelor's Degree


Posted On: 23-Dec-2024
Experience: 5+ years
Openings: 1
Category: Site Reliability Engineer
Tenure: Contract - Corp-to-Corp Position