Description

Key Responsibilities:

· Infrastructure Management:

o  Design, implement, and manage scalable, reliable, and secure infrastructure using cloud services (e.g., AWS, Azure, GCP) and on-premises solutions.

o  Automate infrastructure provisioning, monitoring, and maintenance using tools like Terraform, Ansible, or Puppet.

· Monitoring and Incident Response:

o  Develop and maintain monitoring, logging, and alerting systems to detect and respond to issues proactively.

o  Lead incident response efforts, perform root cause analysis, and implement corrective actions to prevent recurrence.

· Performance Optimization:

o  Continuously monitor system performance and optimize infrastructure to meet service level objectives (SLOs) and service level agreements (SLAs).

o  Collaborate with development teams to identify and resolve performance bottlenecks.

· Reliability Engineering:

o  Implement and advocate for best practices in reliability engineering, including chaos engineering, fault injection, and resilience testing.

o  Design and implement disaster recovery and business continuity plans.

· Collaboration and Communication:

o  Work closely with cross-functional teams to ensure seamless deployment and integration of new features and updates.

o  Communicate effectively with stakeholders, providing updates on system status, performance metrics, and improvement initiatives.

· Compliance and Security:

o  Ensure all infrastructure and operations comply with healthcare industry regulations (e.g., HIPAA, HITECH) and security best practices.

o  Conduct regular security assessments and audits to identify and mitigate risks.

Required Skills

· Education:

o  Bachelor’s degree in Computer Science, Information Technology, or a related field. Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional Cloud DevOps Engineer) are a plus.

· Experience:

o  3-5+ years of experience in site reliability engineering, DevOps, or a related role.

o  Proven experience managing cloud infrastructure and services.

o  Strong background in scripting and automation (e.g., Python, Bash, or other shell scripting languages).

· Technical Skills:

o  Proficiency with infrastructure as code (IaC) tools such as Terraform, Ansible, or Puppet.

o  Experience with monitoring and logging tools such as Prometheus, Grafana, the ELK Stack, or Splunk.

o  Solid understanding of networking, security principles, and healthcare industry compliance requirements (e.g., HIPAA, HITECH).
