Description

Key Skills

  • Knowledge of Linux/Unix fundamentals and network concepts.
  • Hands-on experience with shell scripting (bash, zsh) and with interpreted or compiled languages such as Perl, Python, C/C++, Go, or Java.
  • Configuration management and Infrastructure as Code tools such as Ansible, Puppet, Terraform/Terragrunt, or CloudFormation.
  • Basic understanding of containerization technologies such as Docker or Podman and container orchestration technologies like Kubernetes or Apache Mesos.
  • Strong communication and collaboration skills with the ability to work across functional teams.
  • Awareness of key security principles, including encryption and cryptographic keys (key types and exchange protocols).
  • Basic understanding of SRE principles including monitoring, alerting, error budgets, fault analysis, and automation.

 

Responsibilities

  • Create tooling to assist in the implementation, maintenance, and support of monitoring, observability, alerting, and logging systems so that they remain available and highly reliable.
  • Help design and implement automated processes and tooling, such as writing Ansible playbooks and building tools to monitor API endpoints.
  • Help monitor key performance metrics and proactively identify opportunities for optimization and efficiency gains.
  • Collaborate with cross-functional teams to troubleshoot incidents, identify root causes, and help implement effective solutions to prevent recurrence.
  • Help document workflows and procedures, and write and validate runbooks.

Education

Graduate in any discipline