Knowledge of Linux/Unix fundamentals and network concepts.
Hands-on experience with shell scripting (bash, zsh) and with interpreted or compiled languages such as Perl, Python, C/C++, Go, or Java.
Configuration management and Infrastructure as Code tools: Ansible, Puppet, Terraform/Terragrunt, CloudFormation.
Basic understanding of containerization technologies such as Docker or Podman and container orchestration technologies like Kubernetes or Apache Mesos.
Strong communication and collaboration skills with the ability to work across functional teams.
Awareness of key security principles, including encryption and keys (key types and key exchange protocols).
Basic understanding of SRE principles including monitoring, alerting, error budgets, fault analysis, and automation.
Responsibilities
Create tooling to assist in the implementation, maintenance, and support of monitoring, observability, alerting, and logging systems to ensure they remain available and highly reliable.
Participate in the design and implementation of automated processes and tooling, such as writing Ansible playbooks and building tools to monitor API endpoints.
Help monitor key performance metrics and proactively identify opportunities for optimization and efficiency gains.
Collaborate with cross-functional teams to troubleshoot incidents, identify root causes, and help implement effective solutions to prevent recurrence.
Help document workflows and procedures, and write and validate runbooks.