Description

About the opportunity

We are seeking a highly motivated Site Reliability Engineer (SRE) with a strong operational focus to join our growing team. In this role, you will be vital to ensuring the smooth operation and performance of our critical infrastructure and services. You'll work cross-functionally to create alignment and deliver results alongside builders who have helped shape the success of companies such as Google, Okta, AWS, and Snowflake.

We are looking for someone who has experience leading small teams and a technical leadership mindset as we grow the team. We are building the next-generation data security platform for the multi-cloud era. Will you join us?

You will:  

  • Deploy software for Cloud Prem and SaaS customers.
  • Respond to and diagnose system incidents in a timely and efficient manner, minimizing downtime and impact on users.
  • Collaborate with other engineers to establish root causes and implement effective resolutions.
  • Continuously improve incident response processes and documentation for future occurrences.
  • Proactively monitor and maintain the health and performance of our infrastructure and services.
  • Perform routine administrative tasks such as system configuration, user management, and data backups.
  • Identify and implement operational improvements to ensure ongoing system reliability and efficiency.
  • Develop and implement scripts and automated solutions to streamline operational tasks and reduce manual workload.
  • Participate in the on-call rotation to address critical incidents outside of regular business hours.
  • Ensure effective handoff between on-call engineers and document post-incident information for future reference.
  • Document processes for support, and create, maintain, and execute runbooks for identified situations.

You have:

  • Education:
    • BS degree in Computer Science or related field
  • Experience:
    • 3+ years of experience in Site Reliability Engineering
    • 2+ years of experience working with cloud platforms and cloud automation tools, especially on AWS
    • Strong experience with Kubernetes, Linux, AWS networking (VPC), and Terraform
    • Experience with the GitOps model for deployment
    • Familiarity with distributed version control
  • Other:
    • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana)
    • Bazel and Helm experience a plus
    • Understanding of software configuration best practices
    • Ability to wear multiple hats in a fast-paced environment
    • Hands-on, “can do” attitude and a bias for action
    • Low ego and high intellectual curiosity

Education

Any Graduate