Description

Technical Skills

Proficiency in AWS services (CloudWatch, Lambda, EC2, RDS, etc.).

Strong experience with monitoring tools (e.g., Prometheus, Grafana, Datadog).

Knowledge of container orchestration tools (e.g., Kubernetes, ECS).

Proficiency in Java or Scala for backend development.

Scripting and automation skills (Python, Bash, etc.).

Experience with testing frameworks such as Selenium or Cucumber.

Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI).

Experience with incident management tools, specifically PagerDuty.

Roles and Responsibilities

  • Develop and maintain monitoring tools and dashboards to ensure 24/7 availability of our services.
  • Design and implement automated solutions for incident detection, response, and resolution.
  • Collaborate with development and operations teams to identify and resolve issues.
  • Integrate with various APIs (Spinnaker, PagerDuty, Git, Datadog, etc.) to create a platform for continuous testing.
  • Build and maintain CI/CD pipelines for operational tools.
  • Create and execute automated tests using frameworks like Selenium or Cucumber to ensure tool quality and performance.
  • Develop backend services and tools using Java or Scala.
  • Integrate and manage incident response processes using PagerDuty.
  • Create documentation and training materials for operational processes and tools.

Education

Graduate in any discipline.