Description

Tech Must-Haves:

  • 5 years of experience as a Site Reliability Engineer.
  • Python is the preferred language.
  • Programming languages and frameworks: experience with NodeJS, ReactJS, and Angular is preferred but not required; these can be taught.
  • AWS for Cloud Development (EC2, DynamoDB)
  • Familiarity with using REST APIs.
  • Solid understanding of SRE principles, including proactive monitoring and self-healing system design.
  • Proficient in common DevOps tools (Git/GitHub, Jenkins, Docker).

Overview:
Capital One is seeking a Site Reliability Manager with a strong background in cloud-based solutions (preferably AWS) and a passion for driving automation, building self-healing systems, and applying Site Reliability Engineering (SRE) principles. You will provide technical leadership to ensure the stability, scalability, and performance of our applications, and identify opportunities for automation and proactive monitoring.

Key Responsibilities:

  • Cloud Expertise: Deep understanding of cloud-based solutions and services, with a focus on AWS (EC2, DynamoDB).
  • Automation & Scripting: Lead automation efforts by implementing scripting, machine learning, and self-healing systems.
  • DevOps Best Practices: Provide technical leadership around DevOps tools (Git/GitHub, Jenkins, Docker) and best practices.
  • Production Support: Ensure systems are highly reliable, drawing on experience with production support and monitoring tools (Splunk, New Relic).
  • Technology Stack: Proficiency in Python, NodeJS (New Relic Synthetics), ReactJS, Java, and API integration using REST.
  • Monitoring & Alerting: Develop and implement automated monitoring and alerting solutions to minimize manual interventions.
  • Zero-Touch Automation: Identify opportunities to reduce manual validation and promote zero-touch automation and self-healing systems.

Requirements:

  • Experience with AWS cloud services (EC2, DynamoDB).
  • Proficient in common DevOps tools (Git/GitHub, Jenkins, Docker).
  • Strong skills in Python, NodeJS, ReactJS, and Java.
  • Familiarity with production support processes and REST APIs.
  • Solid understanding of SRE principles, including proactive monitoring and self-healing system design.

Education

Any Graduate