Description

Your role will focus on the development of the platform core and common platform services. You’ll solve problems related to complex cloud-infrastructure automation, multi-region networking, authentication/authorization, and logging/metrics collection at scale. You’ll also provide engineering teams with tooling and frameworks for transaction tracing, performance analysis, business monitoring, and alerting.

 

  • Lead/contribute to engineering efforts from design to implementation, solving complex technical challenges around monitoring distributed systems at scale.
  • Drive the roadmap for the Observability platforms in conjunction with cross-functional partners. Bring together multiple perspectives and be the key connector in this important and highly visible role.
  • Build, lead, and mentor an Observability team; create an environment of teamwork, trust, and mutual success.
  • Participate in deep technical design discussions within your team and across partner teams, and ensure that we're building the right systems.
  • Drive adoption of best practices in monitoring, alerting, and performance.
  • Work closely with development teams to implement monitoring & observability instrumentation within their platforms.
  • Participate in a 24/7 on-call rotation for Monitoring & Observability services.
  • Work with containerization and container orchestration (e.g. Docker, Kubernetes).
  • Work with cloud infrastructure automation (Azure strongly preferred).

 

Qualifications

  • Bachelor’s in computer science, related field, or equivalent work experience
  • Solid working experience with Azure cloud
  • Previous experience delivering Observability at scale is required.
  • Working knowledge of Kubernetes
  • Distributed Systems Development (e.g. asynchronous communication patterns, consensus algorithms, distributed transactions)
  • Services Programming (e.g. Go, Java, Kotlin, Scala, Clojure, Python, Ruby)
  • Experience working with Linux systems
  • Experience with monitoring and alerting systems
  • Experience designing and building reliable systems at scale
  • Experience with distributed tracing systems (e.g. Jaeger, OpenZipkin)
  • Strong interpersonal and collaborative skills
  • Infrastructure-as-code tooling (e.g. Terraform, CDK, Pulumi, CloudFormation)


 

Education

Bachelor's Degree