Job Description:
Your role will focus on the development of the platform core and common platform services. You’ll solve problems related to complex cloud-infrastructure automation, multi-region networking, authentication/authorization, and logging/metrics collection at scale. You’ll also provide tooling and frameworks that engineering teams use for transaction tracing, performance analysis, business monitoring, and alerting.
- Lead/contribute to engineering efforts from design to implementation, solving complex technical challenges around monitoring distributed systems at scale.
- Drive the roadmap for the Observability platforms in conjunction with cross-functional partners. Bring together multiple perspectives and be the key connector in this important and highly visible role.
- Build, lead and mentor an Observability team; create an environment of teamwork, trust, and mutual success
- Participate in deep technical design discussions within your team and across partner teams, and ensure that we're building the right systems
- Drive adoption of best practices in monitoring, alerting, and performance.
- Work closely with development teams to implement monitoring & observability instrumentation within their platforms.
- Participate in a 24/7 on-call rotation for Monitoring & Observability services.
- Containerization & Container Orchestration (e.g. Docker, Kubernetes)
- Cloud Infrastructure Automation (Azure strongly preferred)
Qualifications:
- Bachelor’s degree in computer science or a related field, or equivalent work experience
- Solid working experience with Azure cloud
- Previous experience delivering Observability at scale is required.
- Working knowledge of Kubernetes
- Distributed Systems Development (e.g. asynchronous communication patterns, consensus algorithms, distributed transactions)
- Services Programming (e.g. Go, Java, Kotlin, Scala, Clojure, Python, Ruby)
- Experience working with Linux systems
- Experience with monitoring and alerting systems
- Experience designing and building reliable systems at scale
- Experience with distributed tracing systems (e.g. Jaeger, OpenZipkin)
- Strong interpersonal and collaborative skills
- Infrastructure-as-Code Tooling (e.g. Terraform, CDK, Pulumi, CloudFormation)