Job Description
As a Site Reliability Engineer, you’ll play a critical role in designing scalable and reliable systems that meet the operational requirements of our organization.
- Leverage your understanding of our system to assist in resolving production issues in real time
- Bring your knowledge and perspective on reliability to preemptive (design reviews) and corrective (postmortems) discussions
- Lead service reliability reviews and audits and present findings to stakeholders
- Find patterns and pain points that hinder Wix’s availability, and produce large-scale solutions
- Generate and prioritize tasks for infrastructure teams to aid in improving uptime and reducing blast radius
- Analyze system performance and scalability requirements, identify bottlenecks, and then propose and implement solutions to optimize system capacity
Qualifications:
- You’re a senior R&D employee with 5+ years of experience managing large engineering projects
- You’re experienced with monitoring, logging, and tracing mechanisms
- You have an excellent understanding of how web applications work, from browsers and caches to the database and back
- You’re skilled in site reliability engineering principles, including scalability, availability, performance, and fault tolerance
- You’re highly motivated by the idea of automating failure remediation processes
- You’re great at jumping between multiple tasks and you know how to analyze risk and prioritize accordingly
- You enjoy critical thinking and problem solving and are seasoned in conflict resolution
- 3+ years of experience coding and/or running production systems in the cloud is a significant advantage