Description

What you’ll do & how you’ll make your mark.

Architect and maintain mission-critical global hybrid infrastructure spanning multiple datacenters & cloud providers, primarily using open source technologies.
Design next-generation scalable systems that are highly available, resilient and capable of handling high-volume, Internet-facing web traffic.
Take responsibility for downtime, the product SLA, capacity planning, and the overall health & performance of large-scale production systems.
Participate in a weekly 24/7 on-call rotation: solve escalated tickets, resolve outages and debug production issues.
Work closely with stakeholders such as the Engineering, Monitoring and Operations teams, NOC / SOC, customers & business development teams.
Challenge the status quo. Empower development teams by moving legacy methodologies, platforms & technologies to DevOps principles, cloud native technologies and newer ecosystems with minimal friction.
Automate and script routine tasks rigorously, with a low tolerance for manual processes.
Be data & metric driven. Develop tools and platforms for better system observability & insights.
Write design decision documentation and implement production best practices with a strong focus on security, encouraging the right DevOps workflows.


Who you are & what you’ll need to succeed.

Excellent knowledge of Linux internals & OS fundamentals such as the scheduler, memory, storage, networking, etc. Has managed production servers running RHEL / CentOS / Ubuntu distributions.
Good understanding of Linux filesystems and of Linux troubleshooting across networks and systems. Sound knowledge of the shell / command line, OSI, TCP/IP & networking fundamentals is mandatory.
Exposure to RDBMSs like MySQL, PostgreSQL etc.
Exposure to at least one configuration management tool like Puppet, Ansible, Chef etc. & an understanding of Git concepts / terminology.
Can code in Python to write scripts and automate routine tasks.
Public cloud and Kubernetes experience.
A generalist with knowledge of the skills listed above and below. Someone who understands everything from DNS to deployments.
Has previously managed large-scale web infrastructure, with a deep understanding of L4/L7 load balancing, high availability & DNS. Has worked with HAProxy, Nginx, Heartbeat / Keepalived, Pacemaker etc. Prior experience managing DNS and large-scale email systems is a bonus.
Has prior systems administration & troubleshooting experience and exposure to high-traffic production environments, dealing primarily with web application stacks on Apache / Nginx / Tomcat etc.
Sound knowledge of various RDBMS and NoSQL databases like MySQL / PostgreSQL, Redis, Cassandra etc. Exposure to database clustering solutions is a plus.
Experience deploying, maintaining, patching and upgrading systems at scale with automation tools like Rundeck etc.
Exposure to metrics & logging stacks like Ganglia, TICK, Grafana / InfluxDB / Graphite, Prometheus, ELK, Fluentd, Splunk, Graylog etc.
Understands the basic principles of virtualization and containerization and has working knowledge of Docker, KVM / Libvirt. Exposure to infrastructure orchestration platforms like Kubernetes, OpenShift, OpenStack, Mesos is a bonus.
Production experience deploying in AWS and proficiency in IaC toolchains like Terraform, CloudFormation etc. will be a bonus.
Experience in managing CI/CD pipelines using tools like Jenkins, Bamboo etc.
Proficient in at least one scripting / programming language like Python, Ruby, Golang, Perl, PowerShell etc.
Understands the importance of basic system, application & network security; exposure to benchmarks like CIS, NIST and OpenSCAP is a bonus.

 

Education

Any graduate