Role and Responsibilities:
• Deploy, maintain, enhance, and monitor highly scalable infrastructure for a data processing platform using Kubernetes
• Use AWS Cloud and open-source services to address critical business needs
• Ensure the 24/7 availability of the system, with proper alerting and monitoring
• Identify and fix bugs and performance issues in the platform.
• Work with agile teams on setting error budgets, root cause analysis exercises, and blameless post-mortems
• Use continuous integration and delivery (CI/CD) with GitLab CI, Jenkins, ArgoCD, Artifactory, and Docker
• Monitor data pipelines and applications, and handle failure recovery
• Set up and monitor application access and connectivity
• Advocate for a DevOps culture of automation, self-service, and engineering best practices to enable development teams
• Autoscale and monitor the performance of Kubernetes and running applications using Prometheus and Grafana or similar tools
• Perform all SRE activities, such as availability and reliability monitoring and reporting
• Tune, monitor, and configure tools such as Kafka, Spark, Presto, Airflow, and MQTT
• Use infrastructure as code with Terraform
• Operate and maintain code repositories with GitLab.
Required Qualifications:
• Bachelor’s degree in Computer Science or Computer Engineering
• Minimum of 5 years of experience in DevOps engineering or software development.
• Strong coding and scripting experience with Bash, Python, Go or similar languages.
• Comprehensive experience with AWS including a solid understanding of CI/CD, Amazon S3, EC2, IAM, CloudFormation and Route 53
• Experience with user access, authentication, user permission management, and security, including LDAP, AD, OIDC, and Kerberos
• Experience with secure infrastructure networking with AWS using different types of Load Balancers, setting up VPCs, subnets, and routing tables
• Experience with auto scaling, performance testing and capacity planning.
• Experience with tools such as Jenkins, Artifactory, etc. to build automation, CI/CD, and self-service pipelines.
• Experience owning infrastructure in production, as well as designing and creating build/deploy & monitoring systems using CloudFormation/Terraform
• Experience with RESTful services, the pub/sub communication model, service-oriented architecture, distributed systems, cloud systems (AWS), and microservices architecture platforms.
Preferred Qualifications:
• Master’s degree in Computer Science or Computer Engineering
• Experience with configuration management tools such as Puppet, Chef, Kustomize, or Ansible
• Experience with containerization and scheduling, with Docker and Kubernetes.
• Strong distributed systems implementation experience
• Experience with AWS Direct Connect or setting up and maintaining a hybrid cloud
• Experience tuning storage classes, lifecycle rules, instance classes, and throughput to reduce cost without sacrificing performance
• Experience in backend services deployment and management