The AWS Cloud Lead leads and coordinates cloud-related activities in line with the ITSM process: incident management, problem management, change management, monitoring, infrastructure deployment, vulnerability management, service requests, general cloud service maintenance, cost optimization, and support for target state architecture reviews.
Key Responsibilities:
1. Incident Management:
- Resolve issues with tasks that process slowly or fail to process.
- Troubleshoot issues after container image updates.
- Coordinate with the application team to troubleshoot networking and web UI issues.
2. Problem Management:
- Conduct RCA and implement fixes for recurring job failures.
- Perform RCA and tuning for performance issues.
3. Change Management:
- Plan and execute periodic OS patching for EC2 hosts.
- Manage periodic maintenance for metadata databases, including patching and upgrades.
- Secure, encrypt, and manage container images for rapid deployment.
- Coordinate the rotation of domain certificates and keys.
4. Monitoring:
- Set up CloudWatch alerts and log monitoring.
- Configure CloudTrail for centralized audit logging of account and API activity.
- Monitor Prometheus graphs for performance and health metrics.
5. Infrastructure Deployment:
- Maintain and enhance infrastructure code written in Terraform.
- Develop and maintain Jenkins CI/CD pipelines for continuous deployment.
6. Vulnerability Management:
- Generate periodic vulnerability scan reports for containers using Wiz.
- Mitigate any open vulnerabilities identified.
7. Service Requests:
- Maintain pipelines for Route53 DNS registration.
- Modify and maintain Terraform infrastructure code as per requirements.
- Deploy container images with new features through Jenkins CI/CD pipelines, following the SDLC process.
- Establish peer connections with new nodes and third-party networks.
- Deploy and maintain container images using Helm charts.
- Handle IAM key rotation, pool creation requests, connection requests, and resource upgrades to support EKS cluster nodes.
- Manage container logs and filesystems.
- Create and modify custom roles.
- Maintain VPC networking.
8. General Cloud Service Maintenance:
- Update and modify cloud services based on recommendations from central teams.
9. Cost Optimization:
- Review cloud cost usage, identify cost-saving opportunities, and implement optimizations.
10. Support Target State Architecture Review:
- Create standard deployment architecture diagrams.
- Document architecture and follow required processes for submission.
11. Process:
- Adhere to the ITSM process for incident, change, and problem management.
Qualifications:
1. Bachelor's degree in Computer Science, Information Technology, or related field.
2. Extensive experience in AWS cloud architecture, design, and implementation.
3. Proficiency in AWS services, infrastructure, and deployment tools.
4. Strong knowledge of infrastructure as code (IaC) tools, such as Terraform.
5. Hands-on experience in CI/CD pipeline setup and management using tools like Jenkins.
6. Familiarity with monitoring and logging tools like CloudWatch, CloudTrail, and Prometheus.
7. Experience in vulnerability management and security best practices in an AWS environment.
8. Strong understanding of the ITSM process and best practices.
9. Excellent problem-solving and communication skills.