Responsibilities include the following:
Serves as the primary contact for all agency cloud operations-related queries and issues.
Provides after-hours support to ensure the seamless operation of agency cloud services.
Creates, modifies, and deletes cloud alerts to monitor system performance.
Monitors application workloads to ensure optimal performance.
Proactively detects problems, manages events, and handles notifications and escalations.
Manages Major Incident Management (MIM) for agency priority 1 incidents, including coordination, documentation, root cause analysis (RCA), and notifications.
Implements automated remediation for recurring incidents.
Updates the agency's hosting and design documents as needed.
Manages activity documentation and approval chains for both agency-specific and enterprise activities.
Resolves agency cloud security alerts, ensures compliance with security requirements, and monitors certificate expiration and renewals.
Implements agency security controls according to organizational standards.
Performs agency patch management on cluster environments, including Azure Kubernetes Service (AKS) clusters and Kubernetes versions.
Develops and monitors automated advanced sequences.
Conducts file-level restorations as needed.
Identifies cost-saving opportunities, monitors and remediates excessive resource expenditures, and escalates cost-related issues.
Implements billing and cost management tags for better resource allocation.
Maintains the IT Service Management Knowledge Base portal for reporting and investigation.
Develops detailed management procedural manuals for each agency.
Requirements:
Must pass an extensive background check.
Ability to work closely with multiple diverse delivery centers and their agencies while anticipating their needs and exceeding their expectations.
Consistently demonstrate strong organizational, communication, change management, and problem-solving skills.
Strong knowledge of all cloud technologies and the ability to keep abreast of, and maintain a deep technical understanding of, current and emerging technologies.
Proven operational experience with large enterprise environments.
Excellent oral and written communication skills for communicating clearly with clients.
Required Skills:
Systems Engineer and/or Cloud Engineer with 3 years of hands-on experience with implementation, security, and standards best practices in Azure and AWS.
In-depth knowledge of networking as well as connectivity to AWS and/or Azure (via Direct Connect and/or ExpressRoute).
Hands-on experience with Microsoft Azure and Amazon Web Services (AWS) cloud services.
Bachelor's degree in Computer Science