Focused and attentive to business-critical issues.
Highly proactive; takes initiative to identify problem areas and develop solutions.
Excellent analytical and problem-solving skills.
Responsible for incident management for defined IT applications, ensuring 100% uptime of systems and applications.
Perform daily system monitoring (through monitoring tools and script-based notifications), verifying the integrity and availability of all hardware, server resources, and key processes. Review system and application logs and verify completion of scheduled jobs such as application start-up processes and backups.
Manage and administer applications running on Linux/AIX/Solaris/Windows-based and cloud-native systems, including configuration, troubleshooting, and automation.
Identify the root causes of incidents to prevent recurrence.
Act as the main point of contact for coordinating, resolving, and discussing application, interface, and integration problems with vendors and other teams.
Perform ongoing application performance tuning, application upgrades, and resource optimization as required.
Responsible for deployment and movement of application code, bug fixes, and data patches, and for release management in Production, SIT, and UAT environments, in coordination with the development team and vendors.
Manage execution of OS and DB patching for the defined IT applications.
Conduct regular bug/issue tracker review meetings & follow-ups with respective teams for quick problem resolution.
Assist the team in achieving the best possible design solution for functional requirements.
Attend Change Control Board meetings and plan production movements.
Prepare application process documents and KB documents. Periodically review KB documents and publish them to the L1 team.
Collaborate with development teams to ensure smooth integration and deployment of applications in a cloud-native environment.
Collaborate with security teams to implement and maintain secure cloud environments.
Must be ready to work in a 24x7 support environment.
Requirements:
Prior L1 & L2 Production support experience.
At least 2-3 years of relevant experience on Linux/UNIX/Windows platforms, preferably in the BFSI segment, in a high-volume or critical production application environment.
Hands-on experience in installation, configuration, monitoring and management of Applications & Databases.
Strong Unix/Linux skills – well versed in tasks like file editing, system resource monitoring, running and scheduling processes, and troubleshooting system issues.
Hands-on experience in RDBMS – Oracle / MySQL / MSSQL and SQL querying.
Exposure to web and application servers (WebLogic, WebSphere, JBoss, Apache Tomcat, IIS).
Hands-on experience in writing and executing SQL queries with JOINS.
Hands-on experience in any scripting language such as Shell, Python, Java, etc. is a plus.
Good exposure to Alerting and Monitoring Tools.
Exposure to scheduling tools such as AutoSys or crontab.
Understanding of various cloud platforms, such as AWS, GCP, and Azure, as well as expertise in service mesh, Kubernetes, networking, and infrastructure automation.
Ability to deploy, configure, and maintain Kubernetes clusters (GKE, EKS, AKS) using infrastructure-as-code tools like Helm charts, Ansible, CloudFormation, and Terraform.
Ability to support database technologies such as Cloud SQL, Amazon RDS, Oracle, and PostgreSQL.
In-depth knowledge of networking concepts, including CIDR, load balancing, VPCs, and transit gateways.
Experience in application server log analysis, troubleshooting, and problem solving.
Exposure to Incident, Problem, Capacity, Change, and Release Management processes.
Working knowledge of hardware and networking technologies such as load balancers, DNS, firewalls, etc.
Hands-on experience with APIs and microservices.
Exposure to maintaining and operating CI/CD pipelines.