Job Title: Cloud Operations Engineer
Location: Fort Worth, TX (Hybrid)
Duration: Contract (1+ year)
Key Responsibilities:
- Incident and System Management: Collaborate with internal teams and suppliers to analyze and resolve critical IT and Telecom service interruptions, and protect system availability through incident, problem, and change management.
- System Monitoring and Optimization: Monitor systems for faults, identify optimization opportunities, and implement tools and process changes to improve monitoring and alerting.
- Incident Response and Root Cause Analysis: Work with major incident response teams on escalations and monitoring during major incidents.
Qualifications & Experience:
- Bachelor’s degree in Computer Science, Information Systems, or Engineering preferred.
- Solid understanding of cloud architecture and DevOps principles.
- Strong experience in event monitoring and alerting, DevOps, infrastructure support, or IT major incident management.
- Experience with monitoring tools (Dynatrace, CloudWatch, Zabbix, SCOM).
- Experience with application performance tuning in a DevOps environment.
- Strong writing skills for documentation.
- Proficiency in distributed systems administration (Windows, Unix, Linux, VMware, etc.).
- Knowledge of ITIL best practices (certification is a plus).
- Familiarity with the software development life cycle (SDLC).
- Experience in SLA/KPI-driven environments.
- ServiceNow proficiency.
- General scripting/programming skills (Python, Node.js, Ruby, Perl, Bash/sh).
- Availability: Able to work in a 24/7 environment and provide on-call support.