Job Description

The DevOps Engineer is responsible for managing and maintaining the global enterprise platforms. This includes maintaining the monitoring-system templates, the integration with BigPanda, the quality of alerts sent to the BigPanda eventing platform, CMDB integration, and monitoring best practices for the AWS cloud.
10+ years working in Information Technology
4+ years running production systems on AWS
Experience with site monitoring and log monitoring tools, specifically Datadog.
AWS Certified SysOps Administrator certification a plus.
Experience with TypeScript & JavaScript
Experience with DynamoDB, S3 and Cognito
Good understanding of Serverless and CloudFormation
Understanding of best practices for AWS (including EC2), Couchbase, and container monitoring
Manages capacity planning, updates, upgrades and internal integration.
Responsible for administering application performance monitoring (APM) tools
Coordinates escalations with DevOps, Problem Management, and other teams to support monitoring
Manages uptime, availability, and related reporting
Mentors, educates, and trains support personnel on how to use the tools
Maintains knowledge of current technology by reading technology periodicals, evaluating new technologies, and attending trade shows, technical seminars, and training sessions.
Performs other duties as assigned and required. Duties and responsibilities may change from time to time without notice and include, but are not limited to, the duties described above.

Required Qualifications - Knowledge/Skills

Manage monitoring of overall application availability, latency and system health
Determine alert standards for production environments and implement them
Develop strategies for logging and indexing to improve visibility for development teams
Develop and manage configuration scripts for Amazon-hosted infrastructure
Work with the development team and management to ensure high availability
Familiarity with configuration management software such as Ansible
Build reporting dashboards to improve visibility into cost and stability
Provide support to teams for alarms and outages on an as-needed basis
Experience with Windows Server, IIS, Docker/Kubernetes
Strong understanding of systems, networks and troubleshooting techniques.
Experience with automated build pipelines and continuous integration, and with source control, branching, and merging (repository management) using Git, SVN, or similar tools
Communication Skills - The ability to communicate verbally and in writing with employees and management at all levels; speaks and writes clearly and understandably at the appropriate level.
Integrity and Trust - Is widely trusted and seen as a direct, truthful individual; can present the unvarnished truth in an appropriate and helpful manner; keeps confidences; admits mistakes; and does not misrepresent him/herself for personal gain.

Education

Any graduate (bachelor's degree in any discipline)