Job Description

The DevOps Engineer is responsible for managing and maintaining the global enterprise platforms. This includes maintaining the templates for the monitoring systems, the integration with BigPanda, the quality of alerting into the BigPanda eventing platform, CMDB integration, and monitoring best practices for the AWS cloud.

10+ years working in Information Technology

4+ years running production systems on AWS

Experience with site monitoring and log monitoring tools, specifically Datadog.

AWS Certified SysOps Administrator certification is a plus.

Experience with TypeScript & JavaScript

Experience with DynamoDB, S3 and Cognito

Good understanding of Serverless and CloudFormation

Understanding of best practices for AWS, EC2, Couchbase and monitoring of containers

Manages capacity planning, updates, upgrades and internal integration.

Responsible for administration of monitoring tools in the APM space

Coordinates escalations with DevOps, Problem Management, and other teams to support monitoring

Manages uptime, availability, and reporting

Mentors, educates, and trains support personnel on how to use the tools

Maintains knowledge of current technology by reading technology periodicals, evaluating new technologies, and attending trade shows, technical seminars, and training sessions.

Performs other duties as assigned and required. Duties and responsibilities may change from time to time without notice and include, but are not limited to, the duties described above.

Required Qualifications - Knowledge/Skills

Manage monitoring of overall application availability, latency and system health

Determine alert standards for production environments and implement them

Develop strategies for logging and indexing to improve visibility to development teams

Develop and manage configuration scripts for Amazon hosted infrastructure

Work with the development team and management to ensure high availability

Familiarity with configuration management software such as Ansible

Build reporting dashboards to assist visibility of cost and stability

Provide support to teams for alarms and outages on an as-needed basis

Experience with Windows Server, IIS, Docker/Kubernetes

Strong understanding of systems, networks and troubleshooting techniques.

Experience with automated build pipelines and continuous integration, and with repository management: source control, branching, and merging (Git, SVN, etc.)

Communication Skills- The ability to communicate verbally and in writing with all levels of employees and management; speaks and writes clearly and understandably at the right level.

Integrity and Trust- Is widely trusted and seen as a direct, truthful individual; can present the unvarnished truth in an appropriate and helpful manner; keeps confidences; admits mistakes; and doesn't misrepresent him/herself for personal gain.

Teamwork- Works well in a collaborative setting, volunteering for and completing assignments, and acting as a positive team member by contributing to discussions and by developing and maintaining relationships.

Technical Expertise- A commitment to increasing knowledge and skills in the current technical/functional area, keeping up to date on technical developments, staying informed as to industry practices, and knowing how to apply relevant technical processes to appropriate business needs.

Education

Bachelor’s Degree