Job Description
The DevOps Engineer is responsible for managing and maintaining the global enterprise platforms, maintaining the templates for the monitoring systems, integrating with BigPanda, ensuring the quality of alerting into the BigPanda eventing platform, integrating with the CMDB, and applying monitoring best practices for the AWS cloud.
- 10+ years working in Information Technology
- 4+ years running production systems on AWS
- Experience with site monitoring and log monitoring tools, specifically Datadog
- AWS Certified SysOps Administrator certification a plus
- Experience with TypeScript & JavaScript
- Experience with DynamoDB, S3 and Cognito
- Good understanding of Serverless and CloudFormation
- Understanding of best practices for AWS, EC2, Couchbase and monitoring of containers
- Manages capacity planning, updates, upgrades and internal integration.
- Responsible for administration of monitoring tools in the APM space
- Coordinates escalations with DevOps, Problem Management, and other teams to support monitoring
- Manages uptime, availability, and reporting
- Mentor, educate, and train support personnel on how to use tools
- Maintains knowledge of current technology by reading technology periodicals, evaluating new technologies, and attending trade shows, technical seminars, and training sessions.
- Performs other duties as assigned and required. Duties and responsibilities may change from time to time without notice and include, but are not limited to, the duties described above.
Required Qualifications - Knowledge/Skills
- Bachelor's degree required
- Any certification related to AWS a plus
- Manage monitoring of overall application availability, latency and system health
- Determine alert standards for production environments and implement them
- Develop strategies for logging and indexing to improve visibility to development teams
- Develop and manage configuration scripts for Amazon hosted infrastructure
- Work with the development team and management to ensure high availability
- Familiarity with configuration management software such as Ansible
- Build reporting dashboards to assist visibility of cost and stability
- Provide support to teams for alarms and outages on an as-needed basis
- Experience with Windows Server, IIS, Docker/Kubernetes
- Strong understanding of systems, networks and troubleshooting techniques.
- Experience with automated build pipelines and continuous integration, and with repository management (source control, branching, and merging) using Git, SVN, or similar tools
- Communication Skills- The ability to communicate verbally and in writing with all levels of employees and management; speaks and writes clearly and understandably at the right level.
- Integrity and Trust- Is widely trusted and seen as a direct, truthful individual; presents the unvarnished truth in an appropriate and helpful manner, keeps confidences, admits mistakes, and doesn't misrepresent him/herself for personal gain.
- Teamwork- Works well in a collaborative setting, volunteering for and completing assignments, and acting as a positive team member by contributing to discussions and developing and maintaining working relationships.
- Technical Expertise- A commitment to increasing knowledge and skills in current technical/functional area, keeping up to date on technical developments, staying informed as to industry practices, knowing how to apply relevant technical processes to appropriate business needs.