Description

Top skills

  1. Hands-on cluster node and pod setup, configuration management
  2. EKS, EMR optimization, design optimization
  3. Monitoring and Observability
  4. AWS Cost Management
  5. Experience with at least one CI/CD tool

JD:

 

• 10+ years working in Information Technology 
• 4+ years running production systems on AWS 
• Experience with site monitoring and log monitoring tools, specifically Datadog or a similar tool.
• AWS Certified SysOps Administrator certification is a plus.
• Experience with TypeScript & JavaScript 
• Experience with DynamoDB, S3 and Cognito 
• Good understanding of Serverless, Terraform, and CloudFormation

• Hands-on experience with cluster and node setup, AWS native services, and design optimization.
• Understanding of best practices for AWS (EC2, EKS, EMR), Couchbase, and container monitoring

 

• Manages capacity planning, updates, upgrades and internal integration. 
• Responsible for administration of monitoring tools in the APM space 
• Coordinates escalations with DevOps, Problem Management, and other teams to support monitoring
• Manages uptime, availability, and reporting
• Mentors, educates, and trains support personnel on how to use the tools
• Maintains knowledge of current technology by reading technology periodicals, evaluating new technologies, and attending trade shows, technical seminars, and training sessions.
• Performs other duties as assigned and required. Duties and responsibilities may change from time to time without notice and include, but are not limited to, the duties described above.

REQUIRED QUALIFICATIONS - KNOWLEDGE/SKILLS 
• Manage monitoring of overall application availability, latency and system health 
• Determine alert standards for production environments and implement them 
• Develop strategies for logging and indexing to improve visibility for development teams
• Develop and manage configuration scripts for Amazon hosted infrastructure 
• Work with the development team and management to ensure high availability 
• Familiarity with configuration management software such as Ansible 
• Build reporting dashboards to improve visibility into cost and stability
• Provide support to teams for alarms and outages on an as-needed basis 
• Experience with Windows Server, IIS, Docker/Kubernetes 
• Strong understanding of systems, networks and troubleshooting techniques. 
• Experience with automated build pipelines and continuous integration, and with source control, branching, and merging (Git, SVN, etc.) for repository management
• Communication Skills: Communicates verbally and in writing with all levels of employees and management; speaks and writes clearly and understandably at the right level.
• Integrity and Trust: Is widely trusted and seen as a direct, truthful individual; presents the unvarnished truth in an appropriate and helpful manner; keeps confidences; admits mistakes; and doesn’t misrepresent themselves for personal gain.
• Teamwork: Works well in a collaborative setting, volunteers for and completes assignments, and acts as a positive team member by contributing to discussions and by developing and maintaining relationships.
• Technical Expertise: Shows a commitment to increasing knowledge and skills in current technical and functional areas, keeps up to date on technical developments, stays informed about industry practices, and knows how to apply relevant technical processes to business needs.