Description

Title: SRE / Datadog SME

Location: Dallas, TX

Long-term contract

 

RESPONSIBILITIES 

  • Establish monitoring, tracing, logging, and alerting for shared platforms

  • Define SLAs and SLOs and set up monitoring to ensure availability targets are being met

  • Develop tools and workflows utilizing engineering best practices, such as infrastructure as code and CI/CD, to promote reliability and availability

  • Collaborate with platform engineers and developers to improve operational stability and reliability

  • Must have implemented Datadog from scratch and supported monitoring for an organization

 

REQUIREMENTS 

  • Python and application monitoring
  • Datadog SME
  • Minimum 5 years of Datadog experience
  • Datadog must be the candidate's primary skill
  • Bachelor's degree in computer science or a related field, or equivalent experience
  • Proven work experience as a Site Reliability Engineer or in a similar role 
  • Expert in infrastructure as code (Terraform, Docker, Helm) 
  • Expert in monitoring tools such as Datadog or Dynatrace
  • Cloud experience, preferably Azure 
  • Experience with container technologies (Docker and Kubernetes)
  • Experience with configuration and administration of CI/CD pipelines, preferably using GitHub Actions 
  • Capable of writing comprehensive technical documentation and diagrams 
  • Working knowledge of Bash and shell scripting
  • Understanding of the end-to-end application development lifecycle, from code commit to production deployment
  • DevOps, reliability, and security mindset, with an understanding of production controls and change processes