Site Reliability Engineer

My3Tech
Alpharetta, GA, USA

Description

Remote Job | 2022-07-12 10:21:46

Job Code : 2022-ACS0037

100% Remote

12 month Contract

The Site Reliability Engineer will be responsible for driving innovation in distributed system flow and resilience and in continuous feedback and delivery, creating efficiency and cultural transformation through the curation of new systems and capabilities.

Key responsibilities:

Monitor and improve the reliability of core system applications, collaborating with infrastructure partners to ensure business continuity on a continual basis.
Review data from different tools (Splunk, Dynatrace, SolarWinds) and identify whether any action needs to be taken.
Build appropriate dashboards for monitoring
Design and deploy multi-data-center, large-scale web applications.
Work closely with dev and ops teams to build highly available, cost-effective systems.
Create new tools and scripts designed for auto-remediation of incidents.
Establish end-to-end monitoring and alerting on all critical aspects to ensure SLAs are met and to provide proactive notification of possible issues for all systems.
Design platforms for extremely high uptime metrics.
Work independently with little or no supervision.
Work with the operations team to resolve tickets, develop and run scripts, and troubleshoot issues.
Fully understand the application and its microservice interactions.
Design and implement containers and applications in scalable HA/DR multi-tier cloud environments, including new system design, documentation, implementation, and deployment.
Participate in a 24x7 on-call rotation.

Required Skills:

7+ years of experience in the following areas:
Experience with Splunk, Dynatrace, SolarWinds
Experience in providing L4 technical support for production 24x7.
Strong experience in production support and operations.
Design and implementation of network and presentation tier technologies, including F5, Apache, Nginx, etc.
Experience in Performance Testing/Tuning/Monitoring, maximizing system uptime and availability, ensuring functional and performance SLAs.
Experience with monitoring Application/Infrastructure Performance, and availability.
Automation experience with build/deployment, software configuration, continuous integration, continuous delivery, and release engineering tasks in JavaEE/C++/ETL environments.
Experience in automating manual processes using Python, Ruby, Unix Shell (bash, ksh), Perl, Ant, etc.
Installing, configuring, administering, and tuning JavaEE application servers/containers such as Tomcat, WebSphere, etc.
Installing, maintaining, and administering software on Unix/Linux and Windows servers.
Experience with web service technologies, including REST, SOAP, JSON, and XML.
Experience with Cloud Platforms and virtualization Technologies.
Deploying and automating infrastructure and applications in cloud environments using Chef, RPM, etc.
Working closely with Development, QA, Product Management, and Production Ops teams to ensure product releases ship on time and with quality.
Hands-on experience configuring and administering SCM (Git, SVN), build (CMake, Makefiles, Maven), CI (Jenkins), and CD automation tools.
Experience with database (RDBMS, NoSQL) technologies is a plus.
Experience with Performance Testing is a plus.
Configuring and maintaining SDLC Environments.
Experience in Agile Methodologies and processes.
Strong automation and problem-solving skills, and the ability to follow through to completion.
Demonstrated leadership skills through a variety of activities, including leading or mentoring technical staff.
Participate in a 24x7 on-call rotation.

Key Skills

Splunk, Dynatrace, SolarWinds, F5, Apache, Nginx, Python, Ruby, Unix Shell, Perl, Ant

Education

Any Graduate

Posted On: 22-Dec-2024
Experience: 5+ years of experience
Availability: Remote
Openings: 2
Category: Site reliability engineering
Tenure: Flexible Position