Hadoop/Kafka SRE

Avance Consulting
Warsaw, Poland

Description

Carry out SRE duties for Big Data on various open-source platforms such as Hadoop, Spark, and HBase.

• Keep an eye on the platforms and adhere to runbooks/SOPs to manage platform and application problems.

• Familiarize yourself with the cluster maintenance processes and implement changes as per the documented installation and validation plans.

• Showcase robust troubleshooting and debugging skills, aiming to pinpoint and rectify the issue, while also offering advice on how to prevent such problems in the future.

• Conduct thorough root cause analysis of major production incidents, document the findings for future reference, and put in place proactive measures to enhance system reliability.

• Automate routine tasks using scripts or automation tools to lessen manual work, decrease the chance of human errors, and boost system reliability.

• Technical Skills required:

o At least 2-3 years of experience for a junior-level role, or 5+ years for a mid-level/senior role, working as a Hadoop site reliability engineer.

o High-level knowledge of Hadoop platforms and core Hadoop components.

o Troubleshooting both Hadoop platform service problems and application problems, and identifying the root cause.

o Writing Ansible playbooks and automating manual tasks using Ansible, shell scripting, and Python scripting.

o Familiarity with Unix/Linux system internals, networking, and distributed systems.

Job description for Kafka SRE:

• Carry out SRE duties for the Kafka streaming platform.

• Have a thorough understanding of the Kafka architecture, along with the concepts of producers, consumers, topics, partitions, etc.

• Keep an eye on the platforms and adhere to runbooks/SOPs to manage platform and application problems.

• Familiarize yourself with the cluster maintenance processes and implement changes as per the documented installation and validation plans.

• Showcase robust troubleshooting and debugging skills, aiming to pinpoint and rectify the issue, while also offering advice on how to prevent such problems in the future.

• Conduct thorough root cause analysis of major production incidents, document the findings for future reference, and put in place proactive measures to enhance system reliability.

• Automate routine tasks using scripts or automation tools to lessen manual work, decrease the chance of human errors, and boost system reliability.

• Technical Skills required:

o At least 2-3 years of experience for a junior-level role, or 5+ years for a mid-level/senior role, working as a site reliability engineer for the Kafka platform.

o In-depth knowledge of core Kafka components such as producers, consumers, topics, partitions, etc.

o Troubleshooting both Kafka platform service problems and application problems, and identifying the root cause.

o Writing Ansible playbooks and automating manual tasks using Ansible, shell scripting, and Python.

o Familiarity with Unix/Linux system internals, networking, and distributed systems.

Key Skills

SRE

Education

Any Graduate

Posted On: 19-Nov-2024
Experience: 3+ years
Openings: 2
Category: Hadoop/Kafka SRE
Tenure: Contract - Corp-to-Corp Position