Description

Must-Have Skills

Hadoop: Extensive experience with the Hadoop ecosystem and related technologies
Programming: Proficiency in Python (or Scala)
Spark SQL: Strong skills in Spark SQL for data processing and analysis
SQL: Proficiency in at least one of the following: MySQL, Hive, Impala
Data Ingestion: Experience ingesting data from varied sources (message queues, file shares, REST APIs, relational databases) and handling multiple data formats (JSON, CSV, XML)
Spark Structured Streaming: Experience with Spark Structured Streaming for real-time data processing (see the ingestion sketch after this list)
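
As a rough illustration of the ingestion and streaming skills above, here is a minimal PySpark sketch that reads JSON events from a Kafka topic and lands them as Parquet. The broker address, topic name, schema fields, and paths are illustrative assumptions, not details from this posting.

    # Minimal sketch: Structured Streaming ingestion of JSON events from Kafka.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("ingest-events").getOrCreate()

    # Schema for the incoming JSON payload (hypothetical fields)
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
           .option("subscribe", "events")                     # assumed topic
           .load())

    # Kafka delivers the payload as bytes; cast to string, then parse the JSON
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("e"))
              .select("e.*"))

    # Write the parsed stream to Parquet; the checkpoint enables fault tolerance
    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/events")                  # assumed output path
             .option("checkpointLocation", "/data/checkpoints/events")
             .start())
    query.awaitTermination()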

Responsibilities

Design, develop, and maintain data pipelines using Hadoop, Spark, and Python/Scala (a batch sketch follows this list)
Ingest data from various sources and transform it into usable formats
Optimize Spark/MapReduce and SQL jobs for performance
Work in an Agile development environment, collaborating with cross-functional teams
Utilize version control systems such as Git or SVN and hosting platforms like Bitbucket
Deploy and manage applications using Jenkins, including JAR artifact management
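
The pipeline and optimization responsibilities above might look, in miniature, like the following PySpark batch sketch: CSV ingestion, a Spark SQL aggregation, and a partitioned Parquet write so downstream queries can prune by date. All paths and column names are assumptions for illustration.

    # Minimal sketch of a batch pipeline step: ingest CSV, transform with
    # Spark SQL, write partitioned Parquet for efficient downstream reads.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("daily-orders").getOrCreate()

    orders = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("/landing/orders/*.csv"))   # assumed input path
    orders.createOrReplaceTempView("orders")

    # Aggregate with Spark SQL; column names are illustrative
    daily = spark.sql("""
        SELECT order_date, region,
               SUM(amount) AS total_amount,
               COUNT(*)    AS order_count
        FROM orders
        GROUP BY order_date, region
    """)

    # Partitioning by date lets per-day queries skip the rest of the dataset
    (daily.repartition("order_date")
          .write
          .mode("overwrite")
          .partitionBy("order_date")
          .parquet("/warehouse/daily_orders"))  # assumed output path

Partitioning on the grouping key is one common layout choice; the right partition key in practice depends on the queries the data serves.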

Qualifications

Strong understanding of Big Data concepts and technologies
Hands-on experience with Hadoop, Spark, Sqoop, Kafka, MapReduce, and NoSQL/search stores such as HBase and Solr
Proficiency in Python or Scala for data processing and scripting
Experience with the Linux operating system
Familiarity with search and visualization tools such as Elasticsearch and Kibana
Ability to work independently and as part of a team
Excellent communication and problem-solving skills
Adaptable and eager to learn new technologies

Education

Graduate in any discipline