Responsibilities:
Develop and maintain data processing applications using Hadoop and HDFS.
Design, implement, and optimize data transformations and analytics using PySpark and Python (an illustrative sketch follows this list).
Collaborate with data engineers and data scientists to understand requirements and translate them into technical solutions.
Write efficient and scalable code to process and analyze large volumes of data.
Ensure the reliability, scalability, and performance of data processing pipelines.
Troubleshoot and debug issues in the Hadoop/HDFS and PySpark/Python environments.
Collaborate with cross-functional teams to integrate data processing pipelines with other systems and tools.
Stay up-to-date with emerging technologies and industry trends related to big data processing and analytics.
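To give a concrete sense of the transformation work described above, here is a minimal PySpark sketch of a batch aggregation pipeline reading from and writing to HDFS. The paths, column names, and schema are hypothetical and only illustrate the general pattern, not an actual production job.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Build a Spark session; on a Hadoop/YARN cluster the master and resource
# settings would normally come from spark-submit rather than code.
spark = SparkSession.builder.appName("daily-event-aggregation").getOrCreate()

# Read raw events from HDFS (Parquet assumed; path and columns are illustrative).
events = spark.read.parquet("hdfs:///data/raw/events")

# Example transformation: drop bad records, then aggregate per user per day.
daily_metrics = (
    events
    .filter(F.col("event_type").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("duration_ms").alias("total_duration_ms"),
    )
)

# Write results back to HDFS, partitioned by date for efficient downstream reads.
daily_metrics.write.mode("overwrite").partitionBy("event_date").parquet(
    "hdfs:///data/curated/daily_metrics"
)
```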
Qualifications:
Experience in developing applications using Hadoop and HDFS.
Proficiency in PySpark and Python programming.
Solid understanding of distributed computing concepts and parallel processing.
Experience with data processing frameworks and tools like MapReduce, Hive, and Spark.
Knowledge of SQL and experience working with relational databases.
Familiarity with data serialization formats such as Avro, Parquet, or ORC (see the short sketch at the end of this posting).
Strong problem-solving and debugging skills.
Excellent communication and collaboration abilities.
Ability to work effectively in a fast-paced, dynamic environment.
Any Graduate
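As a brief illustration of the SQL and serialization-format skills listed above, the sketch below writes a small DataFrame in the Parquet and ORC columnar formats and runs a standard SQL query over it with Spark SQL. The data, output paths, and view name are invented for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-and-sql-example").getOrCreate()

# A tiny in-memory DataFrame stands in for real data.
df = spark.createDataFrame(
    [(1, "alice", 120), (2, "bob", 95)],
    ["id", "name", "score"],
)

# Columnar formats such as Parquet and ORC are written via the DataFrame writer.
df.write.mode("overwrite").parquet("hdfs:///tmp/example_parquet")
df.write.mode("overwrite").orc("hdfs:///tmp/example_orc")

# Registering a temporary view allows standard SQL over the same data.
df.createOrReplaceTempView("scores")
spark.sql("SELECT name, score FROM scores WHERE score > 100").show()
```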