We are looking for a skilled Python Developer with expertise in Hadoop to join our dynamic and innovative team. The ideal candidate will play a crucial role in designing, developing, and maintaining data processing applications and systems, and should have a solid background in Python development along with hands-on experience with Hadoop technologies.
Responsibilities:
- Design, develop, and maintain Python-based applications for large-scale data processing using Hadoop technologies.
- Collaborate with data engineers and data scientists to implement data solutions that meet business requirements.
- Develop and optimize data pipelines for efficient extraction, transformation, and loading (ETL) of data.
- Work closely with cross-functional teams to understand data requirements and ensure the scalability and reliability of data applications.
- Troubleshoot, debug, and tune Python code and Hadoop jobs to ensure optimal performance.
- Stay updated on the latest developments in Hadoop and related technologies to propose and implement improvements.
- Collaborate with other developers and participate in code reviews to maintain code quality and consistency.
- Contribute to the documentation of data processing workflows, data models, and application architecture.
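To illustrate the kind of pipeline work described above: Python is commonly paired with Hadoop through Hadoop Streaming, where a mapper and a reducer exchange key/value pairs. The sketch below is a minimal, self-contained word count in that style; the function names and sample input are purely illustrative, not part of any existing codebase.

```python
from itertools import groupby

def mapper(lines):
    # Emit (word, 1) pairs, as a Streaming mapper would write to stdout.
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reducer(pairs):
    # Sum counts per word; Hadoop delivers mapper output sorted by key,
    # which sorting the pairs here simulates locally.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

counts = dict(reducer(mapper(["the quick fox", "the lazy dog"])))
# counts == {"dog": 1, "fox": 1, "lazy": 1, "quick": 1, "the": 2}
```

In a real Streaming job the mapper and reducer would run as separate scripts reading stdin and writing tab-separated output, with Hadoop handling the shuffle and sort between them.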
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Python Developer with a focus on data processing applications.
- Hands-on experience with Hadoop ecosystem components, including HDFS, MapReduce, Hive, Pig, and Spark.
- Proficiency in writing complex, optimized Python code for data processing tasks.
- Strong understanding of data structures, algorithms, and software design principles.
- Experience with version control systems (e.g., Git) and collaborative development workflows.
- Knowledge of database systems and SQL.
- Excellent problem-solving and analytical skills.
- Effective communication and collaboration skills.
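As a concrete (hypothetical) example of the ETL and SQL skills listed above, the sketch below extracts raw records, transforms them by filtering and normalizing, and loads the result into an in-memory SQLite table using only the Python standard library. The table name and sample data are invented for illustration.

```python
import sqlite3

# Extract: raw rows as they might arrive from an upstream source.
raw = [("Alice", "42"), ("bob", "-1"), ("Carol", "37")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")

# Transform: drop invalid rows and normalize names and types.
clean = [(name.title(), int(score)) for name, score in raw if int(score) >= 0]

# Load: bulk-insert the cleaned rows, then query the result with SQL.
conn.executemany("INSERT INTO scores VALUES (?, ?)", clean)
total = conn.execute("SELECT SUM(score) FROM scores").fetchone()[0]
# total == 79
```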
Preferred Qualifications:
- Experience with other big data technologies such as Kafka, HBase, or Flink.
- Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud).
- Understanding of containerization and orchestration tools (e.g., Docker, Kubernetes).
- Knowledge of machine learning concepts and frameworks.
- Certification in Hadoop or related technologies is a plus.