JOB DESCRIPTION
As a Hadoop Platform Engineer, you will design, implement, and manage our company's Hadoop infrastructure and data ecosystem. You will collaborate with cross-functional teams to understand data requirements, optimize data pipelines, and ensure the reliability and performance of our Hadoop clusters. You will also administer and monitor the Hadoop environment, troubleshoot issues, and implement security measures.
Responsibilities:
- Design, implement, and maintain Hadoop clusters, including HDFS, YARN, and MapReduce components.
- Collaborate with data engineers and data scientists to understand data requirements and optimize data pipelines for efficient processing and analysis.
- Administer and monitor Hadoop clusters to ensure high availability, reliability, and performance (a minimal health-check sketch follows this list).
- Troubleshoot and resolve issues related to Hadoop infrastructure, data ingestion, data processing, and data storage.
- Implement and manage security measures to protect data within the Hadoop ecosystem, including authentication, authorization, and encryption.
- Collaborate with cross-functional teams to define and implement backup and disaster recovery strategies for Hadoop clusters.
- Optimize Hadoop performance by fine-tuning configurations, planning capacity, and applying performance monitoring and tuning techniques.
- Work with DevOps teams to automate Hadoop infrastructure provisioning, deployment, and management processes.
- Stay up to date with the latest developments in the Hadoop ecosystem and recommend and implement new technologies and tools that enhance the platform.
- Document Hadoop infrastructure configurations, processes, and best practices.
- Provide technical guidance and support to other team members and stakeholders.
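The monitoring duties above often come down to small scripts run against the NameNode's metrics endpoints. Below is a minimal health-check sketch in Python: it polls the NameNode's JMX interface, which Hadoop 3.x serves on the NameNode web UI port (9870 by default). The host name is hypothetical, and the bean and attribute names shown reflect recent Hadoop releases; verify them against your version's /jmx output.

#!/usr/bin/env python3
"""Minimal HDFS health-check sketch: poll the NameNode JMX endpoint."""

import json
import sys
from urllib.request import urlopen

# Hypothetical NameNode host; 9870 is the default Hadoop 3.x web UI port.
NAMENODE_JMX = "http://namenode.example.com:9870/jmx"


def fetch_bean(query: str) -> dict:
    """Fetch a single JMX bean from the NameNode's /jmx endpoint."""
    with urlopen(f"{NAMENODE_JMX}?qry={query}", timeout=10) as resp:
        return json.load(resp)["beans"][0]


def main() -> int:
    # FSNamesystemState exposes node counts, capacity, and replication state.
    fs = fetch_bean("Hadoop:service=NameNode,name=FSNamesystemState")
    used_pct = 100.0 * fs["CapacityUsed"] / fs["CapacityTotal"]
    print(f"Live DataNodes:          {fs['NumLiveDataNodes']}")
    print(f"Dead DataNodes:          {fs['NumDeadDataNodes']}")
    print(f"Under-replicated blocks: {fs['UnderReplicatedBlocks']}")
    print(f"Capacity used:           {used_pct:.1f}%")
    # Exit non-zero so a cron job or alerting hook can flag trouble.
    if fs["NumDeadDataNodes"] > 0 or used_pct > 85.0:
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())

A script like this would typically run from cron or an alerting pipeline and page the on-call engineer on dead DataNodes or high capacity usage; the 85% threshold here is an illustrative choice, not a standard.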
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
- Strong experience in designing, implementing, and administering Hadoop clusters in a production environment.
- Proficiency in Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Spark, and HBase.
- Experience with cluster management tools like Apache Ambari or Cloudera Manager.
- Solid understanding of Linux/Unix systems and networking concepts.
- Strong scripting skills (e.g., Bash, Python) for automation and troubleshooting (see the troubleshooting sketch after this list).
- Knowledge of database concepts and SQL.
- Experience with data ingestion tools like Apache Kafka or Apache NiFi.
- Familiarity with data warehouse concepts and technologies.
- Understanding of security principles and experience implementing security measures in Hadoop clusters.
- Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex issues.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
- Relevant certifications, such as Cloudera Certified Administrator for Apache Hadoop (CCAH) or HDP Certified Administrator (HDPCA), are a plus.
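As an illustration of the scripting-for-troubleshooting skill listed above, the sketch below wraps two standard HDFS CLI commands, hdfs dfsadmin -safemode get and hdfs fsck /. It assumes the hdfs binary is on the PATH of an edge node; the commands and flags are standard, but the exact output labels filtered for here can vary slightly across Hadoop versions and distributions.

#!/usr/bin/env python3
"""Troubleshooting sketch: surface common HDFS problems via the hdfs CLI."""

import subprocess


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def main() -> None:
    # Reports whether the NameNode is stuck in safe mode (a common outage cause).
    safemode = run(["hdfs", "dfsadmin", "-safemode", "get"]).strip()
    print(safemode)

    # fsck prints a filesystem report; on large clusters, scope it to a
    # subtree instead of "/" to keep the NameNode load manageable.
    report = run(["hdfs", "fsck", "/"])
    summary_keys = (
        "Status:",
        "Under-replicated blocks",
        "Corrupt blocks",
        "Missing blocks",
    )
    # Print only the summary lines; exact labels may vary by version.
    for line in report.splitlines():
        if any(key in line for key in summary_keys):
            print(line.strip())


if __name__ == "__main__":
    main()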