Description

Role: Hadoop Platform Engineer

Locations: Irving, TX / Jersey City, NJ / Charlotte, NC / Newark, DE

Duration: Full-time

 

As a Hadoop Platform Engineer, you will be responsible for designing, implementing, and managing our company's Hadoop infrastructure and data ecosystem. You will collaborate with cross-functional teams to understand data requirements, optimize data pipelines, and ensure the reliability and performance of our Hadoop clusters. You will also be responsible for administering and monitoring the Hadoop environment, troubleshooting issues, and implementing security measures. 

 

Required Skills: 

 

Platform Engineering: 

  • Cluster Management: 
    • Expertise in designing, implementing, and maintaining large-scale Hadoop clusters, including components such as HDFS, YARN, and MapReduce. 
    • Collaborate with data engineers and data scientists to understand data requirements and optimize data pipelines. 
  • Administration and Monitoring: 
    • Experience in administering and monitoring Hadoop clusters to ensure high availability, reliability, and performance. 
    • Experience in troubleshooting and resolving issues related to Hadoop infrastructure, data ingestion, data processing, and data storage. 
  • Security Implementation: 
    • Experience in implementing and managing security measures within Hadoop clusters, including authentication, authorization, and encryption. 
  • Backup and Disaster Recovery: 
    • Collaborate with cross-functional teams to define and implement backup and disaster recovery strategies for Hadoop clusters. 
  • Performance Optimization: 
    • Experience in optimizing Hadoop performance through fine-tuning configurations, capacity planning, and implementing performance monitoring and tuning techniques. 
  • Automation and DevOps Collaboration: 
    • Work with DevOps teams to automate Hadoop infrastructure provisioning, deployment, and management processes. 
  • Technology Adoption and Recommendations: 
    • Stay up to date with the latest developments in the Hadoop ecosystem. 
    • Recommend and implement new technologies and tools that enhance the platform. 
  • Documentation: 
    • Experience in documenting Hadoop infrastructure configurations, processes, and best practices. 
  • Technical Support and Guidance: 
    • Provide technical guidance and support to other team members and stakeholders. 

  

Admin: 

  • User Interface Design: 
    • Relevant for designing interfaces for tools within the Hadoop ecosystem that provide self-service capabilities, such as Hadoop cluster management interfaces or job scheduling dashboards. 
  • Role-Based Access Control (RBAC): 
    • Important for controlling access to Hadoop clusters, ensuring that users have appropriate permissions to perform self-service tasks. 
  • Cluster Configuration Templates: 
    • Useful for maintaining consistent configurations across Hadoop clusters, ensuring that users follow best practices and guidelines. 
  • Resource Management: 
    • Important for optimizing resource utilization within Hadoop clusters, allowing users to manage resources dynamically based on their needs. 
  • Self-Service Provisioning: 
    • Pertinent for features that enable users to provision and manage nodes within Hadoop clusters independently. 
  • Monitoring and Alerts: 
    • Essential for monitoring the health and performance of Hadoop clusters, providing users with insights into their cluster's status. 
  • Automated Scaling: 
    • Relevant for automatically adjusting the size of Hadoop clusters based on workload demands. 
  • Job Scheduling and Prioritization: 
    • Important for managing data processing jobs within Hadoop clusters efficiently. 
  • Self-Service Data Ingestion: 
    • Applicable to features that facilitate users in ingesting data into Hadoop clusters independently. 
  • Query Optimization and Tuning Assistance: 
    • Relevant for providing users with tools or guidance to optimize and tune their queries when interacting with Hadoop-based data. 
  • Documentation and Training: 
    • Important for creating resources that help users understand how to use self-service features within the Hadoop ecosystem effectively. 
  • Data Access Control: 
    • Pertinent for controlling access to data stored within Hadoop clusters, ensuring proper data governance. 
  • Backup and Restore Functionality: 
    • Applicable to features that allow users to perform backup and restore operations for data stored within Hadoop clusters. 
  • Containerization and Orchestration: 
    • Relevant for deploying and managing applications within Hadoop clusters using containerization and orchestration tools. 
  • User Feedback Mechanism: 
    • Important for continuously improving self-service features based on user input and experience within the Hadoop ecosystem. 
  • Cost Monitoring and Optimization: 
    • Applicable to tools or features that help users monitor and optimize costs associated with their usage of Hadoop clusters. 
  • Compliance and Auditing: 
    • Relevant for ensuring compliance with organizational policies and auditing user activities within the Hadoop ecosystem.