Client is searching for a Hadoop Administrator to work onsite in San Antonio, Texas. The Hadoop Administrator is responsible for the care, maintenance, administration, and reliability of the Hadoop ecosystem. The role includes ensuring system security, stability, reliability, capacity planning, recoverability (protecting business data), and performance, in addition to delivering new system and data management solutions to meet the growing and evolving data demands of the enterprise. Using Cloudera, the Hadoop Administrator administers Cloudera technology and systems and is responsible for backup, recovery, architecture, performance tuning, security, auditing, metadata management, optimization, statistics, capacity planning, connectivity, and other data solutions of Hadoop systems.
Responsibilities
- Provides support and maintenance for Hadoop and its ecosystem, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, Cloudera Work Bench, etc.
- Accountable for storage, performance tuning, and volume management of Hadoop clusters and MapReduce routines.
- Deploys Hadoop clusters, adds and removes nodes, keeps track of jobs, monitors critical parts of the cluster, configures NameNode high availability, and schedules, configures, and takes backups.
- Installs and configures software, installs patches, and upgrades software as needed.
- Capacity planning and implementation of new/upgraded hardware and software releases for storage infrastructure.
- Handles cluster design, capacity arrangement, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration.
- Communicates with other development, administration, and business teams, including infrastructure, application, network, database, and business intelligence teams.
- Responsible for Data Lake and Data Warehousing design and development.
- Collaborates with various technical and non-technical resources, such as infrastructure and application teams, on project work, POCs (proofs of concept), and/or troubleshooting exercises.
- Configures Hadoop security, specifically Kerberos integration, with the ability to implement it.
- Creates and maintains job and task schedules and administers jobs.
- Responsible for data movement in and out of Hadoop clusters and data ingestion using Sqoop and/or Flume.
- Reviews Hadoop environments and determines compliance with industry best practices and regulatory requirements.
- Performs data modeling, design, and implementation based on recognized standards.
- Serves as a key contact for vendor escalation.
- Participates in a required on-call rotation to support a 24/7 environment and is expected to be able to work outside business hours to support corporate needs.
Required Skills
- Bachelor's degree in Information Systems, Engineering, Computer Science, or a related field from an accredited university.
- Intermediate experience in a Hadoop production environment.
- Must have intermediate experience and expert knowledge with at least 4 of the following:
  - Hands-on experience with Hadoop administration in Linux and virtual environments.
  - Well versed in installing and managing distributions of Hadoop (Cloudera).
  - Expert knowledge of and hands-on experience with Hadoop ecosystem components, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, Cloudera Work Bench, etc.
  - Thorough knowledge of overall Hadoop architecture.
  - Experience using and troubleshooting open-source technologies, including configuration management and deployment.
  - Data Lake and Data Warehousing design and development.
  - Experience reviewing existing DB and Hadoop infrastructure and determining areas for improvement.
  - Implementing software lifecycle methodology to ensure supported release and roadmap adherence.
  - Configuring high availability of NameNodes.
  - Scheduling and taking backups for the Hadoop ecosystem.
  - Data movement in and out of Hadoop clusters.