Description

Essential Functions
Extensive experience in cloud technologies for streaming platforms, with a focus on AWS services for data lake creation, orchestration, and analytics.
Innovative problem solver with a demonstrated ability to develop intricate algorithms based on in-depth statistical analysis, enhancing customer relationships and personalizing interactions.
Hands-on experience with cloud-based tools and platforms such as AWS EMR, EC2, Azure Data Factory, and Databricks, including data lake, Delta Lake, and data governance solutions.
Proven ability to handle large-scale data processing using Big Data technologies such as Hadoop, Hive, and Spark.
Collaborate and lead development in a Scrum environment, emphasizing teamwork and agile methodologies.
Deliverables
Will work with business partners across Marketing functions to understand business needs, translate them into technical requirements, wrangle data from various systems, and design automated data pipelines to drive insights (a rough PySpark pipeline sketch follows this list).
Will work with internal technology groups to automate the data collection and definition process.
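
To make the pipeline deliverable concrete, the sketch below shows a minimal PySpark job that ingests raw marketing events from an assumed S3 landing zone, applies basic cleansing, and writes partitioned Parquet to a curated data lake path. The bucket names, column names, and paths are illustrative assumptions, not part of the role description.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal pipeline sketch: ingest raw marketing events, cleanse, and write
# curated output to a data lake path. All paths and columns are hypothetical.
spark = SparkSession.builder.appName("marketing-pipeline-sketch").getOrCreate()

# Read raw CSV events from an assumed S3 landing zone.
raw = spark.read.option("header", True).csv("s3://example-landing/marketing_events/")

# Basic cleansing: deduplicate, parse timestamps, drop rows without a customer.
curated = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("customer_id").isNotNull())
)

# Write partitioned Parquet to an assumed curated zone for downstream analytics.
curated.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated/marketing_events/"
)

In production such a job would typically be parameterized and run under an orchestrator rather than executed ad hoc.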
Requirements
At least twelve years' experience, including significant experience in technology management, analysis, and hands-on development.
Results-oriented IT professional with 9 years of experience as a Data Engineer, specializing in designing data-intensive applications.
Knowledge of the Hadoop ecosystem, Big Data analytics, Talend, Spark, PySpark, DataStage, and cloud data engineering.
Proven track record in implementing data pipelines, storage solutions, and warehousing systems, with expertise in AWS tools such as S3, RDS, DynamoDB, Redshift, and Athena.
In-depth knowledge of Hadoop architecture and its components, along with extensive experience in enterprise-level solutions using Apache Spark, MapReduce, Kubernetes, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, and YARN.
Skilled in data cleansing and pre-processing using Python, Alteryx, and Tableau, with proficiency in data ingestion, processing, and quality assurance.
Well-versed in designing logical and physical data models, implementing data workflows, and deploying Splunk clusters.
Adept at algorithm performance tuning and optimization using Apache Spark, PySpark, and Scala.
Skilled in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie (a minimal Airflow DAG sketch follows this list).
Proficient in managing connections and troubleshooting issues with SQL and NoSQL databases, including MongoDB, HBase, Cassandra, SQL Server, and PostgreSQL.
Strong background in scripting with UNIX shell, Perl, and Java, with experience in creating RDBMS tables, stored procedures, and ETL data flows. Expert in designing parallel jobs and in fact-dimension modeling.
Proven experience in implementing CI/CD pipelines using Git, Terraform, and Ansible, as well as in dimensional modeling (star and snowflake schemas).
Skilled in building and productionizing predictive models on large datasets using advanced statistical modeling, machine learning, and data mining techniques.
Expertise in utilizing Oozie and other workflow schedulers, implementing security requirements for Hadoop, and optimizing Hive tables with partitioning and bucketing (a partitioning and bucketing sketch follows this list).
Experienced in developing web-based applications using Python, Django, Qt, C, XML, CSS3, HTML5, DHTML, JavaScript, and jQuery.
Proficient in using various Python packages for data analysis and visualization, including pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and Beautiful Soup, as well as R packages such as ggplot2, caret, and dplyr.
Active involvement in software development practices, including the creation of reusable frameworks for ETL processes.
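
As a minimal sketch of the Airflow scheduling skill referenced above (assuming Airflow 2.x), the DAG below runs a single placeholder task on a daily schedule; the dag_id, schedule, and callable are hypothetical examples, not requirements of the role.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_and_load():
    # Placeholder for the actual extract/transform/load logic.
    print("running daily ETL step")


# Minimal DAG sketch: one daily task, no backfill of missed runs.
with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    etl_task = PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )

Retries, alerting, and SLAs would normally be configured per task and monitored through the Airflow UI.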
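
And as a sketch of the Hive table optimization mentioned above, the DDL below (issued through PySpark's SQL interface so the example stays in Python) creates a table partitioned by date and bucketed by customer; the table name, columns, and bucket count are illustrative assumptions.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-optimization-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Partitioning by order_date lets queries prune whole directories;
# bucketing by customer_id clusters rows to speed up joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_optimized (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

The partition key and bucket count here are placeholders; in practice they would be tuned to data volume and the dominant join keys.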

Education

Bachelor's degree