Description

Position Overview

As a Data Engineer, you will play a crucial role in designing, building, and maintaining scalable data pipelines and analytical solutions. You will work closely with cross-functional teams to understand data requirements, develop efficient data processing workflows, and deliver actionable insights to support business decision-making. The ideal candidate will have a strong background in data engineering and analytics, with expertise in Spark, PySpark, Tableau, and SQL.

Responsibilities
Data Pipeline Development: Design, develop, and deploy robust data pipelines to extract, transform, and load (ETL) large volumes of structured and unstructured data from diverse sources. Implement data processing workflows using Spark and PySpark to ensure scalability, reliability, and efficiency

Data Modeling and Optimization: Design and implement data models and schemas to support analytical requirements and facilitate data integration. Optimize data storage and retrieval processes to improve performance and reduce latency

Data Quality Assurance: Implement data quality checks and validation procedures to ensure the accuracy, completeness, and integrity of the data. Identify and address data quality issues through data cleansing, transformation, and validation techniques

Data Visualization and Reporting: Develop interactive dashboards and reports using Tableau to visualize key performance indicators (KPIs), trends, and insights derived from the data. Collaborate with business stakeholders to define reporting requirements and deliver actionable insights

Performance Tuning and Optimization: Monitor and optimize the performance of data processing workflows and analytical queries. Identify opportunities for performance improvement and implement tuning strategies to enhance efficiency and scalability

Documentation and Knowledge Sharing: Document data engineering processes, workflows, and best practices. Maintain comprehensive documentation of data pipelines, data models, and data lineage to support knowledge sharing and collaboration

Collaboration and Stakeholder Engagement: Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver solutions that meet business objectives. Communicate effectively with stakeholders to gather requirements, provide updates, and address feedback

Continuous Learning and Professional Development: Stay abreast of emerging trends and technologies in data engineering, analytics, and visualization. Continuously enhance technical skills and expertise through self-directed learning, training, and professional development opportunities

Qualifications

Bachelor's degree in computer science, engineering, mathematics, or a related field (Master's degree preferred)

3-5 years of experience in data engineering, analytics, or related roles

Strong proficiency in Spark and PySpark for distributed data processing and analytics

Solid understanding of relational databases, SQL query optimization, and database management systems (DBMS)

Experience with data visualization tools such as Tableau for creating interactive dashboards and reports

Proficiency in programming languages such as Python, Scala, or Java for data manipulation and analysis

Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP) for data storage, processing, and analytics

Strong analytical and problem-solving skills, with the ability to translate complex data requirements into scalable solutions

Excellent communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams and stakeholders

Proven ability to work independently and manage multiple tasks and priorities in a dynamic environment

Education

Bachelor's degree