Position Overview
As a Data Engineer, you will play a crucial role in designing, building, and maintaining scalable data pipelines and analytical solutions. You will work closely with cross-functional teams to understand data requirements, develop efficient data processing workflows, and deliver actionable insights to support business decision-making. The ideal candidate will have a strong background in data engineering and analytics, with expertise in Spark, PySpark, Tableau, and SQL.
Responsibilities
Data Pipeline Development: Design, develop, and deploy robust data pipelines to extract, transform, and load (ETL) large volumes of structured and unstructured data from diverse sources. Implement data processing workflows using Spark and PySpark to ensure scalability, reliability, and efficiency
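The extract-transform-load shape described above can be sketched in plain Python; in a production pipeline the same three stages would typically operate on Spark/PySpark DataFrames rather than Python lists. All source data, table names, and column names below are illustrative assumptions, not part of this posting.

```python
import csv
import io
import sqlite3

# --- Extract: read raw records from a source (an inline CSV string here;
# real pipelines pull from files, databases, APIs, or message queues) ---
RAW_CSV = """order_id,amount,country
1,19.99,US
2,5.00,de
3,,US
"""

def extract(raw: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(raw)))

# --- Transform: clean and normalize; drop rows missing a required field ---
def transform(rows: list[dict]) -> list[dict]:
    cleaned = []
    for row in rows:
        if not row["amount"]:          # skip incomplete records
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "country": row["country"].upper(),
        })
    return cleaned

# --- Load: write the cleaned records to a target store (SQLite here) ---
def load(rows: list[dict], conn: sqlite3.Connection) -> int:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :amount, :country)", rows
    )
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
print(loaded)  # the row with a missing amount was dropped in transform
```

In PySpark the same stages map onto reading a DataFrame, chaining transformations, and writing to a sink; keeping each stage a separate function makes the workflow testable in isolation.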
Data Modeling and Optimization: Design and implement data models and schemas to support analytical requirements and facilitate data integration. Optimize data storage and retrieval processes to improve performance and reduce latency
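One common analytical data model this responsibility covers is a star schema: fact tables joined to dimension tables, with indexes on the join keys to cut retrieval latency. A minimal sketch using SQLite (table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A minimal star schema: one dimension table and one fact table.
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    country     TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    amount      REAL NOT NULL
);
-- Index the join/filter key to speed lookups on a large fact table.
CREATE INDEX idx_fact_sales_customer ON fact_sales(customer_id);
""")

conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "US"), (2, "DE")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0)])

# Analytical query: revenue per country via the dimension join.
revenue = dict(conn.execute("""
    SELECT c.country, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer c USING (customer_id)
    GROUP BY c.country
""").fetchall())
print(revenue)
```

The same separation of facts from dimensions carries over to warehouse schemas and to partitioned Spark tables, where the join key also guides partitioning choices.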
Data Quality Assurance: Implement data quality checks and validation procedures to ensure the accuracy, completeness, and integrity of the data. Identify and address data quality issues through data cleansing, transformation, and validation techniques
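The quality checks named above (completeness, uniqueness, validity) can be expressed as small rule functions run over each batch before it is loaded. The records, fields, and thresholds below are illustrative assumptions:

```python
# A batch of records to validate (illustrative sample data).
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 2, "email": "c@example.com", "age": -5},
]

def check_completeness(rows):
    """Flag rows whose required field is empty."""
    return [r["id"] for r in rows if not r["email"]]

def check_uniqueness(rows, key="id"):
    """Flag duplicated primary-key values."""
    seen, dupes = set(), []
    for r in rows:
        if r[key] in seen:
            dupes.append(r[key])
        seen.add(r[key])
    return dupes

def check_validity(rows):
    """Flag out-of-range values (age must be non-negative)."""
    return [r["id"] for r in rows if r["age"] < 0]

issues = {
    "missing_email": check_completeness(records),
    "duplicate_id": check_uniqueness(records),
    "invalid_age": check_validity(records),
}
print(issues)
```

In a Spark pipeline the same rules would typically run as DataFrame filters or via a data-quality framework, with failing records routed to a quarantine table for cleansing.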
Data Visualization and Reporting: Develop interactive dashboards and reports using Tableau to visualize key performance indicators (KPIs), trends, and insights derived from the data. Collaborate with business stakeholders to define reporting requirements and deliver actionable insights
Performance Tuning and Optimization: Monitor and optimize the performance of data processing workflows and analytical queries. Identify opportunities for performance improvement and implement tuning strategies to enhance efficiency and scalability
Documentation and Knowledge Sharing: Document data engineering processes, workflows, and best practices. Maintain comprehensive documentation of data pipelines, data models, and data lineage to support knowledge sharing and collaboration
Collaboration and Stakeholder Engagement: Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver solutions that meet business objectives. Communicate effectively with stakeholders to gather requirements, provide updates, and address feedback
Continuous Learning and Professional Development: Stay abreast of emerging trends and technologies in data engineering, analytics, and visualization. Continuously enhance technical skills and expertise through self-directed learning, training, and professional development opportunities
Qualifications
Bachelor's degree in computer science, engineering, mathematics, or a related field (Master's degree preferred)
3-5 years of experience in data engineering, analytics, or a related role
Strong proficiency in Spark and PySpark for distributed data processing and analytics
Solid understanding of relational databases, SQL query optimization, and database management systems (DBMS)
Experience with data visualization tools such as Tableau for creating interactive dashboards and reports
Proficiency in programming languages such as Python, Scala, or Java for data manipulation and analysis
Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP) for data storage, processing, and analytics
Strong analytical and problem-solving skills, with the ability to translate complex data requirements into scalable solutions
Excellent communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams and stakeholders
Proven ability to work independently and manage multiple tasks and priorities in a dynamic environment