Skills:
Big Data, SQL, Python, PySpark, ETL, Microsoft Azure, GCP, AWS

Job Title: Engineer - Data Engineering

Key Responsibilities:
- Data Pipeline Development: Design, develop, and maintain scalable data pipelines using PySpark to process large volumes of data from various sources (see the illustrative sketch at the end of this description).
- Data Integration: Integrate data from multiple data sources and formats, ensuring high data quality and reliability.
- Optimization: Optimize and tune data processing jobs for performance and cost-efficiency.
- Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver high-quality data solutions.
- ETL Processes: Develop and maintain ETL processes to extract, transform, and load data into data warehouses and data lakes.
- Data Quality: Implement data validation and monitoring processes to ensure data accuracy and consistency.
- Documentation: Document data engineering processes, workflows, and best practices.
- Troubleshooting: Identify, troubleshoot, and resolve data-related issues promptly.

Required Qualifications:
- Experience: 3+ years of experience in data engineering or a related field.
- Education: Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- Technical Skills:
  - Proficiency in PySpark and Python.
  - Strong knowledge of big data technologies such as Hadoop, Hive, and Spark.
  - Experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services.
  - Familiarity with data warehousing solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake).
  - Knowledge of relational and NoSQL databases (e.g., MySQL, MongoDB, Cassandra).
- Data Processing: Experience with ETL/ELT processes and data pipeline orchestration tools (e.g., Apache Airflow, Apache NiFi).
- Problem-Solving: Strong analytical and problem-solving skills.
- Communication: Excellent verbal and written communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
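For candidates unfamiliar with the day-to-day work described above, the following is a minimal sketch of the kind of PySpark ETL pipeline this role involves: extract raw CSV data, apply simple transformations and validation, and load the cleaned result into a data lake as Parquet. All paths, column names (order_id, amount, order_date), and validation rules here are hypothetical illustrations, not part of this posting.

# Minimal, hypothetical PySpark ETL sketch; paths and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-etl-pipeline").getOrCreate()

# Extract: read raw source data (schema inference kept simple for the sketch).
raw = spark.read.csv("/data/raw/orders.csv", header=True, inferSchema=True)

# Transform: normalize types so downstream consumers see consistent columns.
orders = (
    raw.withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .withColumn("amount", F.col("amount").cast("double"))
)

# Data quality: keep only rows passing basic validation, and report the rest.
valid = orders.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))
dropped = orders.count() - valid.count()
print(f"Validation dropped {dropped} invalid rows")

# Load: write the cleaned data to a data lake location, partitioned by date.
valid.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/orders")

spark.stop()

In practice, a pipeline like this would typically be scheduled and monitored by an orchestration tool such as Apache Airflow, one of the tools named in the qualifications above.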