Job Description:

We are seeking a skilled Python PySpark Developer to join our data engineering team. The ideal candidate will have experience in building scalable data pipelines and working with large datasets in a distributed environment.

Key Responsibilities:

  • Design, develop, and maintain robust data pipelines using PySpark.
  • Collaborate with data scientists and analysts to understand data requirements and deliver solutions.
  • Optimize and tune Spark jobs for performance and scalability.
  • Write clean, maintainable, and efficient code.
  • Perform data validation and quality checks on datasets.
  • Troubleshoot and debug issues in data processing workflows.
  • Document technical specifications and processes.
  • Stay current with the latest technologies and best practices in data engineering.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 3+ years of experience in Python programming.
  • Strong experience with Apache Spark and PySpark.
  • Familiarity with data processing frameworks and ETL tools.
  • Knowledge of SQL and experience with relational databases (e.g., PostgreSQL, MySQL).
  • Experience with cloud platforms (e.g., AWS, Azure, GCP) is a plus.
  • Understanding of data warehousing concepts and architectures.
  • Strong problem-solving skills and attention to detail.
  • Excellent communication and teamwork abilities.

Preferred Skills:

  • Experience with distributed computing concepts.
  • Familiarity with machine learning libraries (e.g., scikit-learn, TensorFlow).
  • Knowledge of data visualization tools (e.g., Tableau, Power BI).