Job Description
Job Overview:
We are seeking a highly skilled Senior Python & PySpark Developer to join our team. The ideal candidate will have extensive experience with Python development and PySpark, along with a strong track record of working with large datasets in distributed computing environments. You will be responsible for designing, implementing, and optimizing data pipelines, ensuring reliable and efficient data processing, and contributing to our overall data architecture.
Key Responsibilities:
- Develop, maintain, and optimize scalable data processing pipelines using Python and PySpark (a representative sketch of this kind of pipeline appears after this list).
- Collaborate with cross-functional teams to understand business requirements and translate them into technical specifications.
- Work with large datasets to perform data wrangling, cleansing, and analysis.
- Implement best practices for efficient distributed computing and data processing.
- Optimize existing data pipelines for performance and scalability.
- Conduct code reviews, mentor junior developers, and contribute to team knowledge sharing.
- Develop and maintain technical documentation.
- Troubleshoot, debug, and resolve issues related to data processing.
- Collaborate with data engineers, data scientists, and analysts to deliver high-quality solutions.
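To give candidates a concrete sense of the pipeline work described above, the sketch below shows a minimal PySpark batch job: ingest, cleanse, aggregate, and write. It is illustrative only; the paths, column names, and aggregation logic are hypothetical placeholders, not a description of any specific system you would work on.

```python
# Illustrative sketch of a minimal PySpark batch pipeline.
# Paths, column names, and thresholds are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_rollup").getOrCreate()

# Ingest raw data (hypothetical location and schema).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Cleanse: drop malformed rows and normalize types.
cleaned = (
    orders
    .dropna(subset=["order_id", "order_ts", "amount"])
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount") >= 0)
)

# Aggregate: daily revenue and order count per customer.
daily = (
    cleaned
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.sum("amount").alias("daily_revenue"),
        F.count("order_id").alias("order_count"),
    )
)

# Write partitioned output for downstream consumers.
(
    daily.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/")
)

spark.stop()
```

In practice, jobs like this would be parameterized, scheduled, and tuned (partitioning, caching, join strategies) as part of the optimization responsibilities listed above.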
Requirements:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 5+ years of experience in Python programming.
- 3+ years of hands-on experience with PySpark and distributed data processing frameworks.
- Strong understanding of big data ecosystems (Hadoop, Spark, Hive).
- Experience working with cloud platforms like AWS, GCP, or Azure.
- Proficiency in SQL and relational databases.
- Familiarity with ETL processes and data pipelines.
- Strong problem-solving skills with the ability to troubleshoot and optimize code.
- Excellent communication skills and the ability to work in a team-oriented environment.
Preferred Qualifications:
- Experience with Apache Kafka or other real-time data streaming technologies.
- Familiarity with machine learning frameworks (TensorFlow, Scikit-learn).
- Experience with Docker, Kubernetes, or other containerization technologies.
- Knowledge of DevOps tools (CI/CD pipelines, Jenkins, Git, etc.).
- Familiarity with data warehousing solutions such as Redshift or Snowflake.