Responsibilities
• Design, develop, and maintain data processing pipelines using Apache Spark.
• Collaborate with data engineers, data scientists, and business analysts to understand data requirements and deliver solutions that meet business needs.
• Write efficient Spark code to process, transform, and analyze large datasets.
• Optimize Spark jobs for performance, scalability, and resource utilization.
• Integrate Spark applications with Hadoop, Hive, and Kafka, with ETL processes, and with Spring- and Hibernate-based services.
• Troubleshoot and resolve issues related to data pipelines and Spark applications.
• Monitor and manage Spark clusters to ensure high availability and reliability.
• Implement data quality and validation processes to ensure accuracy and consistency of data.
• Stay up-to-date with industry trends and best practices related to Spark, big data technologies, Python, and AWS services.
• Document technical designs, processes, and procedures related to Spark development.
• Provide technical guidance and mentorship to junior developers on Spark-related projects.
Qualifications
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
• Proven experience (10+ years) as a Spark Developer or in a similar role working with big data technologies.
• Strong proficiency in Apache Spark, including Spark SQL, Spark Streaming, and Spark MLlib.
• Proficiency in programming languages such as Scala or Python for Spark development.
• Experience with data processing and ETL concepts, data warehousing, and data modeling.
• Solid understanding of distributed computing principles and cluster management.