Description

What You'll Do

Architect and develop large-scale, distributed data processing pipelines using technologies like Apache Spark, Apache Beam, and Apache Airflow for orchestration

Design and implement efficient data ingestion, transformation, and storage solutions for structured and unstructured data

Partner closely with Engineering Leaders, Architects, and Product Managers to understand business requirements and provide technical solutions within a larger roadmap

Build and optimize real-time and batch data processing systems, ensuring high availability, fault tolerance, and scalability

Collaborate with data engineers, analysts, and data scientists to translate business requirements into technical solutions

Implement best practices for data governance, data quality, and data security across the entire data lifecycle

Mentor and guide junior engineers, fostering a culture of continuous learning and knowledge sharing

Stay up-to-date with the latest trends, technologies, and industry best practices in the big data and data engineering domains

Participate in code reviews, design discussions, and technical decision-making processes

Contribute to the development and maintenance of CI/CD pipelines, ensuring efficient and reliable deployments

Collaborate with cross-functional teams to ensure the successful delivery of projects and initiatives

What You'll Bring

Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field

Minimum of 10 years of experience in backend software development, with a strong focus on data engineering and big data technologies

Proven expertise in Apache Spark, Apache Beam, and Apache Airflow, with a deep understanding of distributed computing and data processing frameworks

Proficient in Java, Scala, and SQL, with the ability to write clean, maintainable, and efficient code

Proven experience building enterprise-grade software in a cloud-native environment (GCP or AWS) using cloud services such as GCS/S3, Dataflow/Glue, Dataproc/EMR, Cloud Functions/Lambda, BigQuery/Athena, Bigtable/DynamoDB

Experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes)

Experience with stream and data processing technologies such as Kafka, Spark, Google BigQuery, Google Dataflow, and HBase

Familiarity with designing CI/CD pipelines using Jenkins, GitHub Actions, or similar tools

Experience with SQL, particularly in query performance optimization

Experience with graph and vector databases or related processing frameworks

Strong knowledge of data modeling, data warehousing, and data integration best practices

Familiarity with streaming data processing, real-time analytics, and machine learning pipelines

Excellent problem-solving, analytical, and critical thinking skills

Strong communication and collaboration skills, with the ability to work effectively in a team environment

Experience in mentoring and leading technical teams
