Description

The client is seeking a Software Engineer to build and optimize its data platforms on AWS, using key technologies such as CDAP, EMR, Spark (Java), Snowflake, and Databricks. You will design and implement robust, scalable data pipelines, ETL processes, and analytics systems in the cloud.

Duties and Responsibilities:

  • Develop and enhance data pipelines and ETL processes using CDAP on AWS infrastructure.
  • Create new CDAP plugins in Java and Spark (a brief illustrative sketch follows this list).
  • Extend the open-source CDAP codebase to address vulnerabilities and add new features.
  • Build data integration flows to migrate large datasets into the Snowflake data warehouse.
  • Implement AWS infrastructure-as-code solutions for deployment automation.
  • Instrument data pipelines and leverage monitoring for performance tuning and reliability.
  • Work with data scientists to optimize data workflows and models on Databricks.
  • Follow security best practices for access control, encryption, and auditing across data platforms.
  • Participate in architecture reviews and technology selections.
  • Continuously monitor and improve data platforms for scalability and cost efficiency.
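
For illustration only: the sketch below shows, under assumed details, the kind of Java and Spark batch transformation this role involves. The job name, S3 paths, and column name are hypothetical and are not taken from this posting.

  // Minimal, hypothetical Java + Spark batch job: read raw CSV from S3,
  // clean it, and write Parquet for downstream loading (e.g., into Snowflake).
  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  public class OrdersPipeline {
      public static void main(String[] args) {
          SparkSession spark = SparkSession.builder()
                  .appName("orders-pipeline")   // hypothetical job name
                  .getOrCreate();

          // Read raw CSV landed in S3 (placeholder path).
          Dataset<Row> raw = spark.read()
                  .option("header", "true")
                  .csv("s3://example-bucket/raw/orders/");

          // Light cleanup: drop rows missing an order id, then deduplicate on it.
          Dataset<Row> cleaned = raw.na().drop(new String[]{"order_id"})
                  .dropDuplicates("order_id");

          // Write columnar output to a curated zone (placeholder path).
          cleaned.write().mode("overwrite")
                  .parquet("s3://example-bucket/curated/orders/");

          spark.stop();
      }
  }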

 

Qualifications:

Required and Desired Skills/Certifications:

  • 5+ years' experience in backend development or data engineering.
  • Hands-on experience with AWS services such as S3, EC2, EKS, and EMR.
  • Proficiency and experience with Java, Spark, Kafka, SQL, and CDAP.
  • Experience building scalable ETL processes and workflows.
  • Strong programming ability in Python and Java, including unit testing.
  • Infrastructure-as-code expertise with CI/CD pipelines.
  • Ability to communicate complex topics clearly.  

Additional Requirements / Nice to Have:

  • Experience with Snowflake, Databricks, and GCP or Azure.
  • Knowledge of streaming data architectures.
  • Data security and compliance implementation.
  • Machine Learning Operations (MLOps) experience.

Education

Bachelor's degree