Job Description
We are looking for a Spark developer who knows how to get the most out of our Spark cluster.
You will clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data to our feature developers and business analysts.
This involves both ad-hoc requests and data pipelines that are embedded in our production environment.
Responsibilities
- Create Scala/Spark jobs for data transformation and aggregation (see the sketch after this list)
- Produce unit tests for Spark transformations and helper methods
- Write Scaladoc-style documentation for all code
- Design data processing pipelines
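To give a concrete picture of the first responsibility, here is a minimal sketch of the kind of Scala/Spark job we mean: a DataFrame transformation kept separate from its I/O so it can be unit tested and documented with Scaladoc. The object name, column names, and Parquet paths are illustrative assumptions, not references to our actual codebase.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DailyEventCounts {

  /** Aggregates raw events into one row per user and day.
    *
    * @param events raw events with at least `user_id` and `event_time` columns
    * @return rows of (`user_id`, `event_date`, `event_count`)
    */
  def transform(events: DataFrame): DataFrame =
    events
      .withColumn("event_date", to_date(col("event_time")))
      .groupBy("user_id", "event_date")
      .agg(count(lit(1)).as("event_count"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("DailyEventCounts").getOrCreate()
    // Input and output Parquet paths (hypothetical) are supplied by the scheduling environment.
    transform(spark.read.parquet(args(0))).write.parquet(args(1))
    spark.stop()
  }
}
```

Keeping the transformation free of side effects is what makes the unit-testing responsibility above practical.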
Skills
- Scala (with a focus on the functional programming paradigm)
- Scalatest, JUnit, Mockito, Embedded Cassandra (see the test sketch after this list)
- Apache Spark 2.x
- Apache Spark RDD API
- Apache Spark SQL DataFrame API
- Apache Spark MLlib API
- Apache Spark GraphX API
- Apache Spark Streaming API
- Spark query tuning and performance optimization
- SQL database integration (Microsoft SQL Server, Oracle, Postgres, and/or MySQL)
- Experience working with HDFS, S3, Cassandra, and/or DynamoDB
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)
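As an illustration of the testing style we expect, the sketch below exercises the transformation from the Responsibilities section against a local SparkSession with hand-built sample data. It assumes ScalaTest 3.1+ (for the AnyFunSuite import); class names and sample values are again only illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class DailyEventCountsSpec extends AnyFunSuite {

  // A small local session is enough to test pure DataFrame transformations.
  private val spark = SparkSession.builder
    .master("local[2]")
    .appName("DailyEventCountsSpec")
    .getOrCreate()

  import spark.implicits._

  test("transform counts events per user and day") {
    val events = Seq(
      ("u1", "2024-01-01 10:00:00"),
      ("u1", "2024-01-01 18:30:00"),
      ("u2", "2024-01-02 09:15:00")
    ).toDF("user_id", "event_time")

    val result = DailyEventCounts.transform(events)
      .collect()
      .map(row => (row.getString(0), row.getDate(1).toString, row.getLong(2)))
      .toSet

    assert(result == Set(("u1", "2024-01-01", 2L), ("u2", "2024-01-02", 1L)))
  }
}
```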