Required:
- 10+ years of overall IT experience.
- 3+ years of experience with high-velocity, high-volume stream processing: Apache Kafka and Spark Streaming.
- Experience with real-time data processing and streaming techniques using Spark Structured Streaming and Kafka.
- Deep knowledge of troubleshooting and tuning Spark applications.
- 3+ years of experience with data ingestion from message queues (TIBCO, IBM, etc.) and from files in formats such as JSON, XML, and CSV across different platforms.
- 3+ years of experience with Big Data tools/technologies like Hadoop, Spark, Spark SQL, Kafka, Sqoop, Hive, S3, HDFS, etc.
- 3+ years of experience building, testing, and optimizing ‘Big Data’ data ingestion pipelines, architectures, and data sets.
- 2+ years of experience with Python and/or Scala, i.e., PySpark/Scala-Spark.
- 3+ years of experience with Cloud platforms (e.g., AWS, GCP).
- 3+ years of experience with database solutions like Kudu/Impala, Delta Lake, Snowflake, or BigQuery.
- 2+ years of experience with NoSQL databases, including HBase and/or Cassandra.
- Experience in successfully building and deploying a new data platform on Azure/AWS.
- Experience with Azure/AWS serverless and managed services (on AWS, for example: S3, Kinesis/MSK, Lambda, and Glue).
- Strong knowledge of messaging platforms like Kafka, Amazon MSK, TIBCO EMS, or IBM MQ.
- Experience with the Databricks UI, managing Databricks Notebooks, Delta Lake with Python, Delta Lake with Spark SQL, Delta Live Tables, and Unity Catalog.
- Knowledge of Unix/Linux platforms and shell scripting.
- Strong analytical and problem-solving skills.
Preferred (Not Required):
- Strong SQL skills with the ability to write queries of intermediate complexity.
- Strong understanding of Relational & Dimensional modeling.
- Experience with Git version control.
- Experience with REST APIs and web services.
- Good business analysis and requirements gathering/writing skills.
Education:
Bachelor’s degree required, preferably in Information Systems, Computer Science, Computer Information Systems, or a related field.