● Manage the full development lifecycle of backend data ingestion/integration pipelines, including architecture, design, development, testing, and deployment.
● Explore and discover new data sources and quickly become familiar with the available APIs or other data acquisition methods, such as web scraping, to ingest data.
● Build quick proofs of concept for new data sources to showcase data capabilities and help the analytics team identify key metrics and dimensions.
● Design, develop, and maintain data ingestion & integration pipelines from various sources, which may include contacting primary or third-party data providers to resolve questions and inconsistencies and/or obtain missing data.
● Design, implement, and manage near real-time ingestion & integration pipelines.
● Analyze data to identify outliers and missing, incomplete, and/or invalid data; ensure accuracy of all data from source to final deliverable by creating automated quality checks.
● Evangelize an extremely high standard of code quality, system reliability, and performance.
● 3+ years of experience using an ETL tool such as Pentaho (Enterprise or Community Edition) and a total of at least 5+ years of experience in ETL or web application development.
● 10+ years of experience building enterprise-level software solutions.
● 4+ years of experience architecting cloud-based software solutions.
● 4+ years of experience in API-based development using Python or Java.
● Experience architecting and building secure, reliable, high-performance data pipelines using Python and Spark on the AWS cloud.
● Experience with Python libraries such as Pandas, NumPy, SciPy, Flask, and SQLAlchemy; automation experience is a plus.
● Experience architecting solutions at scale to empower the business and support a wide variety of use cases, from experimental work to mission-critical production operations.
● Experience in real-time data processing using Python, Spark, and Spark Streaming.
● Experience ingesting and processing data from social media platforms such as Facebook, Twitter, Instagram, and Snapchat, as well as clickstream data.
● Experience working with both structured and unstructured data, including complex JSON.
● Experience with AWS Kinesis stream processing, EMR, Redshift, S3, and Lambda.
● Experience with database systems such as AWS Redshift, BigQuery, SQL Server, or Oracle.
● Experience with multi-threading and asynchronous event-driven programming.
● Experience with high-volume, high-availability distributed systems.
● Experience developing viable solutions to tough engineering problems.
● Knowledge of code versioning tools such as Git, Mercurial, or SVN.
● Familiarity with the Sailthru APIs is a plus.
● Bachelor’s degree in Computer Science or a related discipline.
Any Graduate