Roles and Responsibilities:
Develop, maintain, and improve ETL pipelines for product and process measurements, WIP updates, production flows and dispositions, and process configuration deliverables.
Work with business stakeholders and IT to translate business logic into scalable data and analytics solutions
Leverage innovative technologies and approaches to renovate, extend, and transform existing core data assets, including SQL-based, NoSQL-based, and cloud-based data platforms
Ingest, extract, move, transform, cleanse, and load massive volumes of structured and unstructured data in the Hadoop environment, in both batch and real time.
Analyse technology environments to detect critical deficiencies and recommend solutions for improvement
Draft architectural diagrams, interface specifications, and other design documents
Guide development teams in the design and build of complex data or platform solutions, ensuring alignment with the architecture blueprint, standards, target-state architecture, and strategies
Coordinate, execute, and participate in component integration testing (CIT), systems integration testing (SIT), and user acceptance testing (UAT) scenarios to identify application errors and ensure quality software deployment.
Test data infrastructure to ensure reliability and validate analytics solutions to ensure accuracy.
Build, test, and enhance data curation pipelines that integrate data from a wide variety of sources, such as DBMSs, file systems, APIs, and streaming systems, to support OKR and metrics development with high data quality and integrity
Build, test, and enhance BI solutions drawing on a wide variety of sources such as Teradata, Hive, HBase, Google BigQuery, and file systems; develop solutions with optimized data performance and data security
Demonstrate database skills (Teradata/Oracle/Db2/Hadoop) by writing views for business requirements; use freeform SQL and pass-through functions; analyse and resolve errors in SQL generation; create RSDs and dashboards
Architect, implement, and maintain multi-layered SQL and Python processes (a minimal sketch of one such batch step follows this list)
Create, schedule, maintain, and debug ETL and ELT processes from systems including Siebel, PeopleSoft, NetSuite, HelpScout, and custom in-house products
Build, manage, and maintain data transformation processes in Snowflake and BigQuery
Perform code reviews; enhance and maintain the star-schema analytics data warehouse
Administer crucial and complex big data/Hadoop infrastructure to enable next-generation analytics and data science capabilities
Experience with structured and unstructured data, including applying ML to solve problems involving unstructured data sets and relational databases
Proven ability with statistical methods and advanced modelling techniques (e.g., SVM, random forest, Bayesian inference, graph models, NLP, computer vision, neural networks)
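Below is a minimal sketch of the kind of multi-layered Python batch step referenced above, written with PySpark; the S3 paths, application name, and the table and column names (lot_id, measurement_value, fact_measurements) are illustrative assumptions, not details from this posting.

from pyspark.sql import SparkSession, functions as F

# Spark session for one batch run of the measurements pipeline (app name is illustrative).
spark = SparkSession.builder.appName("measurements_etl").getOrCreate()

# Extract: raw product/process measurement records landed as Parquet (assumed path).
raw = spark.read.parquet("s3://example-landing-zone/measurements/")

# Transform/cleanse: drop malformed rows, normalise types, derive a load date.
clean = (
    raw.dropna(subset=["lot_id", "measurement_value"])
       .withColumn("measurement_value", F.col("measurement_value").cast("double"))
       .withColumn("load_date", F.current_date())
)

# Load: append into a curated, partitioned fact location for downstream BI and star-schema joins.
(
    clean.write.mode("append")
         .partitionBy("load_date")
         .parquet("s3://example-curated-zone/fact_measurements/")
)

In practice a scheduler (for example Airflow or cron) would run this step between the SQL staging layer and the warehouse load; that orchestration is omitted here.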
Required Skills:
Programming Languages: Python, Java, Node.js, C#, C++, HTML, CSS, SQL, Scala
Databases: SQL Server, PostgreSQL, PL/SQL, Teradata, MongoDB, NoSQL
AWS Cloud: SageMaker, cloud architecture, CloudFormation, Amazon Web Services (EC2, IAM, ECS, S3, RDS, DynamoDB, CloudWatch, ELB, VPC, Route 53), Lambda, Elastic Beanstalk
Experience working with AWS RDS, Aurora, Lambda, S3, and Apache Kafka is desirable.
Machine Learning and Data Science: supervised and unsupervised learning algorithms, reinforcement learning, neural networks, CNNs, deep learning, RNNs, MapReduce, parallel processing, data pre-processing
Frameworks and Tools: AWS services, Microsoft Azure, SQL Server, scikit-learn, Keras, Spark, TensorFlow, Tableau, Informatica, SSIS, automation, Bash, PowerShell, Linux scripting, Git
Microsoft Azure Cloud: Azure resources including Active Directory and Managed Identities, Key Vault, Virtual Machines, Cosmos DB, Logic Apps, Container Registries, Log Analytics
ANY GRADUATE