Description

Role & Responsibilities:

  • Work with cloud engineers and customers to solve big data problems by developing utilities for migration, storage, and processing on Azure Cloud.
  • Design and build cloud migration strategies for cloud and on-premises applications.
  • Diagnose and troubleshoot complex distributed-systems problems, and develop solutions with significant impact at massive scale.
  • Build tools to ingest, and jobs to process, several terabytes or petabytes of data per day.
  • Design and develop next-generation storage and compute solutions for several large customers.
  • Define the data architecture for data science teams, and participate in review and walkthrough sessions for model fit and model productionization.
  • Provide thought leadership on data integrity and quality for data science workloads.
  • Be involved in proposals and RFPs, providing effort estimates, solution designs, etc.
  • Communicate with a wide range of teams, including Infrastructure, Network, Engineering, DevOps, and SiteOps, as well as cloud customers.
  • Build advanced tooling for automation, testing, monitoring, administration, and data operations across multiple cloud clusters.
  • Demonstrate a strong understanding of data modeling and governance.

Must have:
  • 8+ years of hands-on experience with data structures, distributed systems, Hadoop and Spark, and SQL and NoSQL databases
  • Strong software development skills in at least one of Python, Java, or Scala, plus solid SQL
  • Experience building and deploying cloud-based solutions at scale
  • Experience developing big data solutions (migration, storage, processing)
  • Experience building and supporting large-scale systems in a production environment
  • Experience designing and developing ETL pipelines
  • Modern data warehouse design skills on Azure
  • Requirements gathering and understanding of the problem statement
  • End-to-end ownership of the entire delivery of a project
  • Design and documentation of the solution
  • Knowledge of RDBMS and NoSQL databases
  • Cloud platforms: Azure (GCP and AWS good to have)
  • Hadoop distributions: any of Apache Hadoop, CDH, HDP, EMR, Google Dataproc, or HDInsight
  • Distributed processing frameworks: one or more of MapReduce, Apache Spark, Apache Storm, or Apache Flink
  • Database/warehouse: Hive, HBase, and at least one cloud-native service
  • Orchestration frameworks: any of Airflow, Oozie, Apache NiFi, or Google Dataflow
  • Message/event solutions: any of Kafka, Kinesis, or Cloud Pub/Sub
  • Exposure to at least one reporting tool (Power BI, Tableau, or Looker)
  • Ability to enable best practices for data handling

Education

Any graduate