Design, build, and analyze data solutions using Azure and Azure Databricks services. Provide data visualization using Azure Databricks dashboards and Power BI, and deliver data insights to the business. Design and build ETL pipelines using Azure Databricks, ingest data from sources such as Snowflake, Teradata, Vertica, and Oracle into Azure Data Lake Storage, and monitor ongoing Databricks jobs.
Participate in migrating on-premises data from the data lake to Azure Data Lake Storage using Azure Databricks. Develop Apache Spark ETL applications using PySpark to extract data from sources such as Snowflake, Teradata, Vertica, and SQL Server and write it to Azure Data Lake Storage. Provide an Azure Databricks multi-tenant environment (PROD, STG, DEV) for data ingestion teams to extract data from on-premises systems to Azure Data Lake Storage, and assist ingestion teams in testing on-premises code on Azure Databricks. Use Spark SQL to clean, transform, and aggregate data, applying the required file formats and compression types before writing data to Azure Data Lake Storage.
Develop UDFs in Scala, PySpark, and Hive to meet business requirements for data ingestion, and develop SQL scripts. Optimize Azure Log Analytics costs: develop and automate a PySpark application that extracts diagnostic logs from Azure storage accounts and containers, and use Spark SQL to analyze the logs and provide usage insights using Azure Databricks and the Databricks job scheduler.
Develop init shell scripts to install the libraries that Databricks multi-tenant clusters need to connect to data sources such as Vertica, Snowflake, Teradata, and SQL Server, and configure Databricks cluster policies according to connection requirements. Develop an ETL pipeline to process Blob Inventory data on Azure storage accounts and identify container size trends for data growth insights. Create pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, and an SFTP server.
Automate jobs using different triggers (event, scheduled, and tumbling window) in ADF (Azure Data Factory), and create pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks. Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Use AWS, Hadoop, Hive, Data Lake, shell scripting, Apache Pig, HDFS, Apache Spark, and SQL.
Must possess a Master’s degree or equivalent in Computer Science or a related field.
Any Graduate