The candidate will work on building, scaling, and monitoring a highly complex big data platform on Databricks, Snowflake, and Elasticsearch cloud services, as well as a data science and ML hosting platform based on Databricks.
The candidate will be responsible for defining and implementing our cloud technology strategies for the data platform, data science workbench, and ML hosting.
The candidate will also be involved in designing our architecture and defining our roadmap.
The candidate will be responsible for implementing monitoring, alerting, and observability solutions using CSP-standard monitoring and analytics tools, as well as third-party tools such as Zabbix and Datadog SaaS.
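For illustration only, a minimal sketch of emitting custom platform health metrics to Datadog, assuming the `datadog` Python package and a Datadog Agent with DogStatsD listening locally; the metric names and tags are hypothetical examples:

    # Minimal sketch: emit custom platform metrics to Datadog via DogStatsD.
    # Assumes the `datadog` Python package is installed and a Datadog Agent
    # with DogStatsD enabled is listening on localhost:8125.
    # Metric names and tags are hypothetical examples.
    from datadog import initialize, statsd

    initialize(statsd_host="127.0.0.1", statsd_port=8125)

    # Gauge: e.g. the number of healthy Databricks clusters right now
    statsd.gauge("platform.databricks.clusters.healthy", 12,
                 tags=["env:prod", "platform:databricks"])

    # Counter: bump an error count when a pipeline run fails
    statsd.increment("platform.pipeline.errors", tags=["env:prod"])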
The candidate must be an expert in Docker and cloud-based containerization technologies.
The candidate will also be responsible for building and improving our CI/CD pipelines and for keeping themselves and their team up to date with the latest industry trends and technologies.
Develop automation scripts in Terraform, YAML, Helm charts, the Azure CLI (az), the AWS CLI, PowerShell, and other cloud APIs.
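As one possible shape for such scripts, a minimal sketch of Python-driven automation wrapping the Azure CLI; it assumes `az` is installed and the session is already logged in, and the resource group name is a hypothetical example:

    # Minimal sketch: wrap the Azure CLI (az) from Python for automation.
    # Assumes `az` is installed and the session is logged in via `az login`.
    # The resource group name below is a hypothetical example.
    import json
    import subprocess

    def az(*args):
        """Run an az command and return its parsed JSON output."""
        result = subprocess.run(["az", *args, "--output", "json"],
                                check=True, capture_output=True, text=True)
        return json.loads(result.stdout)

    # Example: list storage accounts in a resource group
    accounts = az("storage", "account", "list",
                  "--resource-group", "rg-dataplatform-prod")
    for acct in accounts:
        print(acct["name"], acct["location"])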
Participate in developer interaction calls and provide help as needed.
Participate in management update calls and provide precise, timely updates.

Skillset:
Operations or systems administration experience, particularly on Linux.
Hands-on experience setting up and successfully administering the Databricks environment.
Experience as the Databricks account owner, managing workspaces, Unity Catalog, audit logs, and high-level usage monitoring.
Experience as a Databricks workspace admin, managing workspace users and groups, including single sign-on, provisioning, access control, and workspace storage.
Experience with Databricks administration, including configuring and installing libraries.
Experience managing storage account access across a large user base.
Experience managing clusters and jobs, including policies, templates, and pool configuration options.
Experience with Databricks security and privacy setup.
Experience troubleshooting end-user and platform-level issues.
Experience optimizing usage for performance and cost (preferred).
Experience in strategizing and implementing disaster recovery for big data platforms based on Databricks.
Ability to multitask and reprioritize on the fly according to the needs of a growing platform and its stakeholders.
Strong coding skills in SQL and Python (PySpark), with experience optimizing code; a brief sketch follows this list.
Assist in network, data security, and IAM initiatives; work with the network team to create private endpoints (PEs), troubleshoot network issues, etc.
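As referenced above, a minimal PySpark sketch of one common optimization: broadcasting a small dimension table to keep a join map-side instead of shuffling both inputs. The table and column names are hypothetical.

    # Minimal sketch of a common PySpark optimization: broadcast a small
    # dimension table to avoid a shuffle join, and select only needed
    # columns early to cut scan and shuffle volume. Names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("optimize-example").getOrCreate()

    events = spark.table("events").select("user_id", "country_code", "amount")
    dim_country = spark.table("dim_country")  # small lookup table

    # The broadcast hint ships dim_country to every executor, keeping the
    # join map-side instead of shuffling both inputs.
    joined = events.join(F.broadcast(dim_country), on="country_code")

    totals = joined.groupBy("country_name").agg(F.sum("amount").alias("total"))
    totals.show()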