Description

We have an immediate, long-term opportunity with one of our prime clients for a Data Engineer, working on a remote basis.

MUST HAVES (TOP 3): 

  1. ClickHouse
  2. Kubernetes - understanding of the infrastructure
  3. AWS
    1. Need to know how everything connects
  4. At scale and multi-tenant is a plus
    1. Some candidates spoken to so far did have at-scale experience, but it was small scale and not large enough for what the client needs

PROJECT DETAILS (Size, Scale, Scope):  

  • What is the scope of work that is being completed?  Building out a data lake
  • What part of the project is this resource supporting?  Building out the data lake and creating the ClickHouse database to scale across the org
  • Day to day responsibilities 
  • Are there deliverables and milestones the team is working towards?  If so, what? Working towards go-live on 12/31; starting 1/1, they will have 12 months to scale the data lake across the entire enterprise
  • What is the next phase of this project? Go live with production on 12/31

  
Transcription Notes:

The meeting covered the development and implementation of a monitoring data lake, with emphasis on the technologies and strategies involved. Key points:

  • Data Pipeline and Storage: Frohman detailed the process of collecting logs from various sources, structuring them through a pipeline that uses Vector (vector.dev) for rate limiting and security, and storing them in ClickHouse.
  • Querying and Visualization: They plan to offer Grafana for standard querying but will also support other tools such as Databricks, Power BI, and Excel. An open-source tool called Keep will serve as the middle layer for event management and as the rules engine.
  • Event Management: Keep will query the lake to trigger events in ServiceNow, acting as an intermediary because ServiceNow cannot handle millions of rules directly.
  • Scale and Talent Needs: The platform aims to handle 100 petabytes a month initially, which requires sharp talent, especially in ClickHouse and Vector (vector.dev). They are seeking vendor support for architecture and operational expertise.
  • Challenges and Solutions: The discussion also covered the challenges of scaling, the need for a multi-tenant solution, and the importance of optimizing SQL queries for efficiency. The goal is to consolidate various data sources into a single, queryable database to facilitate correlation and analysis.
  • Vendor and Staffing Strategy: Frohman mentioned the plan to host the platform themselves due to governance and legal timeframes, with a preference for platform-as-a-service. They are exploring offshore and onshore staffing options, with specific bill rates and skill sets in mind.

Education

Any Graduate