Description

Job Description:

Responsibilities

Strategy & Planning
• Design and implement long-term strategic goals and short-term tactical plans for managing and maintaining the various monitoring tools.
• Ensure that proposed and existing systems architectures are aligned with County goals and objectives.
• Provide architectural expertise, direction, and assistance to Systems Analysts, Systems Engineers, other Systems Architects, and software development teams.
• Develop, document, and communicate plans for investing in systems architecture, including analysis of cost reduction opportunities.
• Conduct research on emerging monitoring technologies in support of systems development efforts and recommend technologies that will increase cost effectiveness and systems flexibility.

Acquisition & Deployment
• Where applicable, design, develop, and oversee implementation of end-to-end integrated systems.
• Document the Monitoring Engineering architecture and technology portfolio; make recommendations for improvements and/or alternatives.
• Review new and existing systems design projects and procurement or outsourcing plans for compliance with standards and architectural plans.

Operational Management
• Confer with end-users and department heads to define monitoring requirements for complex systems and infrastructure development.
• Develop monitoring processes based on findings through use case scenarios, workflow diagrams, and data models.
• Develop and execute test plans to check infrastructure and systems technical performance. Report on findings and make recommendations for improvement.
• Develop, document, communicate, and enforce a policy for standardizing systems and monitoring software as necessary.
• Define system components, ensure scalability, and guide the overall technical direction of the project.

• Strong interpersonal and consultative skills.
• Ability to conduct research into emerging technologies and trends, standards, and products as required.
• Ability to present ideas in user-friendly language.
• Able to prioritize and execute tasks in a high-pressure environment.
• Strong understanding of information processing principles and practices.
• Strong knowledge of software evaluation principles and practices.
• Proven project planning and management experience.
• Good knowledge of applicable data privacy practices and laws.
• Exceptional analytical, conceptual, and problem-solving abilities.
• Exceptional understanding of the organization’s goals and objectives.
• Superior written and oral communication skills.

Knowledge & Experience
• 12-14 years of overall experience in developing and implementing strategic systems architecture plans.
• Extensive experience with and deep understanding of the following monitoring software platforms:
  - IBM Watson AIOps Manager
  - Splunk Log Management
  - Dynatrace APM/Observability
  - NetScout
  - DXNETOPS (Spectrum, Performance Center, VNA, NetFlow Analysis)
  - Synthetic Transaction Monitoring
  - Infrastructure Monitoring (SiteScope)
  - Run Book Automation
• Deep knowledge of AIOps and results-driven systems that collect metrics and events from various sources to identify root causes.
• Practical knowledge of cloud-based infrastructure and an understanding of scalability.
• Experience spinning up IBM Watson AIOps components as Docker containers inside an OCP cluster. Good understanding of OCP or another container orchestration technology (Kubernetes), and a fair understanding of Kubernetes resource management commands via OCP.
• Resize OCP workloads based on requirements and take backups of the various AIOps components (Cassandra, PostgreSQL, ASM). Work with platform teams to provision the required resources and assign the persistent volume class to different operators.
• Ability to architect monitoring tools on an OCP Kubernetes cluster; working experience with IBM AIOps, especially on OpenShift, is a plus.
• Experience providing design solutions for on-premises, cloud, and hybrid infrastructure environments.
• As the architect for monitoring, design a truly dynamic monitoring solution using available tools that helps correlate issues, predict outages, and pull topologies from different sources and merge them together.
• Leverage AI solutions (Dynatrace, Watson AIOps) to demonstrate value to the business by reducing MTTR and identifying root causes through sound correlation of alerts.
• Good scripting knowledge.
• Knowledge of relevant protocols and systems, coupled with experience in incident response and resolution.
• Strong analytical skills, the ability to interpret data trends, and a proactive approach to identifying and addressing issues.
• Hands-on experience with business requirements gathering and analysis.
• Proven experience in systems and network design and development.
• Excellent architecture and technical support documentation skills.

Education

Any graduate