Description

This role will work very closely with the Sr. Engineer on the team to clean up the database and ensure data integrity and validity.

End goal of this role/project: clean up the database. This means cleaning up low-hanging names that are not linked correctly in the database and determining whether naming conventions need to be linked together, to ensure accurate data matches. Currently the security alerting system is not providing all of the security matches or the proper licensing requirement and expiration notifications, because multiple data/naming conventions in the system are incorrect and therefore no notifications are being generated.

  • Looking for someone who is not only a self-starter but also a self-learner who can think outside of the box: someone who feels comfortable and confident bringing up ideas about what they can do to improve quality and/or processes.
  • Must be a good communicator and researcher: need to really dig in and come up with creative solutions. This role will involve A LOT of research!
  • Responsible for pulling data from the API in Python and doing data analysis on what comes back from the API.
  • This role will be focused on natural language processing, for example how names can be matched to each other, and knowing how to do that kind of work in Python.
  • Need to be good at explaining thought processes, researching, and presenting their case along with the findings and the data/research behind it.
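
To give candidates a feel for the name-matching work described above, here is a minimal sketch using only the Python standard library. It is an illustration, not the team's actual method: the function names (normalize, similarity, best_match), the 0.85 threshold, and the sample component names are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and turn common separators into single spaces."""
    name = name.lower()
    name = re.sub(r"[\-_./]+", " ", name)      # hyphens, underscores, dots, slashes
    return re.sub(r"\s+", " ", name).strip()   # collapse repeated whitespace


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return the closest candidate, or None if nothing reaches the threshold.

    The threshold is an assumed starting point; real data would need tuning.
    """
    scored = [(similarity(name, c), c) for c in candidates]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None


# Hypothetical canonical names and one variant spelling pulled from an API.
canonical = ["Apache Commons Lang", "OpenSSL"]
print(best_match("apache_commons_lang", canonical))
print(best_match("zlib", canonical))
```

Normalizing before scoring means that variants differing only in case or separators compare as identical, while the threshold keeps unrelated names from being linked by accident.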

Full Job Description

Data Analyst/ Engineer - Python 

  • Collaborate with team members to analyze data 
  • Maintain database/metadata and adherence to conventions and data governance 
  • Clean up current backlogged data management items 
  • Eventually, become an SME in the field of software component analysis 
  • Work with leadership to identify current data management issues and opportunities for improvement in the current process used to ratify data 
  • Devise and implement Python solutions to speed up the data-cleaning workflow 

Qualifications - Required

  • Intermediate practical knowledge of using regex and fuzzy matching to do string similarity mapping
  • 3+ years of hands-on experience with data analysis tools including Python using the Pandas module 
  • 3+ years of experience using APIs in Python, including managing unstructured data 
  • Experience using secured APIs (credentialed/tokenized) 
  • Intermediate practical experience using command-line Linux

Qualifications - Desired 

  • Bachelor's degree in science, engineering, statistics, mathematics, economics, or other fields related to the position 
  • Intermediate practical general knowledge of Linux, including commonly used software packages and distros/operating systems 
  • Experience with MongoDB or other NoSQL databases is a plus

Education

Bachelor's degree