Nestor

Machine-augmented annotation for technical text

You can do it; your machine can help.

Purpose

Nestor is a toolkit that combines Natural Language Processing (NLP) with efficient user interaction to extract structured data from text at minimal annotation time cost.

The Problem

NLP in technical domains requires context sensitivity. Whether in medical notes, engineering work orders, or social/behavioral coding, experts often use jargon and specialized vocabulary with overloaded meanings. Off-the-shelf NLP systems have great difficulty parsing such text.

The common solution is to contextualize and adapt NLP models to technical text, an approach known as Technical Language Processing (TLP)¹. For instance, medical research has been greatly advanced by the advent of labeled, bio-specific datasets, which have domain-relevant named-entity tags and vocabulary sets. Unfortunately for analysts of these types of data, creating such resources is incredibly time-consuming. This is where Nestor comes in.

Why Maintenance and Manufacturing?

A reader may notice a heavy focus on maintenance and manufacturing in the Nestor documentation and design. Although the annotation problem is common across technical domains in general, Nestor got its start in manufacturing data analysis. A large amount of maintenance data is already available for use in advanced manufacturing systems, but in a currently unusable form: service tickets and maintenance work orders (MWOs).

For further reading, see ² ³ ¹.

Quick Links

Nestor and all of its associated GUIs and projects are in the public domain (see the License). For more information or to provide feedback, please open an issue, submit a pull request, or email us at nestor@nist.gov.

How does it work?

See the Getting Started page.

This application was originally designed to help manufacturers "tag" their maintenance work-order data according to the methods being researched by the Knowledge Extraction and Application (KEA) project at NIST. The goal is to help build context-rich labels in data sets that were previously too unstructured or too filled with jargon to analyze. The current build is in very early alpha, so please be patient when using this application. If you have any questions, please do not hesitate to contact us (see Who are we?).

Current features:

Rank keywords found in your data by importance, saving you time
Suggest term unification by similarity (e.g. spelling), for quick review
Basic entity relationship builder, to assist assembling problem code and taxonomy definitions
Structured data output as named-entity tags, whether in readable (comma-separated) or computation-friendly (sparse-matrix) form.
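
The features listed above can be illustrated with a minimal, standard-library-only sketch. It is not Nestor's API: every function name here (`rank_tokens`, `suggest_aliases`, `tag_documents`, `to_sparse`) is hypothetical, the importance score is assumed to be a summed TF-IDF, similarity is assumed to be plain string similarity from `difflib`, and the example work orders are made up.

```python
import math
import re
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    """Lowercase a raw description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_tokens(docs):
    """Rank tokens by summed TF-IDF over all documents (assumed importance score)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    scores = Counter()
    for toks in tokenized:
        for tok, count in Counter(toks).items():
            # Smoothed IDF, so tokens present in every document still score above zero.
            idf = math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1
            scores[tok] += (count / len(toks)) * idf
    return [tok for tok, _ in scores.most_common()]


def suggest_aliases(token, vocabulary, cutoff=0.8):
    """Suggest likely spelling variants of `token` for a human to review."""
    candidates = [v for v in vocabulary if v != token]
    return get_close_matches(token, candidates, n=5, cutoff=cutoff)


def tag_documents(docs, aliases):
    """Map raw tokens to reviewed tags; one comma-separated string per document."""
    return [
        ",".join(sorted({aliases[t] for t in tokenize(d) if t in aliases}))
        for d in docs
    ]


def to_sparse(rows):
    """Return tag columns plus (row, column) coordinates of the nonzero entries."""
    columns = sorted({t for row in rows for t in row.split(",") if t})
    index = {tag: j for j, tag in enumerate(columns)}
    coords = [
        (i, index[t]) for i, row in enumerate(rows) for t in row.split(",") if t
    ]
    return columns, coords


# Made-up maintenance work orders, including one misspelling.
docs = ["Hydraulic leak at pump", "replaced hydralic hose", "pump motor tripped"]

ranked = rank_tokens(docs)
similar = suggest_aliases("hydraulic", ranked)

# The alias map is what a human annotator would confirm after reviewing suggestions.
aliases = {
    "hydraulic": "hydraulic",
    "hydralic": "hydraulic",
    "pump": "pump",
    "leak": "leak",
    "hose": "hose",
}
rows = tag_documents(docs, aliases)
columns, coords = to_sparse(rows)

print(ranked)
print(similar)
print(rows)
print(columns, coords)
```

The point of the sketch is the division of labor: the machine proposes a ranked vocabulary and candidate unifications, a person confirms the alias map, and the confirmed tags are then emitted in either a readable or a computation-friendly form.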

Planned:

Customizable entity types and rules,
Export to NER training formats,
Command-line app and REST API.

Who are we?

This toolkit is a part of the Knowledge Extraction and Application for Smart Manufacturing (KEA) project, within the Systems Integration Division at NIST.

Projects that use Nestor

Various Nestor GUIs: ways to use the full human-centered Nestor workflow in a user interface.
nestor-eda (exploratory data analysis): things to do with Nestor-annotated data (dashboards, visualization, etc.)

Points of Contact

Email the development team at nestor@nist.gov
Thurston Sexton (@tbsexton): Nestor Technical Lead, Associate Project Leader
Michael Brundage: Project Leader

Why "KEA"?

The KEA project seeks to better frame data collection and transformation systems within smart manufacturing as collaborations between human experts and the machines they partner with, to more efficiently utilize the digital and human resources available to manufacturers. Kea (Nestor notabilis), on the other hand, are the world's only alpine parrots, making their home on the South Island of New Zealand. Known for their intelligence and their ability to solve puzzles through the use of tools, they often work together to reach their goals, which is especially important in their harsh, mountainous habitat.

Development/Contribution Guidelines

More to come, but the primary requirement is the use of Poetry. Plugins are installed as development dependencies through Poetry (e.g. taskipy and poetry-dynamic-versioning). If you are not using conda environments, poetry-dynamic-versioning may need to be installed into the global Python installation.

Notebooks should be kept git-friendly with Jupytext.

Other Tools/Resources

Know of other tools? Or want to find resources similar to Nestor? A community-driven TLP Community of Interest (COI) has been created to provide publicly available resources to the community. Check out our awesome list.

¹ Michael P Brundage, Thurston Sexton, Melinda Hodkiewicz, Alden Dima, and Sarah Lukens. Technical language processing: unlocking maintenance knowledge. Manufacturing Letters, 2020.
² Thurston Sexton, Michael P Brundage, Michael Hoffman, and Katherine C Morris. Hybrid datafication of maintenance logs from AI-assisted human tags. In 2017 IEEE International Conference on Big Data (Big Data), 1769–1777. IEEE, 2017.
³ Michael Sharp, Thurston Sexton, and Michael P Brundage. Toward semi-autonomous information. In IFIP International Conference on Advances in Production Management Systems, 425–432. Springer, 2017.