Nestor
Machine-augmented annotation for technical text
You can do it; your machine can help.
Purpose
Nestor is a toolkit for using Natural Language Processing (NLP) with efficient user-interaction to perform structured data extraction with minimal annotation time-cost.
The Problem
NLP in technical domains requires context sensitivity. Whether for medical notes, engineering work-orders, or social/behavioral coding, experts often use jargon and specialized vocabulary with overloaded meanings. This is incredibly difficult for off-the-shelf NLP systems to parse.
The common solution is to contextualize and adapt NLP models to technical text, an approach called Technical Language Processing (TLP) [1].
For instance, medical research has been greatly advanced with the advent of labeled, bio-specific datasets, which have domain-relevant named-entity tags and vocabulary sets.
Unfortunately for analysts of these types of data, creating resources like this is incredibly time consuming.
This is where nestor comes in.
Why Maintenance and Manufacturing?
Readers may notice a heavy focus on maintenance and manufacturing in the Nestor documentation and design. Although the annotation problem is common to technical domains in general, Nestor got its start in manufacturing data analysis. A large amount of maintenance data is already available for use in advanced manufacturing systems, but in a currently unusable form: service tickets and maintenance work orders (MWOs).
For further reading, see [2], [3], and [1].
Quick Links
- Get started
- Use a GUI
- Go to our Project Page
Nestor and all of its associated GUIs/projects are in the public domain (see the License). For more information and to provide feedback, please open an issue, submit a pull request, or email us at nestor@nist.gov.
How does it work?
See the Getting Started page.
This application was originally designed to help manufacturers "tag" their maintenance work-order data according to the methods being researched by the Knowledge Extraction and Applications project at NIST. The goal is to help build context-rich labels in data sets that previously were too unstructured or filled with jargon to analyze. The current build is in very early alpha, so please be patient when using this application. If you have any questions, please do not hesitate to contact us (see Who are we?).
- Rank keywords found in your data by importance, saving you time
- Suggest term unification by similarity (e.g. spelling), for quick review
- Basic entity relationship builder, to assist assembling problem code and taxonomy definitions
- Structured data output as named-entity tags, whether in readable (comma-sep) or computation-friendly (sparse-mat) form.
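The features listed above can be illustrated with a minimal, standard-library sketch. This is not Nestor's API: the function names, the sample work orders, and the alias table are all hypothetical, and the sketch assumes a summed TF-IDF score as the "importance" measure and `difflib` string similarity for the unification suggestions. A dense 0/1 table stands in for the sparse-matrix output.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical maintenance work-order texts, made up for illustration.
docs = [
    "replaced hydraulic hose on press",
    "hydraulic leak at pump",
    "hydraulc hose leaking",
    "replaced pump motor",
]


def rank_tokens(docs):
    """Rank tokens by summed TF-IDF over all documents, highest first.

    Assumption: TF-IDF is only one plausible importance measure.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    scores = Counter()
    for toks in tokenized:
        for tok, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1
            scores[tok] += (count / len(toks)) * idf
    return [tok for tok, _ in scores.most_common()]


def suggest_merges(vocab, threshold=0.85):
    """Propose token pairs that look like spelling variants of each other."""
    pairs = []
    for a, b in combinations(sorted(vocab), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


def tag_documents(docs, alias_to_tag):
    """Return, per document, the sorted tags whose aliases occur in it."""
    return [
        sorted({alias_to_tag[t] for t in d.lower().split() if t in alias_to_tag})
        for d in docs
    ]


ranked = rank_tokens(docs)
merges = suggest_merges(ranked)

# Hypothetical result of a human reviewing the ranked list and suggestions.
alias_to_tag = {
    "hydraulic": "hydraulic", "hydraulc": "hydraulic",
    "leak": "leak", "leaking": "leak",
    "hose": "hose", "pump": "pump",
}
tags = tag_documents(docs, alias_to_tag)

# Readable form: one comma-separated string per document.
readable = [", ".join(t) for t in tags]

# Computation-friendly form: one 0/1 row per document, one column per tag.
columns = sorted(set(alias_to_tag.values()))
matrix = [[int(c in t) for c in columns] for t in tags]
```

Here the misspelling "hydraulc" is flagged as a candidate to merge with "hydraulic", and the human reviewer decides which suggestions end up in the alias table. Keeping that decision with the expert is the point of the workflow; the machine only orders and proposes.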
Planned:
- Customizable entity types and rules,
- Export to NER training formats,
- Command-line app and REST API.
Who are we?
This toolkit is a part of the Knowledge Extraction and Application for Smart Manufacturing (KEA) project, within the Systems Integration Division at NIST.
Projects that use Nestor
- Various Nestor GUIs: ways to use the full human-centered Nestor workflow in a user-interface.
- nestor-eda (exploratory data analysis): things to do with Nestor-annotated data (dashboard, viz, etc.)
Points of Contact
- Email the development team at nestor@nist.gov
- Thurston Sexton (@tbsexton): Nestor Technical Lead, Associate Project Leader
- Michael Brundage: Project Leader
Why "KEA"?
The KEA project seeks to better frame data collection and transformation systems within smart manufacturing as collaborations between human experts and the machines they partner with, to more efficiently utilize the digital and human resources available to manufacturers. Kea (Nestor notabilis), on the other hand, are the world's only alpine parrots, making their home on the South Island of New Zealand. Known for their intelligence and ability to solve puzzles through the use of tools, they will often work together to reach their goals, which is especially important in their harsh, mountainous habitat.
Development/Contribution Guidelines
More to come, but the primary requirement is the use of Poetry.
Plugins are installed as development dependencies through Poetry (e.g. taskipy and poetry-dynamic-versioning), though if you are not using conda environments, poetry-dynamic-versioning may need to be installed into the global Python installation.
Notebooks should be kept git-friendly with Jupytext.
Other Tools/Resources
Know of other tools? Or want to find resources similar to Nestor? A community-driven TLP Community of Interest (COI) has been created to provide publicly available resources to the community. Check out our awesomelist.
1. Michael P Brundage, Thurston Sexton, Melinda Hodkiewicz, Alden Dima, and Sarah Lukens. Technical language processing: unlocking maintenance knowledge. Manufacturing Letters, 2020.
2. Thurston Sexton, Michael P Brundage, Michael Hoffman, and Katherine C Morris. Hybrid datafication of maintenance logs from AI-assisted human tags. In 2017 IEEE International Conference on Big Data (Big Data), 1769–1777. IEEE, 2017.
3. Michael Sharp, Thurston Sexton, and Michael P Brundage. Toward semi-autonomous information. In IFIP International Conference on Advances in Production Management Systems, 425–432. Springer, 2017.