So, since I haven’t really developed all that much recently (lots of stuff in the works), I thought I would just riff a bit on some current thoughts about using Natural Language Processing and/or Machine Learning to create micromaterials (…but mostly NLP).
I’ve been playing around with the Stanford CoreNLP library (which also means I’m now firmly in Java territory) and getting to know it a bit better. I’ve also been reading a lot of interesting papers on Knowledge Graph creation (e.g., this one).
Essentially, the TL;DR of Knowledge Graphs is that we have nodes (people/places/things) with attributes (dates/places/associations), plus edges (relationships between those nodes). If we used the Stanford CoreNLP library’s Named Entity Recognition (NER) annotator to pull out all the named entities (roughly, the proper nouns) from, say, a Wikipedia article, we would get a rough approximation of some of the nodes.
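To make the nodes idea a bit more concrete, here’s a rough Java sketch that turns NER-style (text, type) pairs into candidate nodes. It doesn’t call CoreNLP at all: the mentions are hand-written stand-ins for what the NER annotator might return, and the choice of which entity types count as nodes (versus attribute-like types such as DATE) is my own assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NodesFromEntities {

    /** One NER-style result: the mention text and its entity type label. */
    public record EntityMention(String text, String type) {}

    /** A candidate knowledge-graph node; mentionCount is how often it was mentioned. */
    public record Node(String name, String type, int mentionCount) {}

    // Entity types treated as nodes (people/places/things). Types such as DATE are
    // skipped because they read more like attributes. This split is an assumption.
    private static final Set<String> NODE_TYPES =
            Set.of("PERSON", "LOCATION", "ORGANIZATION", "CITY", "COUNTRY");

    /** Keeps node-like mentions and merges repeats of the same text into one node. */
    public static Map<String, Node> candidateNodes(List<EntityMention> mentions) {
        Map<String, Node> nodes = new LinkedHashMap<>();
        for (EntityMention m : mentions) {
            if (!NODE_TYPES.contains(m.type())) {
                continue;
            }
            nodes.merge(m.text(), new Node(m.text(), m.type(), 1),
                    (old, ignored) -> new Node(old.name(), old.type(), old.mentionCount() + 1));
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Hypothetical mentions, written by hand for illustration.
        List<EntityMention> mentions = List.of(
                new EntityMention("Person A", "PERSON"),
                new EntityMention("Place B", "CITY"),
                new EntityMention("Year C", "DATE"),
                new EntityMention("Person A", "PERSON"));
        candidateNodes(mentions).values().forEach(System.out::println);
    }
}
```

Merging on exact surface text is the naive part: “Sugar” and “Lord Sugar” would end up as separate nodes unless some coreference or entity-linking step joins them.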
If we then used Stanford’s openIE package (which is part of their CoreNLP library), which extracts subject/relation/object triples from sentences, to build up a Knowledge Graph around these nodes, we would potentially have some very simple “facts”, such as “Person A was born in Place B in Year C”.
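Roughly, each openIE triple can be stored as an edge hanging off its subject node. The sketch below uses a plain record in place of CoreNLP’s own triple class (RelationTriple) so it runs without the library, and the triples are hand-written rather than real extractions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TripleGraph {

    /** A subject/relation/object "fact", shaped like an openIE extraction. */
    public record Triple(String subject, String relation, String object) {}

    // Adjacency list: each subject node maps to its outgoing edges (facts).
    private final Map<String, List<Triple>> edgesBySubject = new LinkedHashMap<>();

    /** Adds a fact as an outgoing edge of its subject node. */
    public void add(Triple t) {
        edgesBySubject.computeIfAbsent(t.subject(), k -> new ArrayList<>()).add(t);
    }

    /** All facts whose subject is the given node, or an empty list if there are none. */
    public List<Triple> factsAbout(String subject) {
        return edgesBySubject.getOrDefault(subject, List.of());
    }

    public static void main(String[] args) {
        TripleGraph g = new TripleGraph();
        // Hypothetical triples for "Person A was born in Place B in Year C".
        g.add(new Triple("Person A", "was born in", "Place B"));
        g.add(new Triple("Person A", "was born in", "Year C"));
        for (Triple t : g.factsAbout("Person A")) {
            System.out.println(t.subject() + " | " + t.relation() + " | " + t.object());
        }
    }
}
```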
Once these “facts” are represented in a machine-readable format, we can manipulate the structures to create questions. If we were to parse the entire Wikipedia page on Lord Sugar, one reasonably simple question to formulate could be something like “When did Lord Sugar have an airplane accident?”.
Admittedly, this would be a fairly easy question to answer, but a number of such questions, perhaps with slight paraphrasing, could serve as a convenient comprehension check for students while they read an article.
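The question step could start out as little more than string templates over those triples. This is a minimal sketch assuming the relation has already been lemmatised (“had” becomes “have”) so it fits after “did”; the Lord Sugar fact is typed in by hand, and the two paraphrased templates are just my guesses at useful wordings.

```java
import java.util.List;
import java.util.stream.Collectors;

public class QuestionTemplates {

    /** A fact whose relation is already lemmatised, so it slots in after "did". */
    public record Fact(String subject, String relationLemma, String object) {}

    // Hand-picked paraphrases of the same "when" question (hypothetical wordings).
    private static final List<String> WHEN_TEMPLATES = List.of(
            "When did %s %s %s?",
            "In what year did %s %s %s?");

    /** Fills every template with the fact's subject, relation lemma and object. */
    public static List<String> whenQuestions(Fact f) {
        return WHEN_TEMPLATES.stream()
                .map(tpl -> String.format(tpl, f.subject(), f.relationLemma(), f.object()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical extraction from the Lord Sugar article, written by hand.
        Fact f = new Fact("Lord Sugar", "have", "an airplane accident");
        whenQuestions(f).forEach(System.out::println);
    }
}
```

The answer would come from a DATE attribute attached to the event (not modelled here), so a “When…” template only makes sense for facts where that date actually exists in the graph.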
The true power of this approach would lie in its total automation, which would rest mostly on the assumption that the entire pipeline is accurate: from the reader selecting a Wikipedia article, through the NLP annotations, to the system selecting and formulating questions…which is definitely a lot to assume.
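One cheap way to lean a little less on that accuracy assumption would be to throw away shaky extractions before any question gets built. openIE does attach a confidence score to each triple, but the scores, the 0.8 threshold, and the pronoun list in this sketch are all made up for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class ConfidenceFilter {

    /** An extracted fact plus the extractor's confidence score (0.0 to 1.0). */
    public record Triple(String subject, String relationLemma, String object, double confidence) {}

    // A bare pronoun subject suggests coreference was not resolved, so a question
    // built from it ("When did he ...?") would be ambiguous. Illustrative list only.
    private static final Set<String> PRONOUNS = Set.of("he", "she", "it", "they", "we", "i", "you");

    /** Keeps triples at or above the threshold whose subject is not a bare pronoun. */
    public static List<Triple> keepUsable(List<Triple> triples, double threshold) {
        return triples.stream()
                .filter(t -> t.confidence() >= threshold)
                .filter(t -> !PRONOUNS.contains(t.subject().toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hand-written, hypothetical extractions and scores.
        List<Triple> extracted = List.of(
                new Triple("Lord Sugar", "have", "an airplane accident", 0.95),
                new Triple("he", "have", "an airplane accident", 0.90),
                new Triple("Lord Sugar", "be", "it", 0.35));
        keepUsable(extracted, 0.8).forEach(System.out::println);
    }
}
```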
In theory, reader feedback on which questions are appropriate (or even clear the bar of “not gibberish”) could be fed into a machine learning model to iteratively improve question selection, though this is way, way beyond my abilities…whereas everything up to now was only speculation about things that are merely way beyond my abilities.
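Well short of a real machine learning model, the simplest version of that feedback loop I can picture is a per-template approval rate: readers vote a question up or down, and templates are ranked by a Laplace-smoothed score, (up + 1) / (up + down + 2), so a template nobody has rated yet starts at 0.5. The template ids here (“when-did” and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FeedbackRanker {

    // Per-template reader votes: index 0 = approvals, index 1 = rejections.
    private final Map<String, int[]> votes = new HashMap<>();

    /** Records one reader judgement for a question generated from the given template. */
    public void addVote(String templateId, boolean approved) {
        int[] v = votes.computeIfAbsent(templateId, k -> new int[2]);
        if (approved) {
            v[0]++;
        } else {
            v[1]++;
        }
    }

    /** Laplace-smoothed approval rate; an unrated template scores 0.5. */
    public double score(String templateId) {
        int[] v = votes.getOrDefault(templateId, new int[2]);
        return (v[0] + 1.0) / (v[0] + v[1] + 2.0);
    }

    /** Returns the template ids ordered from highest to lowest score. */
    public List<String> rank(List<String> templateIds) {
        List<String> ranked = new ArrayList<>(templateIds);
        ranked.sort((a, b) -> Double.compare(score(b), score(a)));
        return ranked;
    }

    public static void main(String[] args) {
        FeedbackRanker ranker = new FeedbackRanker();
        ranker.addVote("when-did", true);
        ranker.addVote("when-did", true);
        ranker.addVote("what-did", false);
        System.out.println(ranker.rank(List.of("what-did", "when-did", "who-was")));
    }
}
```

A real model could learn from features of each question rather than just its template, but even this crude version would start pushing gibberish-prone templates down the list.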