UN Information Extraction and Knowledge

Description

Our solution is built on Apache Stanbol, a RESTful semantic engine and an open source project of the Apache Software Foundation. Its main purpose is semantic content management: it extends traditional content management systems with semantic services, which aligns directly with the objectives of this challenge. The official website is http://stanbol.apache.org/

Stanbol allows us to add our own ontology, index it, and expose it as an enhancement engine alongside the OpenNLP and DBpedia engines developed by the community.

The OpenNLP engines handle basic NLP tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and tokenisation, while the DBpedia engines fetch relevant annotations derived from Wikipedia. Along with these, we added the undo.owl ontology from https://github.com/UNSCEB-HLCM/undo/tree/master/ontology

All of these engines can be run together as an "Enhancement Chain", which is the feature that makes Stanbol stand out. Here is a demo of all the engines in action: https://unga.mido.io. Clicking the name of a chain shows the engines it is made up of. Once the engines have finished running, clicking the chain name also shows how long each engine ran.
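To make the idea of a chain concrete, here is a minimal toy sketch in Python. It is not Stanbol code: the function run_chain and the two stand-in engines are hypothetical names invented for illustration. It only shows the general pattern of running engines in a fixed order over shared annotations and recording how long each one took.

```python
import time


def run_chain(engines, text):
    """Run (name, engine) pairs in order; each engine sees earlier annotations."""
    annotations = []
    timings = {}
    for name, engine in engines:
        start = time.perf_counter()
        # Each engine returns new annotations, which are added to the shared list.
        annotations.extend(engine(text, annotations))
        timings[name] = time.perf_counter() - start
    return annotations, timings


# Hypothetical stand-in engines (real Stanbol engines are far richer).
def tokenize(text, annotations):
    return [("token", word) for word in text.split()]


def capitalised_words(text, annotations):
    # Builds on the tokens produced by the previous engine in the chain.
    return [("candidate", word) for kind, word in annotations
            if kind == "token" and word[:1].isupper()]


if __name__ == "__main__":
    chain = [("tokenize", tokenize), ("capitalised", capitalised_words)]
    found, times = run_chain(chain, "Member States adopted the resolution")
    print([word for kind, word in found if kind == "candidate"])
    print(times)
```

The order of the list matters here: the second engine only finds candidates because the first one has already produced tokens.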

This is one of the UNGA documents we've used to test the results:

https://www.un.org/en/ga/search/view_doc.asp?symbol=A/RES/62/278

Here is a series of steps to try it out:

1. Go to https://unga.mido.io

   On the main page you'll find general information about Stanbol.

2. In the nav bar, click on the "/enhancer" link.

3. In another tab, open a test resolution:

   https://www.un.org/en/ga/search/view_doc.asp?symbol=A/RES/62/278

4. Select all of the resolution text and copy it.

5. Switch to the Stanbol tab and paste the text from your clipboard into the text area.

6. Click the "Run engines" button.
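The same steps can in principle be scripted, since a Stanbol enhancer normally accepts plain text sent by HTTP POST to its /enhancer path and returns RDF in the format named in the Accept header. The sketch below uses only the Python standard library. It assumes the demo server is still online and exposes that standard endpoint, which we have not verified here; the function names build_enhancer_request and enhance are illustrative.

```python
import urllib.request

# Enhancer path from step 2 above; assumed to accept POST like a stock Stanbol.
ENHANCER_URL = "https://unga.mido.io/enhancer"


def build_enhancer_request(text, url=ENHANCER_URL, accept="text/turtle"):
    """Build (but do not send) a POST request carrying plain text."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # e.g. "text/turtle" or "application/rdf+xml"
        },
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the text to the enhancer and return the response body as a string."""
    request = build_enhancer_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires network access and a running server.
    print(enhance("The Economic and Social Council invites Member States to act."))
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network traffic happens.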

In the result you'll see a list of extracted entities, such as "Social Council" and "Member States". These entities are identified and extracted by the enhancement chains described above. To see the active chains, either click the "Enhancement Chains" link at the upper right or go directly to http://unga.mido.io/enhancer/chain. That page lists all available chains; the five UN-related chains are at the end:

• undo+dbpedia-disambiguation ( id: 258, ranking: 0, impl: ListChain )

• undo+dbpedia-fst-linking ( id: 255, ranking: 0, impl: ListChain )

• undo+dbpedia-named-entity-linking ( id: 259, ranking: 0, impl: ListChain )

• undo+dbpedia-proper-noun ( id: 256, ranking: 0, impl: ListChain )

• undo-plain ( id: 257, ranking: 0, impl: ListChain )
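In a standard Stanbol installation each chain can also be addressed directly under /enhancer/chain/<chain name>, which is handy for comparing chains on the same text. The sketch below only builds those URLs for the five chains listed above. Whether the demo server exposes them this way, and whether it expects the "+" in a chain name left as-is rather than percent-encoded, are assumptions; chain_url is an illustrative helper name.

```python
from urllib.parse import quote

# Base path of the enhancer on the demo server.
BASE = "https://unga.mido.io/enhancer"

# The five UN-related chains listed above.
UN_CHAINS = [
    "undo+dbpedia-disambiguation",
    "undo+dbpedia-fst-linking",
    "undo+dbpedia-named-entity-linking",
    "undo+dbpedia-proper-noun",
    "undo-plain",
]


def chain_url(name, base=BASE):
    """Return the assumed per-chain endpoint, keeping '+' literal in the path."""
    return f"{base}/chain/{quote(name, safe='+')}"


if __name__ == "__main__":
    for name in UN_CHAINS:
        print(chain_url(name))
```

Posting the same resolution text to each of these URLs in turn would let you compare the entities each chain extracts before settling on one.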

Once you have tested the chains and identified the right one for your purpose, you can upload your files on this page, https://mido22.github.io/un-stanbol/, and get the output annotations in the desired format.

We hope you find this useful. Should you want to investigate further, it is fairly easy to install Stanbol locally and import the UNDO ontology along with other ontologies.

Co-authors to your solution

Abhimanyu Sarvagyam, Kit Blake, Varun Subramanian

Link to your concept design and documentation:

https://stanbol.apache.org/

Link to an online working solution or prototype:

https://unga.mido.io/

Link to a video or screencast of your solution or prototype:

https://padlet.com/abhimanyu_sarvagyam/UngaExtraction

Link to source code of your solution or prototype above:

https://github.com/sarvagyaa/unga-extraction

Tags: ontologies,UNGA