Files in datastore/sample_docs/gate.corpora.DocumentImpl (name, modified, size):
wessexar1-5680.txt_00051___1359159600329___5321 2013-02-03 18.0 kB
wessexar1-25626_1.txt_00050___1359159600315___2182 2013-02-03 40.4 kB
suffolkc1-6115.txt_0004F___1359159600281___5176 2013-02-03 128.9 kB
heritage1-11948.txt_0004E___1359159600252___3522 2013-02-03 36.8 kB
heritage1-10767.txt_0004D___1359159600237___2579 2013-02-03 33.6 kB
foundati1-5205.txt_0004C___1359159600224___2187 2013-02-03 21.6 kB
essexcou1-5166.txt_0004B___1359159600162___4597 2013-02-03 93.1 kB
essexcou1-10460.txt_0004A___1359159600128___2329 2013-02-03 20.7 kB
birmingh2-28160_1.txt_00049___1359159600083___4776 2013-02-03 58.1 kB
aocarcha1-11167_1.txt_00047___1359159599933___8015 2013-02-03 134.0 kB
aocarcha1-11167_1.txt~_00048___1359159600048___4308 2013-02-03 1.5 MB
wessexar1-5680.pdf_00046___1359159542778___4357 2013-01-26 9.1 kB
wessexar1-25626_1.pdf_00045___1359159542209___6503 2013-01-26 41.6 kB
suffolkc1-6115.pdf_00044___1359159535527___9166 2013-01-26 2.1 kB
heritage1-11948.pdf_00043___1359159533013___902 2013-01-26 19.8 kB
heritage1-10767.pdf_00042___1359159530558___2814 2013-01-26 22.4 kB
foundati1-5205.pdf_00041___1359159528504___2842 2013-01-26 9.2 kB
essexcou1-5166.pdf_00040___1359159527679___5567 2013-01-26 61.9 kB
essexcou1-10460.pdf_0003F___1359159512099___3806 2013-01-26 13.8 kB
birmingh2-28160_1.pdf_0003E___1359159509211___5017 2013-01-26 35.5 kB
aocarcha1-11167_1.pdf_00039___1359159491557___4337 2013-01-26 157.5 kB
Totals: 21 items, 2.5 MB
--ACKNOWLEDGMENTS-- 

OPTIMA is the result of the PhD work of Andreas Vlachidis, supervised by Douglas Tudhope, University of Glamorgan, UK. OPTIMA was employed by the Semantic Technologies for Archaeological Resources (STAR) project (http://hypermedia.research.glam.ac.uk/kos/star/) for the semantic indexing of archaeological grey literature and for cross-searching between reports and disparate archaeological datasets. The complete results of the semantic indexing effort can be found at http://www.andronikos.co.uk/.

--SUMMARY--
OPTIMA is a semantic annotation pipeline aimed at delivering contextual abstractions with respect to the ontological model CIDOC-CRM and its archaeological extension CRM-EH. The pipeline is built on the GATE framework and performs the following Natural Language Processing tasks:
 i) Named Entity Recognition 
 ii) Relation Extraction 

The pipeline also performs supporting NLP tasks, such as domain-independent pre-processing, Word Sense Disambiguation, Negation Detection, and Adjectival Conjunction.

The pipeline makes use of hand-crafted JAPE rules and terminological resources such as custom gazetteers and English Heritage (EH) thesauri and glossaries. The participating EH terminological resources are expressed as parametrised GATE gazetteer listings, where each gazetteer entry carries a unique Simple Knowledge Organization System (SKOS) reference (http://www.w3.org/2004/02/skos/).
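
As an illustration of what a parametrised gazetteer entry looks like, the following minimal Python sketch splits one gazetteer line into its entry text and its name=value features. The separator character ("@"), the feature name "skosConcept", the term and the URI are all hypothetical and are not taken from the OPTIMA resources; in GATE the feature separator is a configurable gazetteer parameter.

```python
from typing import Dict, Tuple


def parse_gazetteer_line(line: str, sep: str = "@") -> Tuple[str, Dict[str, str]]:
    """Split one gazetteer line into its entry text and its feature map."""
    parts = line.rstrip("\n").split(sep)
    entry = parts[0].strip()
    features: Dict[str, str] = {}
    for part in parts[1:]:
        # Each feature is written as name=value. Split on the first '=' only,
        # so values that themselves contain '=' are kept intact.
        name, _, value = part.partition("=")
        if name:
            features[name.strip()] = value.strip()
    return entry, features


# Hypothetical line: the term, the feature name and the URI are made up.
sample = "post hole@skosConcept=http://example.org/thesaurus/concept/0001"
entry, features = parse_gazetteer_line(sample)
print(entry)
print(features["skosConcept"])
```

Keeping the SKOS reference as a feature of the entry means that every match produced by the gazetteer can be traced back to the thesaurus concept it came from.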

--SEMANTIC EXPANSION MODE --
The task of NER can be configured to execute in three different modes of semantic expansion:
   i) Synonym
  ii) Hyponym
 iii) Hypernym
Each semantic expansion mode utilises the participating EH thesauri resources to a different extent. The Synonym expansion mode makes use only of the available synonym concepts, the Hyponym mode of the narrower concepts and the Hypernym mode of the broader concepts. More on the modes of semantic expansion and EH usage in OPTIMA can be found in the wiki pages of the OPTIMA project (http://sourceforge.net/p/optimacidoc/wiki/Home/) and in Andreas Vlachidis's PhD thesis (http://www.andronikos.co.uk/publications.php).
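
The difference between the three modes can be sketched in a few lines of Python. This is only an illustration over a toy thesaurus: the terms, the dictionary layout and the function name are assumptions and do not reflect the EH thesauri or the way OPTIMA implements expansion with gazetteers and JAPE rules. Each mode here follows exactly one relation, as described above; whether the modes are cumulative in OPTIMA is not covered by this sketch.

```python
from typing import Dict, List, Set

# Toy thesaurus in the spirit of SKOS: each concept has synonyms, narrower
# concepts and broader concepts. All terms are made up for illustration.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "pit": {
        "synonyms": ["hollow"],
        "narrower": ["cess pit", "rubbish pit"],
        "broader": ["cut feature"],
    },
}

# Which thesaurus relation each expansion mode follows.
RELATION_FOR_MODE: Dict[str, str] = {
    "synonym": "synonyms",
    "hyponym": "narrower",
    "hypernym": "broader",
}


def expand(term: str, mode: str) -> Set[str]:
    """Return the term plus the related terms selected by the given mode."""
    if mode not in RELATION_FOR_MODE:
        raise ValueError(f"unknown expansion mode: {mode!r}")
    related = THESAURUS.get(term, {}).get(RELATION_FOR_MODE[mode], [])
    return {term, *related}


for mode in ("synonym", "hyponym", "hypernym"):
    print(mode, sorted(expand("pit", mode)))
```

A term that is missing from the toy thesaurus is returned unexpanded, so the sketch degrades to plain term matching in that case.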

--SEMANTIC ANNOTATION TYPES --
The pipeline produces the following semantic annotation types:

--Domain Independent--
Heading 
Negation
TOC
Summary

--CIDOC CRM Named Entity Recognition Task--
E19.Physical Object
E49.Time Appellation
E53.Place
E57.Material

--CIDOC CRM Relation Extraction Task--
E9.Move Event
E12.Production Event
E63.Beginning of Existence
P45.Consists of

--CRM-EH Named Entity Recognition Task--
EHE0007.Context
EHE0009.Context Find
EHE0026.Context Event Time Span Appellation
EHE0030.Context Find Material
EHE0039.Context Find Production Event Time Span Appellation

--CRM-EH Relation Extraction Task--
EHE1001.Context Event
EHE1002.Context Find Production Event
EHE1004.Context Find Deposition Event
P45.Consists of Material


-- REQUIREMENTS --

*  The requirements of OPTIMA are dictated by the installation requirements of GATE. You need to have GATE installed in order to execute OPTIMA. The GATE framework can be downloaded from http://gate.ac.uk/download/    

-- INSTALLATION --

Simply copy the contents of the OPTIMA_cidoc-crm_1.0 folder into a local directory of your choice.

-- CONFIGURATION AND EXECUTION -- 

The OPTIMA_cidoc-crm_1.0 archive contains 4 separate pipelines. The “optima_unified_pipeline.xgapp” is the unified version of the other 3 pipelines (optima_preprocess.xgapp, optima_NER_cidoc_crm.xgapp and optima_RE_crm_and_crm-eh.xgapp).

The unified version combines the 3 main phases of the pipeline:
   i) Pre-process
  ii) Named Entity Recognition in CIDOC-CRM 
 iii) Relation Extraction in CRM-EH.  

* The unified version is configured to execute the NER task only in the Hypernym mode of semantic expansion.

-- EXECUTING THE UNIFIED VERSION --
In the GATE environment, load the pipeline “unified_pipeline.xgapp” from:
File > Restore Application from File

* You can also load a sample corpus of 10 grey-literature documents, provided as a GATE datastore in the distribution. To load the datastore, select:

File > Datastores > Open Datastore

In the dialog box that appears, select the option “Serial Datastore” and choose the directory:

 OPTIMA_cidoc-crm_1.0/datastore/sample_docs

The files are already processed and contain semantic annotations. However, you can re-process the corpus as many times as you wish, because the existing document annotations are reset every time the pipeline is executed.

-- EXECUTING THE 3-TIER PHASED VERSION --
In the GATE environment, load the phased pipelines from:
File > Restore Application from File
and select the pipeline files in the window that appears.

** The phased versions MUST run in the following order:
  i) optima_preprocess.xgapp
 ii) optima_NER_cidoc_crm.xgapp 
iii) optima_RE_crm_and_crm-eh.xgapp

Running the phases in any other order will cause the pipeline to crash!
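
If the phased pipelines are driven from a script rather than by hand, the ordering rule can be checked before anything is executed. The following Python sketch is one way to do that; the function name and the idea of a "planned" list are assumptions for illustration, and only the three file names come from this README.

```python
from typing import List

# The order required by this README for the 3-tier phased version.
REQUIRED_ORDER: List[str] = [
    "optima_preprocess.xgapp",
    "optima_NER_cidoc_crm.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
]


def check_phase_order(planned: List[str]) -> None:
    """Raise ValueError unless the planned phases match the required order."""
    if planned != REQUIRED_ORDER:
        raise ValueError(
            "phases must run as: " + " -> ".join(REQUIRED_ORDER)
            + "; got: " + " -> ".join(planned)
        )


# Passes silently because the phases are listed in the required order.
check_phase_order([
    "optima_preprocess.xgapp",
    "optima_NER_cidoc_crm.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
])
```

Failing early with a clear message is cheaper than discovering the problem through a crash halfway through a corpus.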

* Running the phased version gives you the flexibility to execute the different semantic expansion modes for the task of NER. By default, when “optima_NER_cidoc_crm.xgapp” is loaded, the Hypernym expansion mode is ON and the other two modes are OFF.

The NER phase executes as a conditional pipeline. To run the pipeline in the Synonym mode of semantic expansion, TURN ON the Synonym Expansion resource from the processing resources pipeline view in GATE and TURN OFF the other two (Hyponym and Hypernym). Similarly, to run the Hyponym mode of semantic expansion, TURN ON the Hyponym Expansion resource and TURN OFF the other two (Synonym and Hypernym).

NOTE! It is important that only ONE semantic expansion mode is ON at any given execution time. Having more than one semantic expansion mode activated will cause the pipeline to crash.
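
The "exactly one mode ON" rule can be expressed as a small check. The Python sketch below assumes the three switches are held in a dictionary keyed by mode name; that representation and the function name are hypothetical, since in GATE the switches are set in the conditional pipeline view. Rejecting the case where no mode is ON is an extra assumption of the sketch, as this README only describes the failure for more than one active mode.

```python
from typing import Dict

MODES = ("Synonym", "Hyponym", "Hypernym")


def active_mode(switches: Dict[str, bool]) -> str:
    """Return the single mode that is ON, or raise ValueError otherwise."""
    if set(switches) != set(MODES):
        raise ValueError("expected exactly the switches: " + ", ".join(MODES))
    on = [name for name in MODES if switches[name]]
    if len(on) != 1:
        raise ValueError(f"exactly one mode must be ON, found {len(on)}: {on}")
    return on[0]


# Default configuration described above: Hypernym ON, the other two OFF.
print(active_mode({"Synonym": False, "Hyponym": False, "Hypernym": True}))
```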


-- CONTACT --

Andreas Vlachidis:
* email : avlachid@glam.ac.uk , avlachid@yahoo.com
* web : http://hypermedia.research.glam.ac.uk/people/vlachidis/
	http://www.andronikos.co.uk



Source: README.txt, updated 2013-02-03