--ACKNOWLEDGMENTS-- 

OPTIMA is the result of the PhD work of Andreas Vlachidis, supervised by Douglas Tudhope, University of Glamorgan, UK. OPTIMA was employed by the Semantic Technologies for Archaeological Resources (STAR) project (http://hypermedia.research.glam.ac.uk/kos/star/) for the semantic indexing of archaeological grey literature and for cross-searching between reports and disparate archaeological datasets. The complete results of the semantic indexing effort can be found at http://www.andronikos.co.uk/.

--SUMMARY--
OPTIMA is a semantic annotation pipeline aimed at delivering contextual abstractions with respect to the ontological models CIDOC-CRM and its archaeological extension CRM-EH. The pipeline is built in the GATE framework and performs the Natural Language Processing tasks of:
 i) Named Entity Recognition 
 ii) Relation Extraction 

The pipeline also performs supporting NLP tasks, such as domain-independent preprocessing, Word Sense Disambiguation, Negation Detection, and Adjectival Conjunction.

The pipeline makes use of hand-crafted JAPE rules and terminological resources such as custom gazetteers and English Heritage thesauri and glossaries. The participating English Heritage terminological resources are expressed as parametrised GATE gazetteer listings, where each gazetteer entry carries a unique Simple Knowledge Organization System (SKOS) reference (http://www.w3.org/2004/02/skos/).
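
As an illustration of what a parametrised gazetteer entry with a SKOS reference might look like, here is a minimal Python sketch that splits one entry into its term and its features. The "@" separator, the feature name "skos" and the example.org URI are all assumptions made for this example; the actual separator and feature names used by the OPTIMA listings are not stated in this README.

```python
def parse_gazetteer_entry(line, separator="@"):
    """Split one gazetteer line into (term, features).

    The separator and the key=value layout are assumptions for
    illustration, not the documented OPTIMA format.
    """
    term, *pairs = line.rstrip("\n").split(separator)
    features = {}
    for pair in pairs:
        # Split on the first "=" only, so values may contain "=".
        key, _, value = pair.partition("=")
        features[key] = value
    return term, features


# Hypothetical entry: the term "ditch" with a placeholder SKOS URI.
term, features = parse_gazetteer_entry("ditch@skos=http://example.org/concept/0001")
print(term, features)
```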

--SEMANTIC EXPANSION MODE --
The task of NER can be configured to execute in three different modes of semantic expansion:
   i) Synonym
  ii) Hyponym
 iii) Hypernym
Each semantic expansion mode utilises the participating EH thesauri resources to a different extent. The Synonym expansion mode makes use only of the available synonym concepts, the Hyponym mode of the narrower concepts, and the Hypernym mode of the broader concepts. More on the modes of semantic expansion and the use of EH resources in OPTIMA can be found in the wiki pages of the OPTIMA project (http://sourceforge.net/p/optimacidoc/wiki/Home/) and in Andreas Vlachidis's PhD thesis (http://www.andronikos.co.uk/publications.php).
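
The difference between the three modes can be sketched in Python as a lookup over a toy thesaurus. This is only an illustration: the terms and relations below are invented, the function name is hypothetical, and the sketch assumes each mode draws on a single relation (this README does not say whether the modes are cumulative). OPTIMA itself implements expansion through GATE gazetteers and JAPE rules, not through code like this.

```python
# Toy thesaurus fragment; every term and relation is invented for illustration.
TOY_THESAURUS = {
    "pit": {
        "synonym": ["cut"],
        "narrower": ["cess pit", "refuse pit"],
        "broader": ["feature"],
    },
}

# The thesaurus relation each expansion mode draws on, as described above.
MODE_TO_RELATION = {
    "Synonym": "synonym",
    "Hyponym": "narrower",
    "Hypernym": "broader",
}


def expand(term, mode, thesaurus=TOY_THESAURUS):
    """Return the term followed by the concepts the chosen mode adds."""
    relation = MODE_TO_RELATION[mode]
    related = thesaurus.get(term, {}).get(relation, [])
    return [term] + list(related)


for mode in MODE_TO_RELATION:
    print(mode, expand("pit", mode))
```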

--SEMANTIC ANNOTATION TYPES --
The pipeline produces the following semantic annotation types 

--Domain Independent--
Heading 
Negation
TOC
Summary

--CIDOC CRM Named Entity Recognition Task--
E19.Physical Object
E49.Time Appellation
E53.Place
E57.Material

--CIDOC CRM Relation Extraction Task--
E9.Move Event
E12.Production Event
E63.Beginning of Existence
P45.Consists of

--CRM-EH Named Entity Recognition Task--
EHE0007.Context
EHE0009.Context Find
EHE0026.Context Event Time Span Appellation
EHE0030.Context Find Material
EHE0039.Context Find Production Event Time Span Appellation

--CRM-EH Relation Extraction Task--
EHE1001.Context Event
EHE1002.Context Find Production Event
EHE1004.Context Find Deposition Event
P45.Consists of Material


-- REQUIREMENTS --

*  The requirements of OPTIMA are dictated by the installation requirements of GATE. You need to have GATE installed in order to execute OPTIMA. The GATE framework can be downloaded from http://gate.ac.uk/download/    

-- INSTALLATION --

Simply copy the contents of the OPTIMA_cidoc-crm_1.0 folder into a local directory of your choice.

-- CONFIGURATION AND EXECUTION -- 

The OPTIMA_cidoc-crm_1.0 archive contains 4 separate pipelines. The “optima_unified_pipeline.xgapp” is the unified version of the other 3 pipelines (optima_preprocess.xgapp, optima_NER_cidoc_crm.xgapp and optima_RE_crm_and_crm-eh.xgapp).

The unified version combines the 3 main phases of the pipeline:
   i) Pre-process
  ii) Named Entity Recognition in CIDOC-CRM 
 iii) Relation Extraction in CRM-EH.  

* The unified version is configured to execute the NER task only in Hypernym mode of semantic expansion.  

-- EXECUTING THE UNIFIED VERSION --
In the GATE environment, load the pipeline “unified_pipeline.xgapp” from:
File > Restore Application from File

* You can also load a sample corpus of 10 grey-literature documents, provided with the distribution as a GATE datastore. To load the datastore, select:

File > Datastores > Open Datastore

In the dialog box that opens, select the option “Serial Datastore” and choose the directory:

 OPTIMA_cidoc-crm_1.0/datastore/sample_docs

The files are already processed and contain semantic annotations. However, you can re-process the corpus as many times as you wish, because the existing document annotations are reset every time the pipeline is executed.

-- EXECUTING THE 3-TIER PHASED VERSION --
In the GATE environment, load the phased pipelines from:
File > Restore Application from File
and select the pipeline files in the window that opens.

** The phased versions MUST run in the following order:
  i) optima_preprocess.xgapp
 ii) optima_NER_cidoc_crm.xgapp 
iii) optima_RE_crm_and_crm-eh.xgapp

Not following the above order will cause the pipeline to crash!
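
If you script your runs, the ordering rule above can be checked before anything is executed. The following is a minimal Python sketch, not part of the OPTIMA distribution; the function name is hypothetical and the check only compares file names against the order listed above.

```python
# Required execution order of the phased pipelines, as listed above.
REQUIRED_ORDER = [
    "optima_preprocess.xgapp",
    "optima_NER_cidoc_crm.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
]


def check_run_order(planned):
    """Return True if the planned run matches the required order.

    Raises ValueError otherwise, so a bad plan fails before GATE is involved.
    """
    if list(planned) != REQUIRED_ORDER:
        raise ValueError(
            "phased pipelines must run in this order: " + ", ".join(REQUIRED_ORDER)
        )
    return True


print(check_run_order(REQUIRED_ORDER))
```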

* Running the phased version gives you the flexibility to execute the different semantic expansion modes for the task of NER. By default, when “optima_NER_cidoc_crm.xgapp” is loaded, the Hypernym Expansion mode is ON and the other two modes are OFF.

The NER phase executes as a conditional pipeline. To run the pipeline in the Synonym mode of semantic expansion, TURN ON the Synonym mode from the processing resources pipeline view in GATE and TURN OFF the other two. Similarly, to run the Hyponym mode of semantic expansion, TURN ON the Hyponym Expansion Resource and TURN OFF the other two (Synonym and Hypernym).

NOTE! It is important that only ONE semantic expansion mode is ON at any given execution time. Having more than one semantic expansion mode activated will cause the pipeline to crash.
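
The "only ONE mode ON" rule can be expressed as a small check. This Python sketch is purely illustrative: the dictionary of switches and the function name are assumptions, and in practice the modes are toggled by hand in the GATE conditional pipeline view. The sketch also rejects the case where no mode is ON, which this README does not explicitly address.

```python
MODES = ("Synonym", "Hyponym", "Hypernym")


def active_mode(switches):
    """Return the single mode that is ON, or raise ValueError."""
    on = [mode for mode in MODES if switches.get(mode, False)]
    if len(on) != 1:
        raise ValueError(f"exactly one mode must be ON, found {len(on)}: {on}")
    return on[0]


# Default state described above: Hypernym ON, the other two OFF.
DEFAULT_SWITCHES = {"Synonym": False, "Hyponym": False, "Hypernym": True}
print(active_mode(DEFAULT_SWITCHES))  # prints "Hypernym"
```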


-- CONTACT --

Andreas Vlachidis:
* email : avlachid@glam.ac.uk , avlachid@yahoo.com
* web : http://hypermedia.research.glam.ac.uk/people/vlachidis/
	http://www.andronikos.co.uk



Source: README.txt, updated 2013-02-03