--ACKNOWLEDGMENTS-- 

OPTIMA is the result of the PhD work of Andreas Vlachidis, supervised by Douglas Tudhope at the University of Glamorgan, UK. OPTIMA was employed by the Semantic Technologies for Archaeological Resources (STAR) project (http://hypermedia.research.glam.ac.uk/kos/star/) for the semantic indexing of archaeological grey literature and for cross-searching between reports and disparate archaeological datasets. The complete results of the semantic indexing effort can be found at http://www.andronikos.co.uk/.

--SUMMARY--
OPTIMA is a semantic annotation pipeline aimed at delivering contextual abstractions with respect to the ontological model CIDOC-CRM and its archaeological extension CRM-EH. The pipeline is built in the GATE framework and performs the following Natural Language Processing tasks:
 i) Named Entity Recognition 
 ii) Relation Extraction 

Subsequent, supportive NLP tasks are also performed by the pipeline, such as domain-independent preprocessing, Word Sense Disambiguation, Negation Detection, and Adjectival Conjunction.

The pipeline makes use of hand-crafted JAPE rules and terminological resources such as custom gazetteers and English Heritage thesauri and glossaries. The participating English Heritage terminological resources are expressed as parametrised GATE gazetteer listings, where each gazetteer entry carries a unique Simple Knowledge Organization System (SKOS) reference (http://www.w3.org/2004/02/skos/).
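
As an illustration of what a parametrised gazetteer entry can look like, the following minimal Python sketch splits one listing line into its entry text and its features. It is not taken from the OPTIMA resources: the "@" separator, the "skosConcept" feature name and the example.org URL are assumptions made only for this example, and the actual separator depends on how the gazetteer is configured in GATE.

```python
# Hypothetical format: "<entry>@<name>=<value>@<name>=<value>..."
# The "@" separator and the "skosConcept" feature name are assumptions
# for illustration; they are not taken from the OPTIMA gazetteer files.
SEPARATOR = "@"


def parse_gazetteer_line(line, separator=SEPARATOR):
    """Split one gazetteer line into (entry, features)."""
    parts = line.rstrip("\n").split(separator)
    entry = parts[0]
    features = {}
    for part in parts[1:]:
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = part.partition("=")
        features[name] = value
    return entry, features


# Made-up entry with a made-up SKOS reference.
sample = "ditch@skosConcept=http://example.org/concept/0001"
entry, features = parse_gazetteer_line(sample)
print(entry, features["skosConcept"])
```

A line without any separator is returned as an entry with an empty feature dictionary.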

--SEMANTIC EXPANSION MODE --
The task of NER can be configured to execute in three different modes of semantic expansion:
   i) Synonym
  ii) Hyponym
 iii) Hypernym
Each semantic expansion mode utilises the participating EH thesauri resources to a different extent. The Synonym expansion mode makes use only of the available synonym concepts, the Hyponym mode of the narrower concepts, and the Hypernym mode of the broader concepts. More on the modes of semantic expansion and EH usage in OPTIMA can be found in the wiki pages of the OPTIMA project (http://sourceforge.net/p/optimacidoc/wiki/Home/) and in Andreas Vlachidis's PhD thesis (http://www.andronikos.co.uk/publications.php).
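
The idea behind the three modes can be sketched in Python over a toy thesaurus. This is not OPTIMA's implementation (the pipeline works through GATE gazetteers and JAPE rules); the terms and relations below are invented for illustration. Whether a wider mode also includes the terms of the other modes is not stated here, so the sketch assumes each mode uses only its own relation.

```python
# Toy thesaurus: every term and relation here is made up for illustration
# and does not come from the English Heritage thesauri.
THESAURUS = {
    "ditch": {
        "synonyms": ["trench"],
        "narrower": ["ring ditch"],
        "broader": ["earthwork"],
    },
}

# Which thesaurus relation each expansion mode draws on.
RELATION_FOR_MODE = {
    "synonym": "synonyms",
    "hyponym": "narrower",
    "hypernym": "broader",
}


def expand(term, mode):
    """Return the term plus the related terms selected by the given mode."""
    relation = RELATION_FOR_MODE[mode]
    concept = THESAURUS.get(term, {})
    return [term] + concept.get(relation, [])


for mode in ("synonym", "hyponym", "hypernym"):
    print(mode, expand("ditch", mode))
```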

--SEMANTIC ANNOTATION TYPES --
The pipeline produces the following semantic annotation types:

--Domain Independent--
Heading 
Negation
TOC
Summary

--CIDOC CRM Named Entity Recognition Task--
E19.Physical Object
E49.Time Appellation
E53.Place
E57.Material

--CIDOC CRM Relation Extraction Task--
E9.Move Event
E12.Production Event
E63.Beginning of Existence
P45.Consists of

--CRM-EH Named Entity Recognition Task--
EHE0007.Context
EHE0009.Context Find
EHE0026.Context Event Time Span Appellation
EHE0030.Context Find Material
EHE0039.Context Find Production Event Time Span Appellation

--CRM-EH Relation Extraction Task--
EHE1001.Context Event
EHE1002.Context Find Production Event
EHE1004.Context Find Deposition Event
P45.Consists of Material


-- REQUIREMENTS --

*  The requirements of OPTIMA are dictated by the installation requirements of GATE. You need to have GATE installed in order to execute OPTIMA. The GATE framework can be downloaded from http://gate.ac.uk/download/    

-- INSTALLATION --

Simply copy the contents of the OPTIMA_cidoc-crm_1.0 folder into a local directory of your choice.

-- CONFIGURATION AND EXECUTION -- 

The OPTIMA_cidoc-crm_1.0 archive contains 4 separate pipelines. The “optima_unified_pipeline.xgapp” is the unified version of the other 3 pipelines (optima_preprocess.xgapp, optima_NER_cidoc_crm.xgapp and optima_RE_crm_and_crm-eh.xgapp).

The unified version combines the 3 main phases of the pipeline:
   i) Pre-process
  ii) Named Entity Recognition in CIDOC-CRM 
 iii) Relation Extraction in CRM-EH.  

* The unified version is configured to execute the NER task only in Hypernym mode of semantic expansion.  

-- EXECUTING THE UNIFIED VERSION --
In the GATE environment, load the pipeline “optima_unified_pipeline.xgapp” from:
File > Restore Application from File

* You can also load a sample corpus of 10 grey-literature documents, which the distribution provides as a GATE datastore. To load the datastore, select:

File > Datastores > Open Datastore

In the dialog box that opens, select the option “Serial Datastore” and then select the directory:

 OPTIMA_cidoc-crm_1.0/datastore/sample_docs

The files are already processed and contain semantic annotations. However, you can re-process the corpus as many times as you wish, because every time the pipeline is executed, the existing document annotations are reset.

-- EXECUTING THE 3-TIER PHASED VERSION --
In  the GATE environment load the phased pipelines from:
File > Restore Application from File  
and select the pipeline files in the window that opens.

** The phased versions MUST run in the following order
  i) optima_preprocess.xgapp
 ii) optima_NER_cidoc_crm.xgapp 
iii) optima_RE_crm_and_crm-eh.xgapp

Not following the above order will cause the pipeline to crash!
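
If you keep a written run plan or drive the phases from a script, the required order can be checked before anything is executed. The following Python sketch is only a helper for that purpose and is not part of the distribution; the function name is hypothetical, while the file names are the ones listed above.

```python
# Required execution order of the phased pipelines, as listed above.
REQUIRED_ORDER = [
    "optima_preprocess.xgapp",
    "optima_NER_cidoc_crm.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
]


def first_out_of_order(planned):
    """Return the index of the first step that is wrong or missing, else None.

    'planned' is the list of pipeline file names in the order you intend
    to run them. Extra entries after the three required steps are ignored.
    """
    for index, expected in enumerate(REQUIRED_ORDER):
        if index >= len(planned) or planned[index] != expected:
            return index
    return None


plan = [
    "optima_NER_cidoc_crm.xgapp",
    "optima_preprocess.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
]
problem = first_out_of_order(plan)
if problem is not None:
    print("Step", problem + 1, "should be", REQUIRED_ORDER[problem])
```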

* Running the phased version gives you the flexibility to execute the different semantic expansion modes for the task of NER.  By default when the “optima_NER_cidoc_crm.xgapp” is loaded the Hypernym Expansion mode is ON and the other two modes are OFF. 

The NER phase executes as a conditional pipeline. To run the pipeline in the Synonym mode of semantic expansion, turn ON the Synonym mode from the processing resources pipeline view in GATE and turn OFF the other two. Similarly, to run the Hyponym mode of semantic expansion, turn ON the Hyponym Expansion resource and turn OFF the other two (Synonym and Hypernym).

NOTE! It is important that only ONE semantic expansion mode is ON at any given execution time. Having more than one semantic expansion mode activated will cause the pipeline to crash.
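
The "only one mode ON" rule can be expressed as a small check. The Python sketch below is illustrative only: it does not read the GATE configuration, and the dictionary of ON/OFF flags is a hypothetical stand-in for the settings you choose in the processing resources view.

```python
def active_expansion_mode(enabled):
    """Return the single mode that is ON, or raise ValueError.

    'enabled' maps a mode name to True (ON) or False (OFF). This mapping is
    a stand-in for the GATE settings, not something read from the pipeline.
    """
    active = [mode for mode, is_on in enabled.items() if is_on]
    if len(active) != 1:
        raise ValueError(
            "Exactly one expansion mode must be ON, found: %s" % (active or "none")
        )
    return active[0]


# Default state described above: Hypernym ON, the other two OFF.
settings = {"Synonym": False, "Hyponym": False, "Hypernym": True}
print(active_expansion_mode(settings))
```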


-- CONTACT --

Andreas Vlachidis:
* email : avlachid@glam.ac.uk , avlachid@yahoo.com
* web : http://hypermedia.research.glam.ac.uk/people/vlachidis/
	http://www.andronikos.co.uk



Source: README.txt, updated 2013-02-03