-- ACKNOWLEDGMENTS --

OPTIMA is the result of the PhD work of Andreas Vlachidis, supervised by Douglas Tudhope, University of Glamorgan, UK. OPTIMA was employed by the Semantic Technologies for Archaeological Resources (STAR) project (http://hypermedia.research.glam.ac.uk/kos/star/) for the semantic indexing of archaeological grey literature and for cross-searching between reports and disparate archaeological datasets. The complete results of the semantic indexing effort can be found at http://www.andronikos.co.uk/

-- SUMMARY --

OPTIMA is a semantic annotation pipeline aimed at delivering contextual abstractions with respect to the ontological models CIDOC-CRM and its archaeological extension CRM-EH. The pipeline is built in the GATE framework and performs the Natural Language Processing tasks of:

i) Named Entity Recognition
ii) Relation Extraction

The pipeline also performs supporting NLP tasks, such as domain-independent Preprocessing, Word Sense Disambiguation, Negation Detection, and Adjectival Conjunction. It makes use of hand-crafted JAPE rules and terminological resources such as custom gazetteers and English Heritage (EH) thesauri and glossaries. The participating English Heritage terminological resources are expressed as parametrised GATE gazetteer listings, where each gazetteer entry carries a unique Simple Knowledge Organization System (SKOS) reference (http://www.w3.org/2004/02/skos/).

-- SEMANTIC EXPANSION MODE --

The task of NER can be configured to execute in three different modes of semantic expansion:

i) Synonym
ii) Hyponym
iii) Hypernym

Each semantic expansion mode utilises the participating EH thesauri resources at a different scale. The Synonym expansion mode makes use only of the available synonym concepts, the Hyponym mode of the narrower concepts and the Hypernym mode of the broader concepts.
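The difference between the three modes can be illustrated with a minimal Python sketch. It is not part of OPTIMA and does not read the EH thesauri: the toy thesaurus entries are made up, the function and variable names are hypothetical, and the pairing of each mode with a SKOS-style relation (altLabel, narrower, broader) is an assumption made for illustration. Whether a mode also includes the terms of the other modes is not stated here, so the sketch keeps them separate.

```python
from enum import Enum
from typing import Dict, List


class ExpansionMode(Enum):
    SYNONYM = "Synonym"
    HYPONYM = "Hyponym"
    HYPERNYM = "Hypernym"


# Assumption: each mode follows one SKOS-style relation of a thesaurus concept.
RELATION_FOR_MODE = {
    ExpansionMode.SYNONYM: "altLabel",   # synonym concepts
    ExpansionMode.HYPONYM: "narrower",   # narrower concepts
    ExpansionMode.HYPERNYM: "broader",   # broader concepts
}

# Hypothetical toy entries, NOT taken from the English Heritage thesauri.
TOY_THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "pit": {
        "altLabel": ["cut"],
        "narrower": ["rubbish pit", "storage pit"],
        "broader": ["feature"],
    },
}


def expand(term: str, mode: ExpansionMode,
           thesaurus: Dict[str, Dict[str, List[str]]] = TOY_THESAURUS) -> List[str]:
    """Return the term followed by the related terms selected by the mode."""
    entry = thesaurus.get(term, {})
    return [term] + list(entry.get(RELATION_FOR_MODE[mode], []))


if __name__ == "__main__":
    for mode in ExpansionMode:
        print(mode.value, expand("pit", mode))
```

A term that is missing from the toy thesaurus is returned on its own, so the sketch never fails on unknown input.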
More on the modes of semantic expansion and the use of the EH resources in OPTIMA can be found in the wiki pages of the OPTIMA project (http://sourceforge.net/p/optimacidoc/wiki/Home/) and in Andreas Vlachidis's PhD thesis (http://www.andronikos.co.uk/publications.php).

-- SEMANTIC ANNOTATION TYPES --

The pipeline produces the following semantic annotation types:

-- Domain Independent --
Heading
Negation
TOC
Summary

-- CIDOC CRM Named Entity Recognition Task --
E19.Physical Object
E49.Time Appellation
E53.Place
E57.Material

-- CIDOC CRM Relation Extraction Task --
E9.Move Event
E12.Production Event
E63.Beginning of Existence
P45.Consists of

-- CRM-EH Named Entity Recognition Task --
EHE0007.Context
EHE0009.Context Find
EHE0026.Context Event Time Span Appellation
EHE0030.Context Find Material
EHE0039.Context Find Production Event Time Span Appellation

-- CRM-EH Relation Extraction Task --
EHE1001.Context Event
EHE1002.Context Find Production Event
EHE1004.Context Find Deposition Event
P45.Consists of Material

-- REQUIREMENTS --

* The requirements of OPTIMA are dictated by the installation requirements of GATE. You need to have GATE installed in order to execute OPTIMA. The GATE framework can be downloaded from http://gate.ac.uk/download/

-- INSTALLATION --

Copy the contents of the OPTIMA_cidoc-crm_1.0 folder into a local directory of your choice.

-- CONFIGURATION AND EXECUTION --

The OPTIMA_cidoc-crm_1.0 archive contains 4 separate pipelines. The “optima_unified_pipeline.xgapp” is the unified version of the other 3 pipelines (optima_preprocess.xgapp, optima_NER_cidoc_crm.xgapp and optima_RE_crm_and_crm-eh.xgapp). The unified version combines the 3 main phases of the pipeline:

i) Pre-process
ii) Named Entity Recognition in CIDOC-CRM
iii) Relation Extraction in CRM-EH

* The unified version is configured to execute the NER task only in the Hypernym mode of semantic expansion.
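After copying the folder, a quick check that the four pipeline files named above are present can save a failed load in GATE. The following is a minimal Python sketch, not part of the distribution. It assumes the .xgapp files sit at the top level of the directory you copied them into; the function name is hypothetical.

```python
from pathlib import Path
from typing import List

# The four pipeline files named in this README.
EXPECTED_PIPELINES = [
    "optima_unified_pipeline.xgapp",
    "optima_preprocess.xgapp",
    "optima_NER_cidoc_crm.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
]


def missing_pipelines(install_dir: str) -> List[str]:
    """Return the expected pipeline files that are not found in install_dir.

    Assumption: the files are stored directly in install_dir, not in a subfolder.
    """
    root = Path(install_dir)
    return [name for name in EXPECTED_PIPELINES if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_pipelines(target)
    if missing:
        print("Missing pipeline files:", ", ".join(missing))
    else:
        print("All four pipeline files found.")
```

Run it with the installation directory as the only argument; with no argument it checks the current directory.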
-- EXECUTING THE UNIFIED VERSION --

In the GATE environment load the pipeline “optima_unified_pipeline.xgapp” from:

File > Restore Application from File

* You can also load a sample corpus of 10 grey-literature documents, provided by the distribution as a GATE datastore. To load the datastore select:

File > Datastores > Open Datastore

In the dialog box that appears, select the option “Serial Datastore” and then select the directory:

OPTIMA_cidoc-crm_1.0/datastore/sample_docs

The files are already processed and contain semantic annotations. However, you can re-process the corpus as many times as you wish, because every time the pipeline is executed the existing document annotations are reset.

-- EXECUTING THE 3-TIER PHASED VERSION --

In the GATE environment load the phased pipelines from:

File > Restore Application from File

and point to the pipeline files in the window that appears.

** The phased versions MUST run in the following order:

i) optima_preprocess.xgapp
ii) optima_NER_cidoc_crm.xgapp
iii) optima_RE_crm_and_crm-eh.xgapp

Not following the above order will cause the pipeline to crash!

* Running the phased version gives you the flexibility to execute the different semantic expansion modes for the task of NER. By default, when “optima_NER_cidoc_crm.xgapp” is loaded the Hypernym expansion mode is ON and the other two modes are OFF. The NER phase executes on a conditional pipeline. To run the pipeline in the Synonym mode of semantic expansion, TURN ON the Synonym mode from the processing resources pipeline view in GATE and TURN OFF the other two (Hyponym and Hypernym). Similarly, to run the Hyponym mode of semantic expansion, TURN ON the Hyponym expansion resource and TURN OFF the other two (Synonym and Hypernym).

NOTE! It is important that only ONE semantic expansion mode is ON at any given execution time. Having more than one semantic expansion mode activated will cause the pipeline to crash.
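The two rules above (fixed phase order, exactly one expansion mode ON) can be written down as a small pre-run check. The following Python sketch is only an illustration and does not talk to GATE: the function names and the ON/OFF dictionary are hypothetical, and you would fill the dictionary in by hand from what you see in the GATE pipeline view. Treating "no mode ON" as an error is an assumption; the README only states that more than one mode ON crashes the pipeline.

```python
from typing import Dict, Sequence

# Required execution order of the phased pipelines, as listed in this README.
PHASE_ORDER = (
    "optima_preprocess.xgapp",
    "optima_NER_cidoc_crm.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
)

EXPANSION_MODES = ("Synonym", "Hyponym", "Hypernym")


def check_phase_order(planned: Sequence[str]) -> None:
    """Raise ValueError unless the planned run matches the required order."""
    if tuple(planned) != PHASE_ORDER:
        raise ValueError(
            "Phased pipelines must run in this order: " + ", ".join(PHASE_ORDER)
        )


def active_expansion_mode(switches: Dict[str, bool]) -> str:
    """Return the single mode that is ON, or raise ValueError.

    `switches` maps a mode name to True (ON) or False (OFF).
    Assumption: zero modes ON is also rejected.
    """
    unknown = set(switches) - set(EXPANSION_MODES)
    if unknown:
        raise ValueError("Unknown expansion mode(s): " + ", ".join(sorted(unknown)))
    active = [mode for mode in EXPANSION_MODES if switches.get(mode, False)]
    if len(active) != 1:
        raise ValueError(
            "Exactly one semantic expansion mode must be ON, found: " + str(active)
        )
    return active[0]


if __name__ == "__main__":
    # Default state described above: Hypernym ON, the other two OFF.
    default_switches = {"Synonym": False, "Hyponym": False, "Hypernym": True}
    check_phase_order(PHASE_ORDER)
    print("Active mode:", active_expansion_mode(default_switches))
```

With the default switches the script prints "Active mode: Hypernym"; switching a second mode to True makes it stop with an error before anything is run.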
-- CONTACT --

Andreas Vlachidis:
* email: avlachid@glam.ac.uk, avlachid@yahoo.com
* web: http://hypermedia.research.glam.ac.uk/people/vlachidis/
       http://www.andronikos.co.uk