Files in plugins/ANNIE/resources/gazetteer
Name Modified Size
abbreviations.lst 2012-02-09 26 Bytes
adbc.lst 2012-02-09 40 Bytes
airport.lst 2012-02-09 55 Bytes
airports.lst 2012-02-09 66 Bytes
cdg.lst 2012-02-09 946 Bytes
charities.lst 2012-02-09 377 Bytes
city.lst 2012-02-09 18.7 kB
city_cap.lst 2012-02-09 18.6 kB
city_uk.lst 2012-02-09 258.3 kB
colours.lst 2012-02-09 105 Bytes
company.lst 2012-02-09 39.8 kB
company_cap.lst 2012-02-09 39.3 kB
country.lst 2012-02-09 4.9 kB
country_abbrev.lst 2012-02-09 3 Bytes
country_adj.lst 2012-02-09 8.1 kB
country_cap.lst 2012-02-09 4.9 kB
currency_prefix.lst 2012-02-09 43 Bytes
currency_unit.lst 2012-02-09 1.9 kB
date.lst 2012-02-09 128 Bytes
date_key.lst 2012-02-09 24 Bytes
date_post.lst 2012-02-09 629 Bytes
date_pre.lst 2012-02-09 49 Bytes
date_unit.lst 2012-02-09 246 Bytes
datespan.lst 2012-02-09 60 Bytes
day.lst 2012-02-09 154 Bytes
day_cap.lst 2012-02-09 155 Bytes
department.lst 2012-02-09 513 Bytes
determiner.lst 2012-02-09 19 Bytes
facility.lst 2012-02-09 347 Bytes
facility_key.lst 2012-02-09 564 Bytes
facility_key_ext.lst 2012-02-09 41 Bytes
festival.lst 2012-02-09 2.0 kB
govern_key.lst 2012-02-09 200 Bytes
government.lst 2012-02-09 2.3 kB
greeting.lst 2012-02-09 70 Bytes
hour.lst 2012-02-09 72 Bytes
ident_prekey.lst 2012-02-09 93 Bytes
jobtitles.lst 2012-02-09 18.7 kB
lists.def 2012-02-09 2.4 kB
loc_generalkey.lst 2012-02-09 48 Bytes
loc_key.lst 2012-02-09 438 Bytes
loc_prekey.lst 2012-02-09 254 Bytes
loc_prekey_lower.lst 2012-02-09 250 Bytes
loc_relig.lst 2012-02-09 67 Bytes
mapping.def 2012-02-09 361 Bytes
ministry.lst 2012-02-09 455 Bytes
minutes.lst 2012-02-09 88 Bytes
months.lst 2012-02-09 420 Bytes
months_lower.lst 2012-02-09 213 Bytes
mountain.lst 2012-02-09 39 Bytes
new_adj.lst 2012-02-09 7.4 kB
new_cdg.lst 2012-02-09 1.5 kB
newspapers.lst 2012-02-09 385 Bytes
non_company.lst 2012-02-09 667 Bytes
nonspec_date.lst 2012-02-09 40 Bytes
not_org.lst 2012-02-09 40 Bytes
number_fold.lst 2012-02-09 149 Bytes
numbers.lst 2012-02-09 1.2 kB
ordinal.lst 2012-02-09 812 Bytes
org_base.lst 2012-02-09 1.8 kB
org_ending.lst 2012-02-09 662 Bytes
org_key.lst 2012-02-09 1.1 kB
org_key_cap.lst 2012-02-09 1.1 kB
org_pre.lst 2012-02-09 22 Bytes
org_spur.lst 2012-02-09 215 Bytes
organization.lst 2012-02-09 2.6 kB
organization_nouns.lst 2012-02-09 13.6 kB
other_people.lst 2012-02-09 31 Bytes
othorg_key.lst 2012-02-09 16 Bytes
percent.lst 2012-02-09 19 Bytes
person_ambig2.lst 2012-02-09 25 Bytes
person_ambig.lst 2012-02-09 16 Bytes
person_ambig.old.lst 2012-02-09 11.8 kB
person_ambig_lower.lst 2012-02-09 5.8 kB
person_ending.lst 2012-02-09 42 Bytes
person_female.lst 2012-02-09 43.2 kB
person_female_cap.lst 2012-02-09 42.8 kB
person_female_lower.lst 2012-02-09 40.7 kB
person_first.lst 2012-02-09 5.7 kB
person_full.lst 2012-02-09 338 Bytes
person_male.lst 2012-02-09 30.8 kB
person_male_cap.lst 2012-02-09 30.1 kB
person_male_lower.lst 2012-02-09 28.3 kB
person_relig.lst 2012-02-09 116 Bytes
person_sci.lst 2012-02-09 26 Bytes
person_spur.lst 2012-02-09 24 Bytes
phone_prefix.lst 2012-02-09 61 Bytes
planet.lst 2012-02-09 67 Bytes
province.lst 2012-02-09 9.7 kB
province_aa.lst 2012-02-09 8.0 kB
province_ab.lst 2012-02-09 1.5 kB
racecourse.lst 2012-02-09 39 Bytes
region.lst 2012-02-09 1.0 kB
region_cap.lst 2012-02-09 966 Bytes
region_uk.lst 2012-02-09 1.7 kB
rivers.lst 2012-02-09 19 Bytes
sports.lst 2012-02-09 124 Bytes
spur.lst 2012-02-09 839 Bytes
spur_ident.lst 2012-02-09 7 Bytes
stop.lst 2012-02-09 133 Bytes
street.lst 2012-02-09 51 Bytes
surname_prefix.lst 2012-02-09 38 Bytes
team.lst 2012-02-09 128 Bytes
time.lst 2012-02-09 46 Bytes
time_ampm.lst 2012-02-09 36 Bytes
time_key.lst 2012-02-09 112 Bytes
time_modifier.lst 2012-02-09 63 Bytes
time_unit.lst 2012-02-09 72 Bytes
times.lst 2012-02-09 254 Bytes
timespan.lst 2012-02-09 32 Bytes
timex_pre.lst 2012-02-09 166 Bytes
timezone.lst 2012-02-09 823 Bytes
title.lst 2012-02-09 2.2 kB
title_female.lst 2012-02-09 79 Bytes
title_lower.lst 2012-02-09 2.5 kB
title_lowercase.lst 2012-02-09 1.1 kB
title_male.lst 2012-02-09 59 Bytes
title_mil.lst 2012-02-09 3.6 kB
title_pol.lst 2012-02-09 890 Bytes
tvcompany.lst 2012-02-09 81 Bytes
water.lst 2012-02-09 158 Bytes
year.lst 2012-02-09 195 Bytes
Totals: 122 Items   737.7 kB
--ACKNOWLEDGMENTS-- 

OPTIMA is the result of the PhD work of Andreas Vlachidis, supervised by Douglas Tudhope, University of Glamorgan, UK. OPTIMA was employed by the Semantic Technologies for Archaeological Resources (STAR) project (http://hypermedia.research.glam.ac.uk/kos/star/) for the semantic indexing of archaeological grey literature and for cross-searching between reports and disparate archaeological datasets. The complete results of the semantic indexing effort can be found at http://www.andronikos.co.uk/.

--SUMMARY--
OPTIMA is a semantic annotation pipeline aimed at delivering contextual abstractions with respect to the ontological model CIDOC-CRM and its archaeological extension CRM-EH. The pipeline is built in the GATE framework and performs the Natural Language Processing tasks of:
 i) Named Entity Recognition 
 ii) Relation Extraction 

The pipeline also performs a number of supporting NLP tasks, such as domain-independent preprocessing, Word Sense Disambiguation, Negation Detection, and Adjectival Conjunction.

The pipeline makes use of hand-crafted JAPE rules and terminological resources such as custom gazetteers and English Heritage (EH) thesauri and glossaries. The participating English Heritage terminological resources are expressed as parametrised GATE gazetteer listings, where each gazetteer entry carries a unique Simple Knowledge Organization System (SKOS) reference (http://www.w3.org/2004/02/skos/).
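
As a rough illustration of what a parametrised gazetteer entry might look like, the sketch below parses one line of a GATE-style list file into its term and its features. The separator character (@), the feature name skosConcept and the sample URI are assumptions made for this example only. The lists shipped with OPTIMA may use a different separator and different feature names.

```python
def parse_gazetteer_line(line, sep="@"):
    """Split one gazetteer line into (term, features).

    Assumed layout (hypothetical): term<sep>name=value<sep>name=value ...
    """
    parts = line.rstrip("\n").split(sep)
    term = parts[0].strip()
    features = {}
    for part in parts[1:]:
        # Split on the first "=" only, so values may contain "=" themselves.
        name, _, value = part.partition("=")
        if name:
            features[name.strip()] = value.strip()
    return term, features


# Hypothetical entry: the term "ditch" with an invented SKOS concept URI.
sample = "ditch@skosConcept=http://example.org/thesaurus/concept/0001"
term, features = parse_gazetteer_line(sample)
print(term, features)
```

A line without any separator is treated as a plain entry with no features, which matches how simple one-term-per-line lists are usually written.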

--SEMANTIC EXPANSION MODE --
The NER task can be configured to execute in one of three modes of semantic expansion:
   i) Synonym
  ii) Hyponym
 iii) Hypernym
Each semantic expansion mode draws on the participating EH thesauri to a different extent: the Synonym mode uses only the available synonym concepts, the Hyponym mode the narrower concepts, and the Hypernym mode the broader concepts. More on the modes of semantic expansion and EH usage in OPTIMA can be found in the wiki pages of the OPTIMA project (http://sourceforge.net/p/optimacidoc/wiki/Home/) and in Andreas Vlachidis's PhD thesis (http://www.andronikos.co.uk/publications.php).
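
The idea behind the three modes can be sketched with a toy thesaurus. This is not OPTIMA code: the terms and relations below are invented for illustration, and the function simply follows the description above, where each mode selects one kind of related concept. Whether the real modes are cumulative is not stated here, so the sketch treats them as separate.

```python
# Toy thesaurus: every label and relation here is invented for illustration.
THESAURUS = {
    "pit": {
        "synonyms": ["cut"],
        "narrower": ["rubbish pit", "storage pit"],
        "broader": ["feature"],
    },
}

# Which thesaurus relation each expansion mode draws on.
MODE_TO_RELATION = {
    "synonym": "synonyms",
    "hyponym": "narrower",
    "hypernym": "broader",
}


def expand(term, mode):
    """Return the term plus the related labels selected by the given mode."""
    relation = MODE_TO_RELATION[mode.lower()]
    entry = THESAURUS.get(term, {})
    return [term] + entry.get(relation, [])


for mode in ("Synonym", "Hyponym", "Hypernym"):
    print(mode, expand("pit", mode))
```

A term that is missing from the thesaurus is returned unchanged, so expansion never removes a match.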

--SEMANTIC ANNOTATION TYPES --
The pipeline produces the following semantic annotation types:

--Domain Independent--
Heading 
Negation
TOC
Summary

--CIDOC CRM Named Entity Recognition Task--
E19.Physical Object
E49.Time Appellation
E53.Place
E57.Material

--CIDOC CRM Relation Extraction Task--
E9.Move Event
E12.Production Event
E63.Beginning of Existence
P45.Consists of

--CRM-EH Named Entity Recognition Task--
EHE0007.Context
EHE0009.Context Find
EHE0026.Context Event Time Span Appellation
EHE0030.Context Find Material
EHE0039.Context Find Production Event Time Span Appellation

--CRM-EH Relation Extraction Task--
EHE1001.Context Event
EHE1002.Context Find Production Event
EHE1004.Context Find Deposition Event
P45.Consists of Material


-- REQUIREMENTS --

*  The requirements of OPTIMA are dictated by the installation requirements of GATE. You need to have GATE installed in order to execute OPTIMA. The GATE framework can be downloaded from http://gate.ac.uk/download/    

-- INSTALLATION --

Simply copy the contents of the OPTIMA_cidoc-crm_1.0 folder into a local directory of your choice.

-- CONFIGURATION AND EXECUTION -- 

The OPTIMA_cidoc-crm_1.0 archive contains 4 separate pipelines. The “optima_unified_pipeline.xgapp” is the unified version of the other 3 pipelines (optima_preprocess.xgapp, optima_NER_cidoc_crm.xgapp and optima_RE_crm_and_crm-eh.xgapp).

The unified version combines the 3 main phases of the pipeline:
   i) Pre-process
  ii) Named Entity Recognition in CIDOC-CRM 
 iii) Relation Extraction in CRM-EH.  

* The unified version is configured to execute the NER task only in Hypernym mode of semantic expansion.  

-- EXECUTING THE UNIFIED VERSION --
In the GATE environment, load the pipeline “unified_pipeline.xgapp” from:
File > Restore Application from File

* You can also load a sample corpus of 10 grey-literature documents, provided with the distribution as a GATE datastore. To load the datastore, select:

File > Datastores > Open Datastore

In the dialog box that appears, select the option “Serial Datastore” and choose the directory:

 OPTIMA_cidoc-crm_1.0/datastore/sample_docs

The files are already processed and contain semantic annotations. However, you can re-process the corpus as many times as you wish, because every time the pipeline is executed, the existing document annotations are reset.

-- EXECUTING THE 3-TIER PHASED VERSION --
In the GATE environment, load the phased pipelines from:
File > Restore Application from File
and select the pipeline files in the window that appears.

** The phased versions MUST run in the following order:
  i) optima_preprocess.xgapp
 ii) optima_NER_cidoc_crm.xgapp 
iii) optima_RE_crm_and_crm-eh.xgapp

Not following the above order will cause the pipeline to crash!
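
If you script your runs, a small guard can catch an ordering mistake before anything is executed. The sketch below is only an illustration and does not call GATE. The function name check_run_order is hypothetical, and accepting a partial run (for example, only the first two phases) is an assumption of this example.

```python
# Required execution order of the phased pipelines, as listed above.
REQUIRED_ORDER = [
    "optima_preprocess.xgapp",
    "optima_NER_cidoc_crm.xgapp",
    "optima_RE_crm_and_crm-eh.xgapp",
]


def check_run_order(planned):
    """Raise ValueError unless `planned` follows the required order.

    Assumption: a run may stop early, so any leading part of the
    required sequence is accepted, but phases may not be skipped or swapped.
    """
    planned = list(planned)
    if not planned or planned != REQUIRED_ORDER[:len(planned)]:
        raise ValueError(
            "Pipelines must run in this order: " + ", ".join(REQUIRED_ORDER)
        )
    return True


print(check_run_order(REQUIRED_ORDER))
```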

* Running the phased version gives you the flexibility to execute the different semantic expansion modes for the NER task. By default, when “optima_NER_cidoc_crm.xgapp” is loaded, the Hypernym Expansion mode is ON and the other two modes are OFF.

The NER phase executes on a conditional pipeline. To run the pipeline in the Synonym mode of semantic expansion, TURN ON the Synonym mode from the processing resources pipeline view in GATE and TURN OFF the other two (Hyponym and Hypernym). Similarly, to run the Hyponym mode of semantic expansion, TURN ON the Hyponym Expansion Resource and TURN OFF the other two (Synonym and Hypernym).

NOTE! It is important that only ONE semantic expansion mode is ON at any given execution time. Having more than one semantic expansion mode activated will cause the pipeline to crash.
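
The "only one mode ON" rule can be expressed as a simple check. The sketch below is illustrative only: the dictionary of mode switches is a hypothetical stand-in for the ON/OFF settings made in the GATE pipeline view, and rejecting the case where no mode is ON is an assumption of this example.

```python
def check_expansion_modes(modes):
    """Return the single active mode, or raise ValueError.

    `modes` maps a mode name to True (ON) or False (OFF).
    """
    active = [name for name, is_on in modes.items() if is_on]
    if len(active) != 1:
        raise ValueError(
            "Exactly one semantic expansion mode must be ON, found: %r" % active
        )
    return active[0]


# Default state described above: Hypernym ON, the other two OFF.
default_modes = {"Synonym": False, "Hyponym": False, "Hypernym": True}
print(check_expansion_modes(default_modes))
```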


-- CONTACT --

Andreas Vlachidis:
* email : avlachid@glam.ac.uk , avlachid@yahoo.com
* web : http://hypermedia.research.glam.ac.uk/people/vlachidis/
	http://www.andronikos.co.uk



Source: README.txt, updated 2013-02-03