The gold-standard and automatically developed fine-grained Arabic named entity corpora are resources in which named entities (NEs) are annotated with 50 fine-grained classes.
The annotation uses a two-level taxonomy: each entity is labelled with both a coarse-grained class and a fine-grained class.
A) Manually annotated gold-standard corpora:
1) WikiFANE_Gold: Gold-standard Wikipedia-based Fine-grained Arabic Named Entity Corpus, ~500K tokens.
2) NewsFANE_Gold: Gold-standard Newswire-based Fine-grained Arabic Named Entity Corpus, ~170K tokens.
These corpora were manually annotated from Arabic Wikipedia and newswire sources, respectively.
B) Automatically developed corpora:
1) WikiFANE_Whole: All sentences of the Arabic Wikipedia articles were retrieved to compile the corpus. ~2M tokens.
2) WikiFANE_Selective: Only sentences that contain at least one NE phrase were retrieved to compile the corpus. ~2M tokens.
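The selection criterion behind WikiFANE_Selective (keep a sentence only if it contains at least one NE phrase) can be illustrated with a minimal sketch. This is not the authors' pipeline: the in-memory representation (a sentence as a list of (token, tag) pairs), the outside tag "O", the placeholder tokens and tags, and the function names has_named_entity and select_sentences are all assumptions made for illustration, and the actual file format of the corpora may differ.

```python
from typing import List, Tuple

# Assumed representation: a sentence is a list of (token, tag) pairs,
# and tokens outside any named entity carry the tag "O".
Sentence = List[Tuple[str, str]]


def has_named_entity(sentence: Sentence, outside_tag: str = "O") -> bool:
    """Return True if at least one token carries a tag other than outside_tag."""
    return any(tag != outside_tag for _, tag in sentence)


def select_sentences(sentences: List[Sentence], outside_tag: str = "O") -> List[Sentence]:
    """Keep only sentences with at least one named-entity token."""
    return [s for s in sentences if has_named_entity(s, outside_tag)]


# Hypothetical toy data: tokens and tags are placeholders, not corpus content.
whole = [
    [("w1", "O"), ("w2", "PER")],  # contains an entity, so it is kept
    [("w3", "O"), ("w4", "O")],    # no entity, so it is dropped
]
selective = select_sentences(whole)
```

Under these assumptions, applying no filter corresponds to the "Whole" variant, while select_sentences corresponds to the "Selective" variant.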
Author URL:
http://www.cs.bham.ac.uk/~fsa081/index.html
http://fsalotaibi.kau.edu.sa
Fine-grained Arabic Named Entity Corpora
Brought to you by:
fsalotaibi