Fine-grained Arabic Named Entity Corpora

The gold-standard and automatically developed fine-grained Arabic named entity corpora are resources in which named entities are annotated with 50 fine-grained classes.

The annotation uses a two-level taxonomy in which each entity is assigned both a coarse-grained and a fine-grained class.

A) Manually annotated (gold standard):

1) WikiFANE_Gold: Gold-standard Wikipedia-based Fine-grained Arabic Named Entity Corpus. ~500K tokens.

2) NewsFANE_Gold: Gold-standard Newswire-based Fine-grained Arabic Named Entity Corpus. ~170K tokens.

These corpora were manually annotated from Arabic Wikipedia and newswire sources, respectively.

B) Automatically-developed:

1) WikiFANE_Whole: All sentences of the Arabic Wikipedia articles were retrieved to compile the corpus. ~2M tokens.

2) WikiFANE_Selective: Only sentences that contain at least one named entity (NE) phrase were retrieved to compile the corpus. ~2M tokens.
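The difference between the Whole and Selective variants can be illustrated with a minimal Python sketch. It assumes each sentence is a list of (token, label) pairs and that the label "O" marks a token outside any entity. That representation, the label names, and the function names are hypothetical and are not taken from the corpus files, whose format is not described here.

```python
from typing import List, Tuple

# Assumed representation: a sentence is a list of (token, label) pairs.
# The "O" (outside) label and the "LOC" label below are hypothetical.
Sentence = List[Tuple[str, str]]

OUTSIDE = "O"


def has_named_entity(sentence: Sentence) -> bool:
    """Return True if at least one token carries an entity label."""
    return any(label != OUTSIDE for _, label in sentence)


def select_sentences(sentences: List[Sentence]) -> List[Sentence]:
    """Keep only sentences with at least one NE token (the 'selective' idea)."""
    return [s for s in sentences if has_named_entity(s)]


def count_tokens(sentences: List[Sentence]) -> int:
    """Total number of tokens across all sentences."""
    return sum(len(s) for s in sentences)


# Toy data standing in for the 'whole' corpus.
whole = [
    [("London", "LOC"), ("is", OUTSIDE), ("large", OUTSIDE)],
    [("It", OUTSIDE), ("rains", OUTSIDE)],
]
selective = select_sentences(whole)
```

In this toy example the second sentence has no entity tokens, so it appears in the whole set but not in the selective one.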

Author URLs:
http://www.cs.bham.ac.uk/~fsa081/index.html
http://fsalotaibi.kau.edu.sa

License

Creative Commons Attribution ShareAlike License V3.0

Additional Project Details

Registered: 2014-06-12
