arabic corpus free download

Showing 26 open source projects for "arabic corpus"

1

Corpus of Early Arabic Poetry (CEAP)

Downloads: 2 This Week

Last Update: 2022-08-02
See Project
2

Queries-for-Arabic-OSAC-Corpus

43 queries on various topics for the Information Retrieval Collection. The corpus is built from the OSAC corpus of journalistic texts, which consists of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/

Downloads: 0 This Week

Last Update: 2021-12-03
See Project
3

Linguistic Analyzer

The Linguistic Analyzer is a tool for corpus analysis and comparison

The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison across several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.

Downloads: 0 This Week

Last Update: 2022-04-16
See Project
4

KSUCCA Corpus

A 50-million-token corpus of Classical Arabic.

King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes...

Downloads: 2 This Week

Last Update: 2020-02-19
See Project
5

agd-text

This corpus contains 10 essays comprising 752 sentences (with a total of 4,160 words). The essays were selected from different collections of partially or fully diacritized Arabic texts, all of which are available in the Tashkeela corpus. Texts in this corpus have been used in the evaluation of the AGD checker. There are two types of texts in this corpus: 1- Texts without errors, to evaluate AGD in terms of detecting and correcting errors that we do not know about before the checking process 2...

Downloads: 0 This Week

Last Update: 2021-02-01
See Project
6

Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods...

Downloads: 9 This Week

Last Update: 2019-03-05
See Project
7

Queries for OSAC (Arabic) Corpus

43 Queries for Arabic Information Retrieval Collection

43 queries on various topics for the Information Retrieval Collection. The corpus is built from the OSAC corpus of journalistic texts, which consists of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/

Downloads: 0 This Week

Last Update: 2019-01-07
See Project
8

Tashkeela: Arabic diacritization corpus

Tashkeela: Arabic diacritization corpus (vocalized texts)

Tashkeela: Arabic diacritization corpus and resource of vocalized Arabic texts (نصوص عربية مشكولة). Contains vocalized Arabic text in text format; 75.6 million words. Please cite this resource as: T. Zerrouki, A. Balla, Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems, Data in Brief (2017), http://dx.doi.org/10.1016/j.dib.2017.01.011

1 Review

Downloads: 10 This Week

Last Update: 2018-02-15
See Project
9

Arabic business corpora

Arabic business and management corpus

This corpus is made up of 3 sub-corpora: 1) Management corpus: 400 articles by chairmen and CEOs of Arabic companies in the Middle East. 2) Economics news: 400 news articles from different Arabic online newspapers. 3) Stock market news: 400 articles collected from investing.com. The full corpus contains 1200 articles. The articles have been tagged using the Stanford Arabic Part of Speech Tagger. Both plain text and tagged corpora are available to download, check the Files...

Downloads: 6 This Week

Last Update: 2016-11-01
See Project
10

Classical Arabic Corpus

A corpus containing more than 1 million distinct Arabic words.

This project was developed as part of a master's thesis titled "Edit Distance Adapted to Natural Language Words". The project consists of three parts. First, the corpus, which gathers more than one million distinct Arabic words. Second, the text files of the Arabic resources. Third, the index file, which presents some information about these resources. Additional details about these parts are available in the README file.

Downloads: 0 This Week

Last Update: 2016-01-19
See Project
11

PADIC

A multilingual Parallel Arabic DIalectal Corpus

PADIC (Parallel Arabic DIalectal Corpus) is a multi-dialectal corpus built in the framework of the National Research Project "TORJMAN", led by the Scientific and Technical Research Center for the Development of Arabic Language and funded by the Algerian Ministry of Higher Education and Scientific Research. PADIC covers 6 dialects (two Algerian dialects, from the cities of Algiers and Annaba, plus Palestinian, Syrian, Tunisian, and Moroccan) and MSA. Mourad Abbas Computational Linguistics Department...

Downloads: 2 This Week

Last Update: 2017-05-26
See Project
12

Osman Arabic Text Readability

Open Source tool for Arabic text readability

We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open source Arabic readability metric and tool. The open source Java tool allows users to calculate readability for Arabic text (with and without diacritics). The tool provides methods to split the text into words and sentences and to count syllables, Faseeh letters, and hard and complex words, in addition to adding diacritics (vocalising text). This makes the tool useful for researchers and educators working with Arabic text...

Downloads: 0 This Week

Last Update: 2016-11-17
See Project
13

Cross-Language Computational Linguistics

Cross-language resources

The AFEWC corpus is a multilingual collection of comparable articles in Arabic, French, and English. Each triple of articles covers the same topic (aligned at article level). The AFEWC corpus is collected from Wikipedia and is available for free for research purposes only. It is composed of 40K aligned articles, 91.3M English words, 57.8M French words, 22M Arabic words, 2.8M English unique words, 1.9M French unique words, and 1.5M Arabic unique words. Wikipedia text is available...

Downloads: 0 This Week

Last Update: 2015-09-11
See Project
14

Arabic Corpus

The Arabic corpus was developed as part of a research project titled "A New Approach of Semi-Indexing of Text Documents". This corpus consists of more than 460 Arabic books. It can be used for the development of language engineering applications, information retrieval, and information extraction. The total corpus size is 137 MB. It contains 23,264,785 words and more than 128,584,458 letters.

1 Review

Downloads: 0 This Week

Last Update: 2014-02-19
See Project
15

Arabic Named Entity Gazetteer

Arabic Named Entity Gazetteer

Arabic Named Entity Gazetteer (WikiFANE_Gazet) is an Arabic "fine-grained" gazetteer that has been automatically compiled from the Arabic Wikipedia. The gazetteer is marked up with XML tags such as <class_name>Arabic Named Entity</class_name>. Each line holds one Arabic entity (UTF-8 encoding). This release of WikiFANE_Gazet consists of 68343 entities categorised into 50 classes. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Automatically Developing...

Downloads: 2 This Week

Last Update: 2014-08-24
See Project
16

Arabic Wikipedia into Named Entity

“Arabic Wikipedia into Named Entity Taxonomy” is a dataset consisting of 4000 Arabic Wikipedia articles classified into a coarse-grained NE taxonomy. This dataset can be used in document classification tasks related to NER. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Mapping Arabic Wikipedia into the Named Entities Taxonomy", In Proceedings of COLING 2012: Posters, p43-52, IIT, Mumbai, India, December 8-15. 2012. Author URL: http...

Downloads: 0 This Week

Last Update: 2014-08-24
See Project
17

Fine-grained Arabic Named Entity Corpora

Fine-grained Arabic Named Entity Corpora

The gold-standard and automatically developed fine-grained Arabic named entity corpora are resources created by annotating named entities into 50 fine-grained classes. The annotation uses a two-level taxonomy in which each entity is annotated with coarse- and fine-grained classes. A) Manually annotated gold standard: 1) WikiFANE_Gold: Gold standard Wikipedia-based Fine-grained Arabic Named Entity Corpus, ~500K tokens, and 2) NewsFANE_Gold: Gold standard Newswire-based Fine-grained Arabic...

Downloads: 1 This Week

Last Update: 2014-06-12
See Project
18

InAra Plagiarism Detection Corpus

A corpus for the Arabic Intrinsic Plagiarism Detection evaluation

ARAbic INtrinsic plagiarism detection corpus (InAra Corpus 2013). The InAra corpus is the first corpus for the evaluation of Arabic intrinsic plagiarism detection. Intrinsic plagiarism detection consists of uncovering plagiarized passages on the basis of writing style inconsistencies within a given suspicious document. As opposed to the external approach, the intrinsic approach does not require any comparison of the suspicious document against the potential sources of plagiarism...

Downloads: 0 This Week

Last Update: 2014-01-23
See Project
19

EASC (Essex Arabic Summaries Corpus)

Arabic natural language resources

The EASC is an Arabic natural language resource. It contains 153 Arabic articles and 765 human-generated extractive summaries of those articles. These summaries were generated using Mechanical Turk (http://www.mturk.com/). Among the major features of EASC: names and extensions are formatted to be compatible with current evaluation systems such as ROUGE and AutoSummENG, and the corpus is available in two encoding formats, UTF-8 and ISO-8859-6 (Arabic). The Essex Arabic Summaries Corpus (EASC) uses...

Downloads: 8 This Week

Last Update: 2016-03-18
See Project
20

KALIMAT Multipurpose Arabic Corpus

A corpus that could be of help for researchers working on Arabic NLP

KALIMAT: a Multipurpose Arabic Corpus. We are pleased to announce the immediate availability of KALIMAT 1.0. KALIMAT is an Arabic natural language resource that consists of: 1) 20,291 Arabic articles collected from the Omani newspaper Alwatan by Abbas et al. (2011). 2) 20,291 extractive single-document system summaries. 3) 2,057 extractive multi-document system summaries. 4) 20,291 Named Entity Recognised articles. 5) 20,291 Part of Speech Tagged articles. 6) 20,291...

Downloads: 31 This Week

Last Update: 2015-04-09
See Project
21

Arabic Obsolete Words

A list of obsolete words in the Buckwalter Morphological Analyser

This is a list of obsolete words, or words that are outdated or not in contemporary use, in the Buckwalter Morphological Analyser database. The list was developed according to a frequency threshold on the web and in the Arabic Gigaword corpus. It contains about 8,400 words that have fallen out of current use, with a margin of error of 1%. The threshold is defined as follows: all the lemmas in Buckwalter are queried in three news web sites (al-Jazeera, Arabic BBC and Arabic Wikipedia) and if the lemma...

Downloads: 0 This Week

Last Update: 2012-06-11
See Project
22

Arabic Multiword Expressions

Multiword expression resources for Arabic, totalling 34,658 MWEs

Multiword expression resources for Arabic, totalling 34,658 MWEs. These MWEs are extracted from the Arabic Wikipedia, from the Arabic Gigaword corpus (4th Edition), and from the English Princeton WordNet translated into Arabic.

Downloads: 0 This Week

Last Update: 2013-05-29
See Project
23

AADRTE

Automatic Arabic Domain-Relevant Term Extraction

In this research we propose a model for automatic domain-relevant term extraction from an Arabic text corpus. The proposed model uses a hybrid approach, combining linguistic and statistical methods, to extract terms relevant to specific domains based on a prevalence and tendency term-ranking mechanism. This increases precision and recall as measures of the relevance of extracted terms to a specific domain.

Downloads: 0 This Week

Last Update: 2013-05-30
See Project
24

Arabic Broken Plurals

List of Arabic Broken Plurals

This is the List of Arabic Broken Plurals automatically extracted by Mohammed Attia from a large contemporary corpus, provided with morphological patterns for both the singular forms and the plural forms. It contains 2562 broken plural forms.

Downloads: 0 This Week

Last Update: 2013-05-30
See Project
25

Word Count of Modern Standard Arabic

A word count of Modern Standard Arabic from a 1 billion word corpus, sorted according to frequency counts

1 Review

Downloads: 1 This Week

Last Update: 2015-11-12
See Project
