Search Results for "corpus bbc arabic"

Showing 23 open source projects for "corpus bbc arabic"

1

Queries-for-Arabic-OSAC-Corpus

43 queries on various topics for the Information Retrieval Collection. The corpus is built from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/

Downloads: 0 This Week

Last Update: 2021-12-03
2

Corpus of Early Arabic Poetry (CEAP)

Downloads: 0 This Week

Last Update: 2022-08-02
3

Linguistic Analyzer

The Linguistic Analyzer is a tool for corpus analysis and comparison

The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison in terms of several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
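As a rough illustration of two of the features named above (frequency lists and concordances), here is a minimal Python sketch. It is not the Linguistic Analyzer's own code: the function names `tokenize`, `frequency_list`, and `concordance` are hypothetical, and tokenization is assumed to be a plain whitespace split, which ignores punctuation and Arabic clitics.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization only. A real analyzer would also
    # handle punctuation, diacritics, and clitic segmentation.
    return text.split()


def frequency_list(tokens):
    # Return (token, count) pairs, most frequent first.
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=2):
    # Keyword-in-context: for each occurrence of the keyword, collect up to
    # `width` tokens of left context and `width` tokens of right context.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Placeholder sample text; any whitespace-delimited text (including Arabic) works.
sample = "the corpus is large and the corpus is free"
tokens = tokenize(sample)

for left, kw, right in concordance(tokens, "corpus"):
    print(f"{left:>10} [{kw}] {right}")
```

Comparing two corpora would then amount to computing `frequency_list` for each and contrasting the counts; the statistic used for keyword identification is left open here because the description does not name one.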

Downloads: 0 This Week

Last Update: 2022-04-16
4

agd-text

In this corpus: 10 essays containing 752 sentences (with a total of 4,160 words). The essays were selected from different collections of partially or fully diacritized Arabic texts, all of which are available in the Tashkeela corpus. Texts in this corpus have been used in the evaluation of the AGD checker. There are two types of texts in this corpus: 1- Texts without errors to evaluate AGD in terms of detecting and correcting errors that we do not know about before the checking process 2...

Downloads: 0 This Week

Last Update: 2021-02-01
5

KSUCCA Corpus

A 50-million-token corpus of Classical Arabic.

King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes...

Downloads: 4 This Week

Last Update: 2020-02-19
6

Queries for OSAC (Arabic) Corpus

43 Queries for Arabic Information Retrieval Collection

43 queries on various topics for the Information Retrieval Collection. The corpus is built from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/

Downloads: 0 This Week

Last Update: 2019-01-07
7

Tashkeela: Arabic diacritization corpus

Tashkeela: Arabic diacritization Corpus (Vocalized texts)

Tashkeela: Arabic diacritization Corpus, Resource, Arabic vocalized texts: نصوص عربية مشكولة. Contains vocalized Arabic text. Text format; 75.6 million words. Please cite this resource as: T. Zerrouki, A. Balla, Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems, Data in Brief (2017), http://dx.doi.org/10.1016/j.dib.2017.01.011

1 Review

Downloads: 6 This Week

Last Update: 2018-02-15
8

Arabic business corpora

Arabic business and management corpus

This corpus is made up of 3 sub-corpora as follows: 1) Management Corpus: 400 articles by chairmen and CEOs of Arabic companies in the Middle East. 2) Economics News: 400 news articles from different Arabic online newspapers. 3) Stock market news: 400 articles collected from investing.com. The main corpus contains 1200 articles. The articles have been tagged using the Stanford Arabic Part of Speech Tagger. Both plain text and tagged corpora are available to download, check the Files...

Downloads: 3 This Week

Last Update: 2016-11-01
9

Osman Arabic Text Readability

Open Source tool for Arabic text readability

We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open source Arabic readability metric and tool. The open source Java tool allows users to calculate readability for Arabic text (with and without diacritics). The tool provides methods to split the text into words and sentences and to count syllables, Faseeh letters, and hard and complex words, in addition to adding diacritics (vocalising text). This makes the tool useful for researchers and educators working with Arabic text...

Downloads: 0 This Week

Last Update: 2016-11-17
10

Cross-Language Computational Linguistics

Cross-language resources

The AFEWC corpus is a multilingual collection of comparable text articles in Arabic, French, and English. Each article triple covers the same topic (aligned at article level). The AFEWC corpus was collected from Wikipedia. The corpus is available for free for research purposes only. It is composed of 40K aligned articles, 91.3M English words, 57.8M French words, 22M Arabic words, 2.8M English unique words, 1.9M French unique words, and 1.5M Arabic unique words. Wikipedia text is available...

Downloads: 0 This Week

Last Update: 2015-09-11
11

Arabic Named Entity Gazetteer

Arabic Named Entity Gazetteer

Arabic Named Entity Gazetteer (WikiFANE_Gazet) is an Arabic "fine-grained" gazetteer that has been automatically compiled from the Arabic Wikipedia. The gazetteer is compiled using XML tags such as <class_name>Arabic Named Entity</class_name>. Each line has one Arabic entity (UTF-8 encoding). This release of WikiFANE_Gazet consists of 68343 entities categorised into 50 classes. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Automatically Developing...

Downloads: 1 This Week

Last Update: 2014-08-24
12

Arabic Corpus

The Arabic corpus has been developed as part of a research project named "A New Approach of Semi-Indexing of Text Documents". This corpus consists of more than 460 Arabic books. The Arabic corpus can be used for the development of language engineering applications, information retrieval, and information extraction. The total corpus size is 137 MB. It contains 23,264,785 words and more than 128,584,458 letters.

1 Review

Downloads: 0 This Week

Last Update: 2014-02-19
13

Arabic Wikipedia into Named Entity

“Arabic Wikipedia into Named Entity Taxonomy” is a dataset consisting of 4000 Arabic Wikipedia articles classified into a coarse-grained NE taxonomy. This dataset can be used in document classification tasks related to NER. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Mapping Arabic Wikipedia into the Named Entities Taxonomy", In Proceedings of COLING 2012: Posters, p43-52, IIT, Mumbai, India, December 8-15. 2012. Author URL: http...

Downloads: 0 This Week

Last Update: 2014-08-24
14

Fine-grained Arabic Named Entity Corpora

Fine-grained Arabic Named Entity Corpora

The gold-standard and automatically developed fine-grained Arabic named entity corpora are resources created by annotating Named Entities into 50 fine-grained classes. The annotation uses a two-level taxonomy in which each entity is annotated with coarse- and fine-grained classes. A) Manual gold standard: 1) WikiFANE_Gold: Gold standard Wikipedia-based Fine-grained Arabic Named Entity Corpus, ~500K tokens and 2) NewsFANE_Gold: Gold standard Newswire-based Fine-grained Arabic...

Downloads: 2 This Week

Last Update: 2014-06-12
15

InAra Plagiarism Detection Corpus

A corpus for the Arabic Intrinsic Plagiarism Detection evaluation

ARAbic INtrinsic plagiarism detection corpus (InAra Corpus 2013). The InAra corpus is the first corpus for the evaluation of Arabic intrinsic plagiarism detection. Intrinsic plagiarism detection consists of uncovering the plagiarized passages on the basis of writing style inconsistencies in a given suspicious document. As opposed to the external approach, the intrinsic approach does not require any comparison of the suspicious document against the potential sources of plagiarism...

Downloads: 1 This Week

Last Update: 2014-01-23
16

KALIMAT Multipurpose Arabic Corpus

A corpus that could be of help for researchers working on Arabic NLP

KALIMAT, a Multipurpose Arabic Corpus. We are pleased to announce the immediate availability of KALIMAT 1.0. KALIMAT is an Arabic natural language resource that consists of: 1) 20,291 Arabic articles collected from the Omani newspaper Alwatan (Abbas et al. 2011). 2) 20,291 Extractive Single-document system summaries. 3) 2,057 Extractive Multi-document system summaries. 4) 20,291 Named Entity Recognised articles. 5) 20,291 Part of Speech Tagged articles. 6) 20,291...

Downloads: 102 This Week

Last Update: 2015-04-09
17

EASC (Essex Arabic Summaries Corpus)

Arabic natural language resources

The EASC is an Arabic natural language resource. It contains 153 Arabic articles and 765 human-generated extractive summaries of those articles. These summaries were generated using Mechanical Turk (http://www.mturk.com/). Among the major features of EASC are: Names and extensions are formatted to be compatible with current evaluation systems such as ROUGE and AutoSummENG. Available in two encoding formats, UTF-8 and ISO-8859-6 (Arabic). The Essex Arabic Summaries Corpus (EASC) uses...

Downloads: 5 This Week

Last Update: 2016-03-18
18

Arabic Obsolete Words

A list of obsolete words in the Buckwalter Morphological Analyser

This is a list of obsolete words, or words that are outdated or not in contemporary use, in the Buckwalter Morphological Analyser database. The list was developed according to a frequency threshold on the web and the Arabic Gigaword corpus. The list contains about 8,400 words that fell out of current use, with a margin of error of 1%. The threshold is defined as follows: all the lemmas in Buckwalter are queried in three news web sites (al-Jazeera, Arabic BBC and Arabic Wikipedia) and if the lemma...

Downloads: 1 This Week

Last Update: 2012-06-11
19

Arabic Multiword Expressions

Multiword expression resources for Arabic, totalling 34,658 MWEs

Multiword expression resources for Arabic, totalling 34,658 MWEs. These MWEs are extracted from the Arabic Wikipedia, from the Arabic Gigaword corpus (4th Edition), and from the English Princeton WordNet translated into Arabic.

Downloads: 0 This Week

Last Update: 2013-05-29
20

AADRTE

Automatic Arabic Domain-Relevant Term Extraction

In this research we propose a model for automatic domain-relevant term extraction from an Arabic text corpus. The proposed model uses a hybrid approach composed of linguistic and statistical methods to extract terms relevant to specific domains, based on a prevalence and tendency term-ranking mechanism. This increases precision and recall as measures of the relevance of extracted terms to a specific domain.

Downloads: 0 This Week

Last Update: 2013-05-30
21

Arabic Broken Plurals

List of Arabic Broken Plurals

This is the List of Arabic Broken Plurals automatically extracted by Mohammed Attia from a large contemporary corpus, provided with morphological patterns for both the singular forms and the plural forms. It contains 2562 broken plural forms.

Downloads: 0 This Week

Last Update: 2013-05-30
22

Word Count of Modern Standard Arabic

A word count of Modern Standard Arabic from a 1 billion word corpus, sorted according to frequency counts

1 Review

Downloads: 0 This Week

Last Update: 2015-11-12
23

arabicwordcorpus

An Arabic word corpus containing a huge list of words, starting with 1.5 million words, useful for natural language processing.

Downloads: 0 This Week

Last Update: 2013-05-30
© 2024 Slashdot Media. All Rights Reserved.
