Showing 23 open source projects for "corpus bbc arabic"

  • 1
    43 queries on various topics for the Information Retrieval Collection. The corpus is created from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
    Downloads: 0 This Week
  • 2
    Downloads: 0 This Week
  • 3

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison in terms of several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 0 This Week
  • 4
    This corpus contains 10 essays comprising 752 sentences (a total of 4,160 words). The essays were selected from different collections of partially or fully diacritized Arabic texts, all of which are available in the Tashkeela corpus. Texts in this corpus have been used in the evaluation of AGD checker. There are two types of texts in this corpus: 1- Texts without errors to evaluate AGD in terms of detecting and correcting errors that we do not know about before the checking process 2...
    Downloads: 0 This Week
  • 5

    KSUCCA Corpus

    A 50-million-token corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes...
    Downloads: 4 This Week
  • 6

    Queries for OSAC (Arabic) Corpus

    43 Queries for Arabic Information Retrieval Collection

    43 queries on various topics for the Information Retrieval Collection. The corpus is created from the OSAC corpus of journalistic texts, consisting of 4763 articles retrieved from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
    Downloads: 0 This Week
  • 7

    Tashkeela: Arabic diacritization corpus

    Tashkeela: Arabic diacritization corpus (vocalized texts)

    Tashkeela: Arabic diacritization corpus, a resource of Arabic vocalized texts (نصوص عربية مشكولة). Contains vocalized Arabic text in plain-text format; 75.6 million words. Please cite this resource as: T. Zerrouki, A. Balla, Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems, Data in Brief (2017), http://dx.doi.org/10.1016/j.dib.2017.01.011
    Downloads: 6 This Week
  • 8

    Arabic business corpora

    Arabic business and management corpus

    This corpus is made up of 3 sub-corpora as follows: 1) Management Corpus: 400 articles by chairmen and CEOs of Arabic companies in the Middle East. 2) Economics News: 400 news articles from different Arabic online newspapers. 3) Stock Market News: 400 articles collected from investing.com. The main corpus contains 1200 articles. The articles have been tagged using the Stanford Arabic Part of Speech Tagger. Both plain text and tagged corpora are available to download, check the Files...
    Downloads: 3 This Week
  • 9

    Osman Arabic Text Readability

    Open source tool for Arabic text readability

    We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open source Arabic readability metric and tool. The open source Java tool allows users to calculate readability for Arabic text (with and without diacritics). The tool provides methods to split the text into words and sentences and to count syllables, Faseeh letters, and hard and complex words, in addition to adding diacritics (vocalising text). This makes the tool useful for researchers and educators working with Arabic text...
    Downloads: 0 This Week
  • 10
    The AFEWC corpus is a multilingual comparable corpus of text articles in Arabic, French, and English. Each triple of articles is related to the same topic (aligned at the article level). The AFEWC corpus is collected from Wikipedia. The corpus is available for free for research purposes only. It is composed of 40K aligned articles, 91.3M English words, 57.8M French words, 22M Arabic words, 2.8M English unique words, 1.9M French unique words, and 1.5M Arabic unique words. Wikipedia text is available...
    Downloads: 0 This Week
  • 11

    Arabic Named Entity Gazetteer

    Arabic Named Entity Gazetteer

    Arabic Named Entity Gazetteer (WikiFANE_Gazet) is an Arabic "fine-grained" gazetteer that has been automatically compiled from the Arabic Wikipedia. This gazetteer is compiled using XML tags such as <class_name>Arabic Named Entity</class_name>. Each line has an Arabic entity (UTF-8 encoding). This release of WikiFANE_Gazet consists of 68343 entities categorised into 50 classes. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Automatically Developing...
    Downloads: 1 This Week
  • 12
    The Arabic corpus has been developed as part of a research project named "A New Approach of Semi-Indexing of Text Documents". This corpus consists of more than 460 Arabic books. The corpus can be used for the development of language engineering applications, information retrieval, and information extraction. The total corpus size is 137 MB. It contains 23,264,785 words and more than 128,584,458 letters.
    Downloads: 0 This Week
  • 13
    "Arabic Wikipedia into Named Entity Taxonomy" is a dataset consisting of 4000 Arabic Wikipedia articles classified into a coarse-grained NE taxonomy. This dataset can be used in document classification tasks related to NER. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Mapping Arabic Wikipedia into the Named Entities Taxonomy", In Proceedings of COLING 2012: Posters, p43-52, IIT, Mumbai, India, December 8-15. 2012. Author URL: http...
    Downloads: 0 This Week
  • 14

    Fine-grained Arabic Named Entity Corpora

    Fine-grained Arabic Named Entity Corpora

    The gold-standard and automatically developed fine-grained Arabic named entity corpora are resources created by annotating named entities into 50 fine-grained classes. The annotation uses a two-level taxonomy in which each entity is annotated with both a coarse-grained and a fine-grained class. A) Manual gold standard: 1) WikiFANE_Gold: Gold standard Wikipedia-based Fine-grained Arabic Named Entity Corpus, ~500K tokens and 2) NewsFANE_Gold: Gold standard Newswire-based Fine-grained Arabic...
    Downloads: 2 This Week
  • 15

    InAra Plagiarism Detection Corpus

    A corpus for the Arabic Intrinsic Plagiarism Detection evaluation

    ARAbic INtrinsic plagiarism detection corpus (InAra Corpus 2013). The InAra corpus is the first corpus for the evaluation of Arabic intrinsic plagiarism detection. Intrinsic plagiarism detection consists of uncovering plagiarized passages on the basis of writing style inconsistencies in a given suspicious document. As opposed to the external approach, the intrinsic approach does not require any comparison of the suspicious document against the potential sources of plagiarism...
    Downloads: 1 This Week
  • 16

    KALIMAT Multipurpose Arabic Corpus

    A corpus that could be of help for researchers working on Arabic NLP

    KALIMAT: a Multipurpose Arabic Corpus. We are pleased to announce the immediate availability of KALIMAT 1.0. KALIMAT is an Arabic natural language resource that consists of: 1) 20,291 Arabic articles collected from the Omani newspaper Alwatan by (Abbas et al. 2011). 2) 20,291 Extractive Single-document system summaries. 3) 2,057 Extractive Multi-document system summaries. 4) 20,291 Named Entity Recognised articles. 5) 20,291 Part of Speech Tagged articles. 6) 20,291...
    Downloads: 102 This Week
  • 17

    EASC (Essex Arabic Summaries Corpus)

    Arabic natural language resources

    The EASC is an Arabic natural language resource. It contains 153 Arabic articles and 765 human-generated extractive summaries of those articles. These summaries were generated using Mechanical Turk (http://www.mturk.com/). Among the major features of EASC are: names and extensions are formatted to be compatible with current evaluation systems such as ROUGE and AutoSummENG; it is available in two encoding formats, UTF-8 and ISO-8859-6 (Arabic). The Essex Arabic Summaries Corpus (EASC) uses...
    Downloads: 5 This Week
  • 18

    Arabic Obsolete Words

    A list of obsolete words in the Buckwalter Morphological Analyser

    This is a list of obsolete words, or words that are outdated or not in contemporary use, in the Buckwalter Morphological Analyser database. This list is developed according to a threshold of frequency on the web and in the Arabic Gigaword corpus. The list contains about 8,400 words that fell out of current use, with a margin of error of 1%. The threshold is defined as follows. All the lemmas in Buckwalter are queried in three news web sites (al-Jazeera, Arabic BBC and Arabic Wikipedia) and if the lemma...
    Downloads: 1 This Week
  • 19

    Arabic Multiword Expressions

    Multiword expression resources for Arabic, totalling 34,658 MWEs

    Multiword expression resources for Arabic, totalling 34,658 MWEs. These MWEs are extracted from the Arabic Wikipedia, from the Arabic Gigaword corpus (4th Edition), and from the English Princeton WordNet translated into Arabic.
    Downloads: 0 This Week
  • 20

    AADRTE

    Automatic Arabic Domain-Relevant Term Extraction

    In this research we propose a model for automatic domain-relevant term extraction from an Arabic text corpus. The proposed model uses a hybrid approach composed of linguistic and statistical methods to extract terms relevant to specific domains, based on a prevalence and tendency term-ranking mechanism. This increases precision and recall as measures of the relevance of extracted terms to a specific domain.
    Downloads: 0 This Week
  • 21

    Arabic Broken Plurals

    List of Arabic Broken Plurals

    This is a list of Arabic broken plurals automatically extracted by Mohammed Attia from a large contemporary corpus, provided with morphological patterns for both the singular and the plural forms. It contains 2562 broken plural forms.
    Downloads: 0 This Week
  • 22
    A word count of Modern Standard Arabic from a 1-billion-word corpus, sorted by frequency.
    Downloads: 0 This Week
  • 23
    An Arabic word corpus containing a huge list of words, starting at 1.5 million words, useful for natural language processing.
    Downloads: 0 This Week