Showing 24 open source projects for "word frequency"

  • 1
    TEXminer

    Text Mining Classification for Texts in ASCII, Unicode and PDF Format.

    TEXminer uses generic text mining methods to analyze Unicode files as plain text or PDF. The text database can be saved in XML, where the original text, the sentence and word lists, and additional parameters (e.g. abbreviations) are stored. TEXminer supports language detection by letter frequency analysis, finding important words by co-occurrence analysis, determination of central expressions, thematic text classification (also semantic groups), fingerprint comparison, and word frequency. Because TEXminer is not designed to have a reference corpus, thematic model statistics use language models (lexicons) to provide background knowledge about certain languages (English, German, French, Spanish, Italian, Russian), which are derived from the Decaleon Project. ...
    Downloads: 0 This Week
  • 2
    Kindle Mate(KMate)

    Kindle clippings and Kindle Vocabulary Builder manager

    KMate is the ultimate reading companion for Kindle users and the all-new, cross-platform successor to Kindle Mate, the classic Kindle notes manager trusted by readers worldwide for over a decade. It is the only Kindle assistant that unifies cross-device import, cloud sync, vocabulary & dictionary management, flexible export, reading analytics, and AI-powered definitions, all in one app. ...
    Downloads: 32 This Week
  • 3
    Onda Sfasata

    An authentic Italian learning app.

    ...GitHub repository: https://github.com/Northstrix/onda-sfasata Check it out at: https://onda-sfasata.netlify.app/ This app is fully localized into English, Hebrew, and two dialects of German: Hochdeutsch and a mixture of Zurich and Basel dialects (approximately 64%–36%), labeled as “Schwiizerdütsch”. I picked the words for this app not based on predefined categories, usage frequency, or the fluency level to which a word might correspond, but on which words could be cleanly cut from the audio tracks. As a result, the word set turned out to be a bit odd, yet unique. Every sound used in the app, except for success.wav, error.wav, and completed.wav, was extracted from public domain recordings. The success and error sounds are covered by the Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/); the completed sound is available under the Creative Commons 0 License (http://creativecommons.org/publicdomain/zero/1.0/).
    Downloads: 1 This Week
  • 4

    pyLogos

    Qualitative content analysis software.

    ...Documents (imported from txt and docx files) are stored in a database, and may have marked text segments associated with codes. It is possible to retrieve these segments in various ways, generate word clouds, and tabulate the frequency of codes and words, among other outputs.
    Downloads: 3 This Week
  • 5
    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...
    Downloads: 13 This Week
  • 6

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison across several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 0 This Week
  • 7
    Texthero

    Text preprocessing, representation and visualization from zero to hero

    Texthero is a Python package for working with text data efficiently. It gives NLP developers a tool to quickly understand any text-based dataset, and it provides a solid pipeline to clean and represent text data, from zero to hero.
    Downloads: 0 This Week
  • 8
    Word frequency and diversity (distribution) across hundreds of corpora. You'll see both the lemma and the various forms.
    Downloads: 0 This Week
  • 9

    Meaning Explorer

    A tool for analyzing the words of the Quran

    The main purpose of this tool is to help users extract syntagmatic relations between words, lemmas and roots in the Quran; these relations include significant collocates and words’ co-occurrences. The tool also provides functionality that complements this primary purpose: a Key Word In Context (KWIC) concordance and frequency lists of all words, lemmas and roots in the holy Quran. The main intended users of this tool are Arabic Quranic scholars and linguists. The Meaning Explorer applies a new distributional semantic model to extract words’ significant co-occurrences from the Quran. This model is based on the Refined MI association measure applied to all words within a symmetric sliding window of five words surrounding the node word. ...
    Downloads: 0 This Week
  • 10

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:
    a. Frequency list for single word and N-Grams
    b. Concordance
    c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient)
    d. Lexical patterns search
    e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient
    f. Accept Windows and UTF-8 character encoding
    g. ...
    Downloads: 2 This Week
  • 11
    mzitu

    Python crawler that downloads image galleries and analyzes titles

    ...It focuses on automating the collection of large sets of images by programmatically parsing page content and iterating through gallery entries. mzitu also includes a simple analysis script that processes downloaded folder names to generate statistics and visualizations. Using text segmentation and frequency analysis, the project can create a word cloud representing common keywords found in the dataset. This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.
    Downloads: 1 This Week
  • 12
    pydictor

    A powerful and useful hacker dictionary builder for brute-force attacks

    ...You can use pydictor to generate a general blast wordlist, a custom wordlist based on web content, a social engineering wordlist, and so on. The built-in tools can safely delete, merge, deduplicate, merge and deduplicate, or count word frequency to filter a wordlist; you can also specify your own wordlist and use '-tool handler' to filter it. You can generate highly customized and complex wordlists by modifying multiple configuration files, adding your own dictionary, using leet mode, and filtering by length, character occurrence counts, character types, or regex. You can also add customized encode scripts in the /lib/encode/ folder, your own plugin scripts in the /plugins/ folder, and your own tool scripts in the /tools/ folder.
    Downloads: 1 This Week
  • 13
    TuxWordSmith

    TuxWordSmith uses XDXF dictionaries to play in 88 languages

    Similar to the classic word game 'Scrabble', but with Unicode support for multiple languages and character sets. The game is currently distributed with eighty-eight (88) dictionary resources for playing Language[i]-Language[j] 'Scrabble'. For example, if configured to use the French-English dictionary, the distribution of available tiles will be computed based on the frequency of occurrence of each character of Language[i] (French), and for each submission the corresponding definition will be given in Language[j] (English).
    Downloads: 1 This Week
  • 14
    TextTools
    TextTools is a freeware corpus linguistics tool developed in Python to aid in research. This program analyzes user-created corpora and displays information about word (token) frequency, n-grams, clusters, collocations, keyword in context (KWIC), and keyness. TextTools is designed to be user-friendly and intuitive and will run natively on Mac OS X.
    Downloads: 0 This Week
  • 15

    word frequency counter

    Word Frequency Counter
    Downloads: 0 This Week
  • 16
    SubString is a set of shell scripts implementing substring reduction and frequency consolidation of word n-grams.
    Downloads: 0 This Week
  • 17

    Arabic New Words

    List of new words not included in current dictionaries

    ...It includes 476,349 new lemmatized words, weighted and ordered so that there is a good likelihood that the most lexicographically relevant words will surface to the top and the least relevant words will be pushed down the list. So, for example, if you take the first 10,000 words, there is a good chance that you'll find a large number of words fit to include in a dictionary. Please note that the word list is not filtered by a spell checker, so many words will only be misspellings. Proper names are not filtered out because high-frequency proper names are usually included in morphological analysers to improve coverage, but in dictionaries people might want to exclude them.
    Downloads: 0 This Week
  • 18
    A word count of Modern Standard Arabic from a 1 billion word corpus, sorted according to frequency counts
    Downloads: 0 This Week
  • 19
    MatnPardaz
    MatnPardaz calculates how many times a word appears in a text.
    Downloads: 0 This Week
  • 20
    Virtual Keyboard

    Onscreen keyboard for eye tracking systems
    Downloads: 4 This Week
  • 21
    PyWordGen is a random word generator that generates statistics of word parts (frequency and combinations) of a given language and stores them persistently. That data is then used to generate random words with the same characteristics.
    Downloads: 0 This Week
  • 22
    Perlconc is a Perl-CGI script to search corpora of text files for words/phrases, outputting either a word frequency count or a concordance.
    Downloads: 0 This Week
  • 23
    Toke is a Java web mining toolkit for web exploring, indexing and searching. Toke allows you to crawl public or private web sites in order to create web statistics, web Pajek graphs, Lucene indexes and word frequency files for data clustering.
    Downloads: 1 This Week
  • 24
    libtabe is a library that provides useful Chinese functions/routines for dealing with fundamental elements such as pronunciation (BoPoMoFo), character frequency, word identification, and word frequency. It also comes with a large free word database.
    Downloads: 4 This Week
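
Many of the projects listed above build word frequency lists. As a rough illustration of that basic idea only, here is a minimal Python sketch. It is not taken from any of the listed projects: the function name `word_frequencies`, the sample sentence, and the simple tokenization rule (lowercase, split on runs of word characters) are all assumptions made for this example, and real tools typically handle tokenization, lemmas and encodings with more care.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count case-insensitive word occurrences in a text.

    Hypothetical helper for illustration; tokenization is deliberately naive.
    """
    # \w+ matches runs of Unicode letters, digits and underscores,
    # so punctuation and whitespace act as separators.
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


# Made-up sample text, used only to show the call pattern.
sample = "The cat saw the dog and the dog saw the cat"
freq = word_frequencies(sample)

# Print words from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

`Counter.most_common()` returns the (word, count) pairs sorted by descending count, which corresponds to the "sorted according to frequency counts" layout that several of the listings mention.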