Linguistics Software for Linux

  • 1
    Free Dictionaries
    Free translating dictionaries. Source format: TEI-P5 XML. Delivery formats: DICT, StarDict, etc. The dictionaries may include pronunciation, etymology, and similar information, in a platform-independent format. Access: web/plugins/standalone.
    Downloads: 44 This Week
  • 2

    Arabic Corpus

    Text categorization, Arabic language processing, language modeling

    The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references:

    (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.

    (2) For the Khaleej-2004 corpus: M. Abbas, K. Smaili (2005), Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.

    More useful references: https://sites.google.com/site/mouradabbas9/corpora
    Downloads: 35 This Week
  • 3
    Fanyi

    A Chinese (🇨🇳) and English (🇺🇸) translation tool for your command line

    Fanyi is a tool for translating words between the Chinese and English languages, right in your command line. It’s a good supportive tool for learning and reading the Chinese language from English, or the other way around. All translation data is fetched from iciba.com and fanyi.youdao.com, and with each translation comprehensive and related samples are given for better understanding and proper usage. There are translations for words as well as sentences, and in Mac/Linux bash, words can even be pronounced by the ‘say’ command.
    Downloads: 1 This Week
  • 4
    Google Translate PHP

    Free Google Translate API PHP Package

    A simple and effective PHP library for translating text using Google Translate without needing an API key. It allows developers to integrate real-time translation features into their applications with minimal setup and supports multiple languages, leveraging Google Translate’s unofficial endpoint.
    Downloads: 1 This Week
  • 5
    UnsupervisedMT

    Phrase-Based & Neural Unsupervised Machine Translation

    Unsupervised Machine Translation is a research repository that implements both phrase-based SMT and neural MT approaches for translation without parallel corpora. The neural component supports multiple architectures—seq2seq, biLSTM with attention, and Transformer—and allows extensive parameter sharing across languages to improve data efficiency. Training relies on denoising auto-encoding and back-translation, with on-the-fly, multithreaded generation of synthetic parallel data to continually refresh supervision signals. The project also provides scripts to fetch and preprocess monolingual data, learn BPE codes, and train cross-lingual embeddings that bootstrap unsupervised alignment between languages. Beyond the core EMNLP 2018 setup, the codebase exposes additional, optional capabilities such as multi-language training, language model pretraining with shared parameters, and adversarial training.
    Downloads: 1 This Week
  • 6
    Apertium: Machine Translation Toolbox

    The free and open-source rule-based machine translation platform

    Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs.
    Downloads: 5 This Week
  • 7
    Al-Mintiq: Arabic eSpeak

    Arabic voice files for eSpeak system

    Arabic language files and voices for the eSpeak text-to-speech system (Al-Mintiq, المنطيق).
    Downloads: 24 This Week
  • 8
    Open data for a Khmer language corpus and lexicographic data that can be used to develop free language tools for the Khmer language, such as automatic translators, dictionaries, and linguistic analysis tools.
    Downloads: 6 This Week
  • 9
    SPPAS

    SPPAS - the automatic annotation and analyses of speech

    SPPAS is a scientific computer software package written and maintained by Brigitte Bigi of the Laboratoire Parole et Langage, in Aix-en-Provence, France. Available for free, with open source code, there is simply no other package as simple for linguists to use for the automatic annotation of speech, the analysis of any kind of annotated data, and the conversion of annotated files. SPPAS can automatically produce speech annotations from a recorded speech sound and its orthographic transcription. SPPAS is helpful for the analysis of any annotated data: estimate statistical distributions, make requests, manage files, visualize annotations. SPPAS offers a file converter from/to a wide range of formats: xra, TextGrid, eaf, trs... <https://sppas.org>
    Downloads: 23 This Week
  • 10
    Fresh Memory

    Flashcard application using the spaced repetition method

    Fresh Memory is an application that helps you learn large amounts of any material using the spaced repetition method. Its main use is learning foreign words, but Fresh Memory can also be used to learn anything else. The learning data is stored as flashcards and dictionaries. The flashcards may have several fields, and the user controls which combination of fields to learn. The flashcards can contain formatted text and images.
    Downloads: 7 This Week
  • 11

    BioC

    We describe a simple XML format to share text documents and annotations

    A minimalist approach to sharing text documents and data annotations. Allows a large number of different annotations to be represented.
    Project files contain:
    - simple code to hold/read/write data and perform sample processing
    - BioC-formatted corpora
    - BioC tools that work with BioC corpora
    BioC goals:
    - simplicity
    - interoperability
    - broad use
    - reuse
    There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for text mining.
    Downloads: 18 This Week
  • 12

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:
    a. Frequency list for single word and N-Grams
    b. Concordance
    c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient)
    d. Lexical patterns search
    e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient
    f. Accept Windows and UTF-8 character encoding
    g. Accept TXT, DOC, DOCX, RTF and HTML formats
    h. Export the processing results in CSV file format
    Downloads: 9 This Week
  • 13

    sgmweka

    Weka wrapper for the SGM toolkit for text classification and modeling.

    Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. Other functions are usable through either Java command-line commands or class inclusion into Java projects.
    Downloads: 11 This Week
  • 14
    srt-translator

    Subtitle translator from one natural language to another.

    Translates subtitles in SubRip format from one natural language to another. It uses Google Translate without the API, and therefore at no cost. The translator has automatic and manual spell checkers.
    Downloads: 11 This Week
  • 15
    Mani
    Coptic-English and Coptic-Czech dictionary related to Crum's Coptic dictionary, written in C++, based on MySQL, with a Qt GUI. It is developed as part of the Marcion project and contains only the Coptic data, without the study environment.
    Downloads: 6 This Week
  • 16
    Entity recognition and normalization software for biomedical text
    Downloads: 10 This Week
  • 17
    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
    Downloads: 10 This Week
  • 18
    LaBB-CAT

    A linguistic annotation store

    LaBB-CAT is a browser-based linguistics research tool that stores recordings and regular-expression searchable text transcripts of interviews. The search results, entire transcripts, and media can be viewed or exported in a variety of formats.
    Downloads: 9 This Week
  • 19
    oopinyinguide
    OO Pinyin Guide is a Java extension for OpenOffice 3 or higher. It enables the user to add pinyin transliteration over Chinese characters inside a text document. This tool can be useful for people learning or teaching Chinese.
    Downloads: 5 This Week
  • 20
    ICE Nigeria

    Nigerian component of the International Corpus of English

    This is the Nigerian component of the International Corpus of English, a one million word corpus of written and spoken Nigerian English for linguistic research. It can be used as a stand-alone corpus or in conjunction with other components of the International Corpus of English (such as ICE-GB, ICE-India, etc.) to compare international varieties of English. This is the first release of the complete corpus. The corpus can be downloaded in several parts. The written part can be downloaded as text files, xml files and xml files with parts of speech tagging, both with or without the raw files. For the spoken part the eaf files (ELAN files in xml format) together with the text files can be downloaded separately from the sound files. In addition, we provide the corpus manual as well as metadata (speaker age, gender, ethnic group and profession) and XML specifications.
    Downloads: 4 This Week
  • 21
    Cross-platform application aimed at helping users to learn vocabulary from any foreign language(s). Add/Edit/Delete vocab words (w/ translation, category, sentence, notes, picture). Review (Quiz) vocabulary words.
    Downloads: 2 This Week
  • 22

    KSUCCA Corpus

    A 50-million-token corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of the words of the Quran. However, it can be used for other research purposes, such as:
      • Arabic linguistics, including lexical, morphological, syntactic, semantic and pragmatic research.
      • Arabic computational linguistics, including lexical, morphological, syntactic, semantic and pragmatic research and their various applications.
      • Arabic language teaching for both Arabs and non-Arabs.
      • Artificial intelligence.
      • Natural language processing.
      • Information retrieval.
      • Question answering.
      • Machine translation.
    Downloads: 5 This Week
  • 23
    Korean Analyzer Rhino

    Parsing Korean words by morpheme and part-of-speech

    RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on the Korean Modern Tagged Corpus (on the scale of 12 million phrases), which was created by the Korean government, so it can analyze many cases of stems and endings. The newly developed Dynamic Dictionary Technology, in effect a programmed database, lets words react to their context. For more information see the files in the help folder.
    Downloads: 5 This Week
  • 24
    AzConvert is an open source program that converts text between the different scripts of the Azerbaijani language (Latin, Arabic and Cyrillic). It is written with Qt.
    Downloads: 4 This Week
  • 25
    Smulsa 2001 is a multilanguage two-way translator, transliterator, and dictionary. Its dependencies are gambas2-ide & gambas2-gb-db-sqlite. The application needs its database to run properly.
    Downloads: 4 This Week