Showing 23 open source projects for "learning language"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    bilingual_book_maker

    bilingual_book_maker

    Make bilingual epub books Using AI translate

    bilingual_book_maker is an AI-assisted translation tool for creating bilingual and multilingual versions of books and text files. It is designed to process formats such as EPUB, TXT, SRT, and PDF, then generate translated output that helps readers compare the original text with the target language. The project supports multiple AI providers and models, including OpenAI-compatible models and other translation backends through LiteLLM-style integrations. It is especially useful for public domain books, language learning, subtitle translation, and personal reading workflows. Users can run it from Python scripts or install it as a command-line package for repeated translation tasks. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Fanyi

    Fanyi

    A 🇨🇳 and 🇺🇸 translate tool in your command line

    Fanyi is a tool for translating words between the Chinese and English languages, right in your command line. It’s a good supportive tool for learning and reading the Chinese language from English, or the other way around. All translation data is fetched from iciba.com and fanyi.youdao.com, and with each translation comprehensive and related samples are given for better understanding and proper usage. There are translations for words as well as sentences, and in Mac/Linux bash, words can even be pronounced by the ‘say’ command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Unitag is a language-independent Unicode-based part-of-speech tagging system. Written entirely in ANSI-compatible C, it should (in theory) compile on any OS, but has been tested on 32-bit Windows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    MTBook

    MTBook

    Machine Translation: Foundations and Models

    This is a tutorial, the purpose is to introduce the basic knowledge and modeling methods of machine translation systematically, and on this basis, discuss some cutting-edge technologies of machine translation (formerly known as "Machine Translation: Statistical Modeling and Deep Learning") method"). Its content is compiled into a book, which can be used for the study of senior undergraduates and graduate students in computer and artificial intelligence related majors, and can also be used as reference material for researchers related to natural language processing, especially machine translation. This book is written in tex, and all source codes are open. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 5

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus {compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ) The corpus Khaleej-2004 contains 5690 documents. It is divided to 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora would mention the two main references: (1) For Watan-2004 corpus ---------------------- M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...
    Leader badge
    Downloads: 7 This Week
    Last Update:
    See Project
  • 6
    Speakable Programming for Every Language

    Speakable Programming for Every Language

    Your language to speak with all.

    This project has the language data for spel, the main new codebase is at: https://gitlab.com/liberit/pyac A computer programming language using human language syntax for human-to-human and human-to-computer communication with high precision, supporting many languages. Currently has alpha prototype support for analytic versions of the UN languages English, Mandarin Chinese, Spanish, Arabic, Russian...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7

    TEES

    Turku Event Extraction System

    Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python, and should work in generic Unix/Linux environments. Currently, the TEES source code repository still remains on GitHub at http://jbjorne.github.com/TEES/ where there is also a wiki with more information.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8

    RDRPOSTagger

    A Rule-based Part-of-Speech and Morphological Tagging Toolkit

    RDRPOSTagger is a robust, easy-to-use and language-independent rule-based toolkit for Part-of-Speech (POS) and morphological tagging. RDRPOSTagger obtains fast performance in both learning and tagging process. RDRPOSTagger also achieves a very competitive accuracy in comparison to the state-of-the-art results. RDRPOSTagger now supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    PADIC

    A multilingual Parallel Arabic DIalectal Corpus

    PADIC (Parallel Arabic DIalectal Corpus) is a multi-dialectal corpus built in the framework of the National Research Project "TORJMAN", led by Scientific and Technical Research Center for the Development of Arabic Language and funded by the Algerian Ministry of Higher Education and Scientific Research. PADIC is composed of 6 dialects: two Algerian dialects (Algiers and Annaba cities), Palestinian, Syrian, Tunisian, Moroccan) and MSA. Mourad Abbas Computational Linguistics Department,...
    Downloads: 5 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    Phrasal

    Phrasal

    Statistical phrase-based machine translation system

    Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java. At its core, it provides much the same functionality as the core of Moses. Distinctive features include: providing an easy to use API for implementing new decoding model features, the ability to translating using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase-tables and lexical reordering models. Developed by The Natural Language Processing Group...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Part-of-speech tagging is the task of assigning symbols from a particular set to words in a natural language text. ACOPOST implements and extends well-known machine learning techniques and provides a uniform environment for testing.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    DCTFinder

    DCTFinder

    Extract title and creation time from web page.

    ...DCTFinder is a system that parses a web page and extracts from its content the title and the creation date of this web page. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation time recognition. DCTFinder is released under CeCILL free software license agreement. The system is described in the following paper (see 'Files' section): Xavier Tannier. "Extracting News Web Page Creation Time with DCTFinder". Proceedings of the 9th Language Resources and Evaluation Conference. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    ArabicDiacritizer

    ArabicDiacritizer

    An automatic restoration of Arabic diacritic marks

    This is a software of Arabic diacritical marks restoration. It is based mainly on deep architectures using deep neural network. The algorithm generates diacritized text with determined end case. The algorithm is described in detail in: Ilyes Rebai, and Yassine BenAyed 'Text-to-speech synthesis system with Arabic diacritic recognition system', Computer Speech & Language, 2015. We appreciate it very much if you can cite our related work. ************** Installation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    Bermuda Text-to-Speech

    This project includes basic NLP and DSP techniques for Text-to-Speech

    See TTS demo at: http://rslp.racai.ro/index.php?page=tts This is an entirely written in JAVA project which includes a set of tools and methods designed to enable Multilingual Text-to-Speech (TTS) synthesis. We currently support English and Romanian but we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools check out http://nlptools.racai.ro.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    CRFSharp

    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    CRFSharp(aka CRF#) is a .NET(C#) implementation of Conditional Random Fields, an machine learning algorithm for learning from labeled sequences of examples. It is widely used in Natural Language Process (NLP) tasks, for example: word breaker, postagging, named entity recognized, query chunking and so on. CRF#'s mainly algorithm is the same as CRF++ written by Taku Kudo. It encodes model parameters by L-BFGS. Moreover, it has many significant improvement than CRF++, such as totally parallel encoding, optimizing memory usage and so on. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17

    Java Analogical Modeling

    Analogical Modeling module for Java

    Analogical Modeling is an exemplar-based approach to machine learning which imitates human behavior in outcome prediction. Its design has been applied to many natural language and other phenomena which exhibit variable behavior. A Perl XS implementation is available from http://humanities.byu.edu/am/ . This project is a Java implementation of the same. For more information on Analogical Modeling, see http://en.wikipedia.org/wiki/Analogical_modeling .
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    ...Double Layered Learning for Biological Event Extraction from Text. In Proceedings of the BioNLP 2011 Workshop Companion Volume for Shared Task, Portland, Oregon, June. Association for Computational Linguistic
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    This is a PHP-5 library for language detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    An agent-based situated language learning simulation that focuses on lexical learning and grounding, featuring a unigram syntax structure and a CFG-based semantic grammar. Created as a MSc thesis project, using python.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Reconcile is an open source research platform for coreference resolution. It combines a large number of open source NLP components and provides extension points for researchers to plug in additional features and techniques.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Sanchay
    Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    A unique natural-language processing software, called Discovery, created on the CA Visual Objects/Vulcan.NET environment, which also has potential for effective "shallow approach" machine translation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB