Python Linguistics Software

Browse free open source Python Linguistics Software and projects below.

  • 1
    Argos Translate

    Open-source offline translation library written in Python

    Argos Translate uses OpenNMT for translations and can be used as a Python library, a command-line tool, or a GUI application. It supports installing language model packages, which are zip archives with a ".argosmodel" extension containing the data needed for translation. LibreTranslate is an API and web app built on top of Argos Translate. Argos Translate also automatically pivots through intermediate languages to translate between languages that have no direct translation installed. For example, if you have es → en and en → fr translations installed, you can translate from es → fr as if that translation were installed. This allows translation between a wide variety of languages at the cost of some translation quality.
    Downloads: 176 This Week
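The pivoting idea described above can be sketched as a shortest-path search over the installed language pairs. This is only an illustration, not Argos Translate's actual code or API: the function name `find_pivot_path` and the list-of-pairs input are assumptions made for the example.

```python
from collections import deque


def find_pivot_path(installed_pairs, source, target):
    """Return the shortest chain of language codes from source to target.

    installed_pairs is a hypothetical list of (from_code, to_code) tuples,
    one per installed direct translation. Returns None if no chain exists.
    """
    # Build a directed graph: each language maps to the languages it can reach directly.
    graph = {}
    for src, dst in installed_pairs:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search finds a path with the fewest intermediate hops.
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_pivot_path([("es", "en"), ("en", "fr")], "es", "fr"))
```

Preferring the fewest hops is a natural choice here, since each extra pivot step is where translation quality is lost.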
  • 2
    iramuteq
    IRAMUTEQ: R Interface for Multidimensional Analyses of Texts and Questionnaires (Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires). Data-processing software for text corpora or individuals/characters tables. Among other things, it supports "ALCESTE"-type analyses.
    Downloads: 944 This Week
  • 3
    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 21 This Week
  • 4

    Presage

    the intelligent predictive text entry platform

    Presage (formerly Soothsayer) is an intelligent predictive text entry system. Presage generates predictions by modelling natural language as a combination of redundant information sources. Presage computes probabilities for words which are most likely to be entered next by merging predictions generated by the different predictive algorithms. Presage's modular and extensible architecture allows its language model to be extended and customized to utilize statistical, syntactic, and semantic predictive algorithms. Presage's predictive capabilities are implemented by predictive plugins. Predictive plugins use services provided by the platform to implement multiple prediction techniques.
    Downloads: 284 This Week
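The merging step described above, where predictions from several predictors are combined into one ranked list, can be illustrated with a weighted linear interpolation. This is a minimal sketch under assumptions, not Presage's actual implementation: the function name `merge_predictions`, the dictionary format, and the weights are all hypothetical.

```python
def merge_predictions(predictions, weights):
    """Combine word-probability dictionaries by weighted linear interpolation.

    predictions: list of dicts mapping candidate word -> probability,
                 one dict per (hypothetical) predictor.
    weights:     list of non-negative numbers, one per predictor.
    Returns a list of (word, probability) sorted by descending probability.
    """
    total = sum(weights)
    combined = {}
    for distribution, weight in zip(predictions, weights):
        for word, prob in distribution.items():
            # Each predictor contributes in proportion to its normalized weight.
            combined[word] = combined.get(word, 0.0) + (weight / total) * prob
    # Sort by probability (highest first), breaking ties alphabetically.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


ngram_predictor = {"the": 0.6, "then": 0.4}
recency_predictor = {"then": 0.7, "there": 0.3}
for word, prob in merge_predictions([ngram_predictor, recency_predictor], [1, 1]):
    print(word, round(prob, 2))
```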
  • 5
    Mishkal: Arabic Text Vocalization

    Arabic Text Vocalization system

    An automatic vocalization system for Arabic text.
    Downloads: 31 This Week
  • 6
    UnsupervisedMT

    Phrase-Based & Neural Unsupervised Machine Translation

    Unsupervised Machine Translation is a research repository that implements both phrase-based SMT and neural MT approaches for translation without parallel corpora. The neural component supports multiple architectures—seq2seq, biLSTM with attention, and Transformer—and allows extensive parameter sharing across languages to improve data efficiency. Training relies on denoising auto-encoding and back-translation, with on-the-fly, multithreaded generation of synthetic parallel data to continually refresh supervision signals. The project also provides scripts to fetch and preprocess monolingual data, learn BPE codes, and train cross-lingual embeddings that bootstrap unsupervised alignment between languages. Beyond the core EMNLP 2018 setup, the codebase exposes additional, optional capabilities such as multi-language training, language model pretraining with shared parameters, and adversarial training.
    Downloads: 4 This Week
  • 7
    CycleGAN and pix2pix in PyTorch

    Image-to-Image Translation in PyTorch

    This repository is a PyTorch implementation of two influential image-to-image translation frameworks: CycleGAN (for unpaired translation) and pix2pix (for paired translation). It gives developers and researchers a convenient PyTorch-based platform to train and test these methods, supporting both paired datasets (input to output) and unpaired datasets (domain to domain) with minimal changes. The code supports standard training and inference pipelines and, as of recent updates, is compatible with recent Python and PyTorch versions (e.g. Python 3.11, PyTorch 2.4) and supports distributed/multi-GPU training for scalable workflows. Because of this flexibility, it can be applied to many tasks, such as style transfer between domains (season changes, art to photo), mapping sketches or edges to real images, image colorization, day-to-night conversion, and photo enhancement.
    Downloads: 1 This Week
  • 8
    Alfanous

    Quran Search Engine

    Alfanous (The Lantern - الفانوس) is an Arabic search engine API that provides simple and advanced search in the Holy Quran, more features, and many interfaces...
    Downloads: 9 This Week
  • 9

    MITRE Annotation Toolkit

    A toolkit for managing and manipulating text annotations

    The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g., named entity identification, de-identification of medical records). The goal of MAT is not to help you configure your training engine (in the default case, the Carafe CRF system) to achieve the best possible performance on your data. MAT is for "everything else": all the tools you end up wishing you had.
    Downloads: 12 This Week
  • 10
    Helsinki Finite-State Technology
    The Helsinki Finite-State Transducer toolkit is intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.
    Downloads: 10 This Week
  • 11
    SPPAS

    SPPAS - the automatic annotation and analyses of speech

    SPPAS is a scientific software package written and maintained by Brigitte Bigi of the Laboratoire Parole et Langage in Aix-en-Provence, France. Available for free, with open source code, there is simply no other package for linguists to simply use for the automatic annotation of speech, the analysis of any kind of annotated data, and the conversion of annotated files. SPPAS can automatically produce speech annotations from a recorded speech sound and its orthographic transcription. It is helpful for the analysis of any annotated data: estimating statistical distributions, making requests, managing files, and visualizing annotations. It also offers a file converter from/to a wide range of formats: xra, TextGrid, eaf, trs... <https://sppas.org>
    Downloads: 10 This Week
  • 12
    WordCount

    Count frequency of single, 2-word and 3-word clusters in a text

    The program can read a text file and count the occurrences of single words and clusters of 2 and 3 words. The resulting list will be sorted in descending order (highest frequency on top).
    Downloads: 5 This Week
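Counting single words and 2-word and 3-word clusters, as described above, can be sketched with the standard library. This is an illustration only, not WordCount's own code: the function name `count_clusters` and the simple `\w+` tokenization are assumptions.

```python
import re
from collections import Counter


def count_clusters(text, n):
    """Count clusters of n consecutive words, most frequent first.

    Tokenization here is a deliberately simple assumption: lowercase the
    text and keep runs of word characters.
    """
    words = re.findall(r"\w+", text.lower())
    # Slide a window of length n over the word list.
    clusters = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    # most_common() returns (cluster, count) pairs in descending order of count.
    return Counter(clusters).most_common()


sample = "the cat sat on the mat the cat"
for size in (1, 2, 3):
    print(size, count_clusters(sample, size)[:3])
```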
  • 13

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references:
    (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.
    (2) For the Khaleej-2004 corpus: M. Abbas, K. Smaili (2005), Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.
    More useful references: https://sites.google.com/site/mouradabbas9/corpora
    Downloads: 3 This Week
  • 14
    Kana no quiz is a small educational tool for memorizing the transcription and pronunciation of Japanese kana (katakana and hiragana), presented as a quiz. It is written in Python and uses a GTK+ interface for nice cross-platform rendering!
    Downloads: 3 This Week
  • 15
    Arramooz Alwaseet Arabic Dictionary
    Arramooz Alwaseet is an open Arabic dictionary for morphological analysis, intended to be useful for Arabic language processing. This dictionary is derived from the Ayaspell Arabic spell checker.
    Downloads: 2 This Week
  • 16

    BioC

    We describe a simple XML format to share text documents and annotations

    A minimalist approach to sharing text documents and data annotations that allows a large number of different annotations to be represented. Project files contain simple code to hold, read, and write data and perform sample processing; BioC-formatted corpora; and BioC tools that work with BioC corpora. The goals of BioC are simplicity, interoperability, broad use, and reuse. There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for text mining.
    Downloads: 1 This Week
  • 17
    TextTools
    TextTools is a freeware corpus linguistics tool developed in Python to aid in research. This program analyzes user-created corpora and displays information about word (token) frequency, n-grams, clusters, collocations, keyword in context (KWIC), and keyness. TextTools is designed to be user-friendly and intuitive and will run natively on Mac OS X.
    Downloads: 1 This Week
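Of the analyses listed above, keyword in context (KWIC) is easy to illustrate: each occurrence of a keyword is shown with a few words of context on either side. The sketch below is not TextTools code; the function name `kwic`, the whitespace tokenization, and the punctuation stripping are assumptions made for the example.

```python
def kwic(text, keyword, width=2):
    """Return (left context, match, right context) for each keyword occurrence.

    Tokens are split on whitespace; surrounding punctuation is ignored when
    comparing a token with the keyword (a simplifying assumption).
    """
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


for left, match, right in kwic("The cat sat on the mat. The dog saw the cat.", "cat"):
    print(f"{left:>10} [{match}] {right}")
```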
  • 18
    Melete is a program to apply sound changes to a lexicon, simulating historical linguistic sound changes. It offers a flexible and easy-to-use format for specifying sounds, characters, and diacritics, support for phonological features, and more.
    Downloads: 1 This Week
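Applying sound changes to a lexicon can be pictured as running an ordered list of rewrite rules over every word. The sketch below does not use Melete's rule format, which is not described here; it assumes, purely for illustration, that each rule is a (regular expression, replacement) pair, and the function name `apply_sound_changes` is hypothetical.

```python
import re


def apply_sound_changes(lexicon, rules):
    """Apply ordered (pattern, replacement) rules to each word in a lexicon.

    Rules are applied in sequence, so a later rule sees the output of the
    earlier ones, as in a chronology of historical sound changes.
    """
    changed = []
    for word in lexicon:
        for pattern, replacement in rules:
            word = re.sub(pattern, replacement, word)
        changed.append(word)
    return changed


# Hypothetical rules: p becomes f everywhere, then word-final t is lost.
rules = [(r"p", "f"), (r"t$", "")]
print(apply_sound_changes(["pater", "kit", "pot"], rules))
```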
  • 19
    Part-of-speech tagging is the task of assigning symbols from a particular set to words in a natural language text. ACOPOST implements and extends well-known machine learning techniques and provides a uniform environment for testing.
    Downloads: 0 This Week
  • 20

    Aelius Brazilian Portuguese POS-Tagger

    Python, NLTK-based package for shallow parsing of Brazilian Portuguese

    Aelius is an ongoing open source project aiming at developing a suite of Python, NLTK-based modules and interfaces to external freely available tools for shallow parsing of Brazilian Portuguese. It also includes language resources such as language models, sample texts, and gold standards. Presently, Aelius already offers facilities for POS-tagging and chunking corpora and outputting annotations in different formats, such as in XML in the TEI P5 encoding scheme.
    Downloads: 0 This Week
  • 21
    AsiEs stands for Asistente de Escritura (writing assistant). It provides word prediction and autocomplete for fast writing. Designed for people who have difficulty typing on a keyboard, it improves writing speed by saving the user up to 50% of the keystrokes needed to write, and it helps avoid spelling errors. Made by Fundación Teletón Uruguay (http://www.teleton.org.uy/home/)
    Downloads: 0 This Week
  • 22

    Automatic Compound Processing (AuCoPro)

    Automatic compound splitting and semantic analysis of compounds

    The central problem addressed in this project is a multidisciplinary (linguistics and computational linguistics) investigation into the sharing of knowledge and resources between closely-related languages, specifically for the automatic processing of compounds. We will explore the possibility of creating new knowledge about closely-related languages and of efficiently developing additional, more advanced resources for (a) compound segmentation and (b) the semantic analysis of compounds. The project will therefore be divided into two interrelated subprojects, to be executed simultaneously. The focus will be on Afrikaans (with Dutch as the closely-related, well-resourced language), which will lay the groundwork for future work on other closely-related language pairs.
    Downloads: 0 This Week
  • 23
    Color to Word

    Turn colors into words

    The program turns a color into a list of 10 words, obtained with a custom-designed algorithm based on letter shape and position in the alphabet.
    - Click inside the frame on the left to pick a color through the color chooser window.
    - The program matches the color against the colors corresponding to a list of all the English words contained in the file wordcolor.txt.
    - The first 10 matches appear in the frame on the right.
    - Right-click - Copy to copy the word matches and the RGB values.
    This version comes with a text file (wordcolor.txt) containing all the English words followed by Red, Green, Blue channel values for the corresponding color. The colors were obtained through a modified version of the program "Text to Color" by the same author, available for download on GitHub and SourceForge on the profile page of Fonazza-Stent. The next version (coming soon) will include a tool to convert a custom word list into a word+color list named wordcolor.txt
    Downloads: 0 This Week
  • 24

    CompE Toolkit

    Data Type Converter

    CompE Toolkit allows the user to seamlessly convert between binary, decimal, hexadecimal, and 32-bit floating point representation. It uses a simple, user-friendly interface designed for maximum efficiency and minimal clutter.
    Downloads: 0 This Week
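The conversions listed above (binary, decimal, hexadecimal, and 32-bit floating point) can be sketched with Python's built-in `int`, `format`, and the `struct` module. This is not CompE Toolkit's code; the helper names `convert_int`, `float_to_bits`, and `bits_to_float` are hypothetical, and the float helpers assume the IEEE 754 single-precision layout that `struct` uses for the `f` format.

```python
import struct


def convert_int(digits, from_base, to_base):
    """Convert a non-negative integer string between bases 2, 10, and 16."""
    value = int(digits, from_base)
    format_codes = {2: "b", 10: "d", 16: "x"}
    return format(value, format_codes[to_base])


def float_to_bits(x):
    """Return the 32-bit pattern of x as a string of 0s and 1s."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an unsigned int.
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return format(as_int, "032b")


def bits_to_float(bits):
    """Interpret a 32-character bit string as a 32-bit float."""
    (value,) = struct.unpack(">f", struct.pack(">I", int(bits, 2)))
    return value


print(convert_int("255", 10, 16))
print(float_to_bits(1.0))
print(bits_to_float(float_to_bits(0.5)))
```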
  • 25
    Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.
    Downloads: 0 This Week
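Selecting a subset of files with an upper bound on pairwise similarity can be sketched as a greedy filter. The module's actual similarity measure and interface are not described above, so this example makes its own assumptions: it takes a dictionary of file name to text instead of a directory, uses Jaccard similarity over word sets, and the names `jaccard` and `select_dissimilar` are hypothetical.

```python
def jaccard(text_a, text_b):
    """Jaccard similarity between the word sets of two texts (0.0 to 1.0)."""
    words_a, words_b = set(text_a.split()), set(text_b.split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_dissimilar(docs, max_similarity):
    """Greedily keep files whose similarity to every kept file is within the bound.

    docs maps file name -> text. Files are visited in sorted name order, so
    the result is deterministic.
    """
    kept = []
    for name in sorted(docs):
        if all(jaccard(docs[name], docs[other]) <= max_similarity for other in kept):
            kept.append(name)
    return kept


docs = {
    "a.txt": "the patient has a fever",
    "b.txt": "the patient has a fever today",  # near copy of a.txt
    "c.txt": "completely different note",
}
print(select_dissimilar(docs, max_similarity=0.5))
```

A greedy pass like this guarantees the bound holds for every pair in the result, although it does not necessarily return the largest possible subset.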