Showing 368 open source projects for "text processing"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    Obsei

    Obsei

    Obsei is a low code AI powered automation tool

    Obsei is an automated no-code/low-code AI-powered text observation and analysis framework, designed for extracting insights from unstructured text data such as social media, reviews, and logs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Wikipedia2Vec

    Wikipedia2Vec

    A tool for learning vector representations of words and entities

    Wikipedia2Vec is an embedding learning tool that creates word and entity vector representations from Wikipedia, enabling NLP models to leverage structured and contextual knowledge.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Transformers4Rec

    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    YAYI

    YAYI

    Repo for YaYi Chinese LLMs based on LlaMA2 & BLOOM

    YAYI is an open-source large language model project developed to provide a multilingual conversational AI system capable of performing a wide variety of natural language processing tasks. The model is trained on diverse datasets covering multiple languages and domains so that it can support applications ranging from dialogue systems to text analysis and knowledge retrieval. The architecture is based on transformer-style language models optimized for conversational understanding and generation. In addition to producing coherent responses, the system is designed to handle tasks such as summarization, translation, question answering, and text classification. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    GPT-2 Output Dataset

    GPT-2 Output Dataset

    Dataset of GPT-2 outputs for research in detection, biases, and more

    The GPT-2 Output Dataset is a large collection of model-generated text, released by OpenAI alongside the GPT-2 research paper to study the behaviors and limitations of large language models. It contains 250,000 samples of GPT-2 outputs, generated with different sampling strategies such as top-k truncation, to highlight the diversity and quality of model completions. The dataset also includes corresponding human-written text for comparison, enabling researchers to explore methods for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    ...Towhee includes a pythonic method-chaining API for describing custom data processing pipelines. We also support schemas, making processing unstructured data as easy as handling tabular data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    CSAw - NLP for low-resource languages

    CSAw - NLP for low-resource languages

    CSAw is an NLP framework for low-resource languages

    CSAw is an NLP framework for low-resource languages with a focus on machine translation. The primary goal is to build language models automatically from bilingual text (e.g., front and back translations) using a deep transfer rule-based approach. The core of this strategy is the Concept Specification and Abstraction semantic representation which is specially designed with machine translation in mind. See the preprint article here: https://arxiv.org/abs/1807.02226 The current...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Lingua-Go

    Lingua-Go

    The most accurate natural language detection library for Go

    Lingua-Go is a Golang implementation of the Lingua language detection library, providing efficient and accurate language identification for Go-based applications. Its task is simple: It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine-learning frameworks or natural language processing applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    Companionem Linguae

    Ultra Large Language Model

    Companionem Linguae is an ultra large language model in early stages of development. Companionem Linguae is being reworked. I have uploaded a part of the new training data (Latein.json). Although English is the standard for international communication, I decided to train the model with Latin and German first, because these languages have many grammatical features the English language doesn't have, but they could be useful for translations into French, Portuguese, Spanish, or other...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 10
    UniEM

    UniEM

    Unified embedding model

    UniEM is a unified embedding model designed to create high-quality text embeddings for various natural language processing tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    funNLP

    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion dictionaries, stopwords). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Medusa

    Medusa

    Framework for Accelerating LLM Generation with Multiple Decoding Heads

    Medusa is a framework aimed at accelerating the generation capabilities of Large Language Models (LLMs) by employing multiple decoding heads. This approach allows for parallel processing during text generation, significantly enhancing throughput and reducing response times. Medusa is designed to be simple to implement and integrates with existing LLM infrastructures, making it a practical solution for scaling LLM applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    ekho

    ekho

    Chinese text-to-speech engine

    ekho is a project with relatively sparse documentation, but from the repository it appears to be a small-scale tool for audio processing and playback, possibly with features for speech synthesis or manipulation. The repo includes scripts and configuration files suggesting interactions with media/audio handling libraries. Because of limited README detail, it seems targeted at users comfortable reading and modifying code, rather than end users expecting polished UIs. The code structure implies...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 14
    Promptify

    Promptify

    se GPT or other prompt based models to get structured output

    Promptify is an open-source Python library designed to simplify prompt engineering and the development of natural language processing pipelines using large language models. The project provides tools that help developers generate structured prompts for different NLP tasks and apply them across multiple generative AI systems. Instead of manually crafting prompts for each task, Promptify introduces a unified architecture that combines prompt templates, language model interfaces, and processing pipelines into a single framework. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    TXM

    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 16
    SOD

    SOD

    An Embedded Computer Vision & Machine Learning Library

    SOD is an embedded, modern cross-platform computer vision and machine learning software library that expose a set of APIs for deep-learning, advanced media analysis & processing including real-time, multi-class object detection and model training on embedded systems with limited computational resource and IoT devices. SOD was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in open source as well as commercial products....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Botpress

    Botpress

    Dev tools to reliably understand text and automate conversations

    We make building chatbots much easier for developers. We have put together the boilerplate code and infrastructure you need to get a chatbot up and running. We propose you a complete dev-friendly platform that ships with all the tools you need to build, deploy and manage production-grade chatbots in record time. Built-in Natural Language Processing tasks such as intent recognition, spell checking, entity extraction, and slot tagging (and many others). A visual conversation studio to design...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 18
    Prime QA

    Prime QA

    State-of-the-art Multilingual Question Answering research

    PrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). By using PrimeQA, a researcher can replicate the experiments outlined in a paper published in the latest NLP conference while also enjoying the capability to download pre-trained models (from an online repository) and run them on their own custom data. PrimeQA is built on top of the Transformers toolkit and uses datasets and models that are directly...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    SentimentAnalysis-Rick&Morty

    SentimentAnalysis-Rick&Morty

    Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

    The remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is a fascinating application of data analysis that extracts relevant information from related writings in different linguistic contexts. And therefore, in natural language processing, sentiment analysis and classification stands out as a key application supported by text mining. Through the extraction of information from textual data, it becomes possible to identify and comprehend the sentiments and emotions conveyed. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    textacy

    textacy

    NLP, before and after spaCy

    textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals, tokenization, part-of-speech tagging, dependency parsing, etc., delegated to another library, textacy focuses primarily on the tasks that come before and follow after.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    picoGPT

    picoGPT

    An unnecessarily tiny implementation of GPT-2 in NumPy

    ...Developers can explore the repository to understand how language models generate text and how transformer components interact within the architecture.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Transformers-Interpret

    Transformers-Interpret

    Model explainability that works seamlessly with Hugging Face

    Transformers-Interpret is an interpretability tool for Transformer-based NLP models, providing insights into attention mechanisms and feature importance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    ChatGPT Advanced

    ChatGPT Advanced

    Browser extension adding web search results to ChatGPT prompts easily

    chatgpt-advanced, commonly known as WebChatGPT, is an open source browser extension designed to enhance the capabilities of ChatGPT by integrating real-time web search results into user prompts. It works by intercepting queries submitted to the ChatGPT interface and optionally augmenting them with information gathered from search engines before sending the prompt to the chatbot. This approach allows the model to generate responses that are more current and contextually relevant compared to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    T81 558

    T81 558

    Applications of Deep Neural Networks

    ...Application of these architectures to computer vision, time series, security, natural language processing (NLP), and data generation will be covered. High-Performance Computing (HPC) aspects will demonstrate how deep learning can be leveraged both on graphical processing units (GPUs), as well as grids.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    VnCoreNLP

    VnCoreNLP

    A Vietnamese natural language processing toolkit

    VnCoreNLP is a Java-based natural language processing toolkit tailored for Vietnamese. It offers a fast and accurate pipeline for essential NLP tasks, facilitating research and application development in Vietnamese language processing. ​
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB