Showing 554 open source projects for "language processing"

  • 1
    PostgresML

    The GPU-powered AI application database

    ...Combine and automate the entire workflow from embedding generation to indexing and querying for the simplest (and fastest) knowledge-based chatbot implementation. Leverage multiple types of natural language processing and machine learning models such as vector search and personalization with embeddings to improve search results. Leverage your data with time series forecasting to garner key business insights. Build statistical and predictive models with the full power of SQL and dozens of regression algorithms. Return results and detect fraud faster with ML at the database layer. ...
    Downloads: 9 This Week
  • 2
    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ...Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 5 This Week
  • 3
    MetaScreener

    AI-powered tool for efficient abstract and PDF screening

    ...The platform can analyze both abstracts and full PDF documents, enabling automated filtering based on research criteria defined by the user. By incorporating natural language processing techniques, the system can identify potentially relevant studies and reduce the workload associated with manual screening.
    Downloads: 3 This Week
  • 4
    BambooAI

    A Python library powered by Language Models (LLMs)

    BambooAI is a Python library powered by large language models (LLMs) for conversational data discovery and analysis, allowing users to interact with data through natural language.
    Downloads: 3 This Week
  • 5
    AWS Toolkit for Visual Studio Code

    Local Lambda debug, CodeWhisperer, SAM/CFN syntax, etc.

    ...It shows a top-level view of your CDK applications that have been synthesized in your workspace. Amazon CodeWhisperer provides inline code suggestions using machine learning and natural language processing on the contents of your current file. Supported languages include Java, Python, and JavaScript.
    Downloads: 2 This Week
  • 6
    Pluely

    The Open Source Alternative to Cluely

    Pluely is an open-source AI automation framework designed to simplify the development and deployment of AI-driven workflows across applications and services. The system focuses on orchestrating tasks performed by large language models and other AI components, allowing developers to define structured workflows where models interact with tools, APIs, and external systems. By providing a modular architecture for building AI pipelines, the platform enables developers to connect multiple processing steps such as data retrieval, prompt execution, analysis, and response generation. ...
    Downloads: 2 This Week
  • 7
    WeKnora

    LLM framework for document understanding and semantic retrieval

    WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware responses. ...
    Downloads: 8 This Week
  • 8
    tidytext

    Text mining using tidy tools

    tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.
    Downloads: 0 This Week
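
The tidy text format described above (one token per row) can be sketched in a few lines. tidytext itself is an R package; the Python helper below borrows the name of its `unnest_tokens` function purely for illustration and is a hypothetical stand-in, not the package's API. The sample documents and the simple regex tokenizer are assumptions as well.

```python
import re
from collections import Counter


def unnest_tokens(docs):
    """Return one (doc_id, word) row per token, lowercased.

    Hypothetical helper: mimics the one-token-per-row idea only,
    using a simple regex tokenizer chosen for this sketch.
    """
    rows = []
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            rows.append((doc_id, word))
    return rows


# Made-up sample documents for illustration.
docs = {"a": "Tidy text is tidy.", "b": "Text mining with tidy tools"}

rows = unnest_tokens(docs)

# Once text is in row form, counting terms per document is a plain group-by.
counts = Counter(rows)
```

Because every row has the same shape, downstream steps such as filtering stop words or counting terms reduce to ordinary table operations, which is the interoperability point the description makes.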
  • 9
    Superlinked

    Superlinked is a Python framework for AI Engineers

    Superlinked is a Python framework designed for AI engineers to build high-performance search and recommendation applications that combine structured and unstructured data.
    Downloads: 0 This Week
  • 10
    DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models

    DataDreamer is a tool designed to assist in the generation and manipulation of synthetic data for various applications, including testing and machine learning.
    Downloads: 0 This Week
  • 11
    LightAutoML

    Fast and customizable framework for automatic ML model creation

    LightAutoML is an automated machine learning (AutoML) framework optimized for efficient model training and hyperparameter tuning, focusing on both tabular and text data.
    Downloads: 0 This Week
  • 12
    MegaParse

    File Parser optimised for LLM Ingestion with no loss

    MegaParse is a file parser optimized for Large Language Model (LLM) ingestion, ensuring no loss of information. It efficiently parses various document formats, such as PDFs, DOCX, and PPTX, converting them into formats ideal for processing by LLMs. This tool is essential for applications that require accurate and comprehensive data extraction from diverse document types.
    Downloads: 0 This Week
  • 13
    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered Python library for automatic data analysis and profiling. It is designed to detect patterns, anomalies, and schema inconsistencies in structured and unstructured datasets, and to make data analysis, monitoring, and sensitive data detection easy. It loads data with a single command, automatically formatting and loading files into a DataFrame. When profiling the data, the library identifies the schema, statistics, entities (PII / NPI), and...
    Downloads: 5 This Week
  • 14
    amazon-connect-wisdomjs

    Gives you the power to build your own Wisdom widget

    ...With Wisdom, agents can search across connected repositories to find answers and quickly resolve customer issues. In addition, Wisdom uses real-time speech analytics and natural language processing (NLP) from Contact Lens for Amazon Connect to detect customer issues during calls, and then provide agents with recommendations and answers. Wisdom provides faster issue resolution and improved customer satisfaction.
    Downloads: 8 This Week
  • 15
    diff2html

    Pretty diff to html javascript library (diff2html)

    ...We work hard to make sure you can have your diffs in a simple and flexible way. It includes a wrapper and helper that add syntax highlighting, synchronized scrolling, and other nice features. You can use it without syntax highlighting or by passing your own implementation with the languages you prefer. Diff2Html can be used in various ways, as listed in the distributions section.
    Downloads: 1 This Week
  • 16
    TaxHacker

    Self-hosted AI accounting app. LLM analyzer for receipts

    TaxHacker is an open-source, self-hosted accounting application that uses artificial intelligence to automate financial record management for freelancers, independent developers, and small businesses. The system is designed to simplify bookkeeping by automatically processing financial documents such as receipts, invoices, and transaction records. It integrates large language models to analyze these documents, extract relevant financial information, and categorize expenses or income based on configurable rules. Users can deploy the application on their own infrastructure, ensuring that financial data remains private and under their control rather than being processed by external services. ...
    Downloads: 8 This Week
  • 17
    WeClone

    One-stop solution for creating your digital avatar from chat history

    WeClone is an open source AI project designed to replicate a person’s conversational style and personality by training models on chat history data. The system analyzes message patterns, linguistic style, and contextual behavior in order to generate responses that resemble the original user’s communication style. It is intended primarily as an experimental exploration of digital personality modeling and conversational AI personalization. By processing large volumes of conversation data,...
    Downloads: 4 This Week
  • 18
    Megatron

    Ongoing research training transformer models at scale

    Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This repository is for ongoing research on training large transformer language models at scale. We developed efficient model-parallel (tensor, sequence, and pipeline) and multi-node pre-training of transformer-based models such as GPT, BERT, and T5 using mixed precision. Megatron is also used in NeMo Megatron, a framework to help enterprises overcome the challenges of building and training sophisticated natural language processing models with billions and trillions of parameters. ...
    Downloads: 2 This Week
  • 19
    mergekit

    Tools for merging pretrained large language models

    ...The library is designed to operate efficiently even in environments with limited hardware resources by using memory-efficient processing methods that can run entirely on CPUs. It also provides configuration-driven workflows that allow users to experiment with different merging strategies without modifying source code.
    Downloads: 2 This Week
  • 20
    Milvus Bootcamp

    Dealing with all unstructured data, such as reverse image search

    Milvus Bootcamp is a collection of tutorials, examples, and best practices for using Milvus, an open-source vector database designed for AI-powered similarity search and retrieval applications.
    Downloads: 0 This Week
  • 21
    SparseML

    Libraries for applying sparsification recipes to neural networks

    SparseML is an optimization toolkit for training and deploying deep learning models using sparsification techniques like pruning and quantization to improve efficiency.
    Downloads: 0 This Week
  • 22
    MoBA

    MoBA: Mixture of Block Attention for Long-Context LLMs

    MoBA, short for Mixture of Block Attention, is an open-source research implementation of a novel attention mechanism designed to improve the efficiency of large language models processing extremely long contexts. The architecture adapts ideas from Mixture-of-Experts networks and applies them directly to the attention mechanism of transformer models. Instead of forcing each token to attend to every other token in the sequence, MoBA divides the context into blocks and dynamically routes queries to only the most relevant segments of information. ...
    Downloads: 0 This Week
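
The block routing described above can be illustrated with a small pure-Python sketch. It is not the project's implementation: it assumes each block is scored by the dot product between the query and the mean of that block's keys, and it leaves out details a real model needs, such as causal masking and batching. The function names `select_blocks` and `block_attention` and the toy vectors are hypothetical.

```python
import math


def select_blocks(query, keys, block_size, top_k):
    """Return indices of the top_k blocks whose mean key best matches the query.

    Assumption for this sketch: block relevance = query . mean(keys in block).
    """
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    scores = []
    for idx, block in enumerate(blocks):
        dim = len(block[0])
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        score = sum(q * m for q, m in zip(query, mean_key))
        scores.append((score, idx))
    chosen = sorted(scores, reverse=True)[:top_k]
    return sorted(idx for _, idx in chosen)


def block_attention(query, keys, values, block_size, top_k):
    """Softmax attention restricted to the positions inside the selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    positions = [
        p
        for b in chosen
        for p in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, keys[p])) / scale for p in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[p][d] for w, p in zip(weights, positions)) / total
        for d in range(dim)
    ]


# Toy data: six positions grouped into three blocks of two.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [1.0], [5.0], [5.0], [9.0], [9.0]]
query = [1.0, 0.0]

picked = select_blocks(query, keys, block_size=2, top_k=1)
output = block_attention(query, keys, values, block_size=2, top_k=1)
```

With `top_k` fixed, the number of keys each query attends to depends on the block size rather than the full sequence length, which is where the efficiency for long contexts would come from.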
  • 23
    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a Python natural language analysis package: a collection of accurate and efficient tools for the linguistic analysis of many human languages. From raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of...
    Downloads: 7 This Week
  • 24
    Lemon AI

    Full-stack Open-source Self-Evolving General AI Agent

    LemonAI is an open-source full-stack framework for building autonomous AI agents capable of performing complex tasks such as research, programming, data analysis, and document processing. The platform is designed to run primarily on local infrastructure, providing a privacy-focused alternative to cloud-dependent agent platforms. It integrates with local large language models through tools such as Ollama, vLLM, and other model runtimes while also allowing optional connections to external cloud models. The system includes a multi-agent architecture that supports planning, action execution, reflection, and memory, allowing the agent to reason through tasks and refine results iteratively. ...
    Downloads: 2 This Week
  • 25
    Bolt NLP

    Bolt is a deep learning library with high performance

    Bolt is a high-performance deep learning inference framework developed by Huawei Noah's Ark Lab. It is a lightweight library designed to optimize and accelerate the deployment of deep learning models across various hardware platforms. As a universal deployment tool for all kinds of neural networks, Bolt aims to automate the deployment pipeline and achieve extreme acceleration. Bolt has been widely deployed and used in many departments at Huawei, such as...
    Downloads: 7 This Week