Showing 24 open source projects for "open document"

View related business solutions
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    docext

    docext

    An on-premises, OCR-free unstructured data extraction

    docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 30 This Week
    Last Update:
    See Project
  • 3
    RAGFlow

    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    Semantra

    Semantra

    Multi-tool for semantic search

    Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    huggingface_hub

    huggingface_hub

    The official Python client for the Huggingface Hub

    The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    gensim

    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    Shapash

    Shapash

    Explainability and Interpretability to Develop Reliable ML models

    Shapash is a Python library dedicated to the interpretability of Data Science models. It provides several types of visualization that display explicit labels that everyone can understand. Data Scientists can more easily understand their models, share their results and easily document their projects in an HTML report. End users can understand the suggestion proposed by a model using a summary of the most influential criteria.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Llama Cookbook

    Llama Cookbook

    Solve end to end problems using Llama model family

    The Llama Cookbook is the official Meta LLaMA guide for inference, fine‑tuning, RAG, and multi-step use-cases. It offers recipes, code samples, and integration examples across provider platforms (WhatsApp, SQL, long context workflows), enabling developers to quickly harness LLaMA models
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    flair

    flair

    A very simple framework for state-of-the-art NLP

    A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. A powerful NLP library. Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation and classification, with support for a rapidly growing number of languages. A text embedding library. Flair has...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    Deepnote

    Deepnote

    Deepnote is a drop-in replacement for Jupyter

    ...The system supports programming languages such as Python, R, and SQL and allows users to execute and analyze data directly within interactive notebooks. Deepnote emphasizes team-based data science by enabling real-time collaboration similar to shared document editors, allowing multiple users to work simultaneously on the same notebook environment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    FlexLLMGen

    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    The Machine & Deep Learning Compendium

    The Machine & Deep Learning Compendium

    List of references in my private & single document

    The Machine & Deep Learning Compendium is an open-source knowledge repository that compiles summaries, references, and learning materials related to machine learning and deep learning. The project functions as a comprehensive compendium that organizes hundreds of topics covering algorithms, frameworks, research areas, and practical machine learning workflows. Originally created as a personal knowledge base, the repository evolved into a public educational resource designed to help learners...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    PyTextRank

    PyTextRank

    Python implementation of TextRank algorithms

    PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work -- and related knowledge graph practices.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    LayoutParser

    LayoutParser

    A Unified Toolkit for Deep Learning Based Document Image Analysis

    ...But it still easy to install layoutparser, and we designed the installation method in a way such that you can choose to install only the needed dependencies for your project. LayoutParser is also a open platform that enables the sharing of layout detection models and DIA pipelines among the community.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 15
    Hugging Face Transformer

    Hugging Face Transformer

    CPU/GPU inference server for Hugging Face transformer models

    Optimize and deploy in production Hugging Face Transformer models in a single command line. At Lefebvre Dalloz we run in-production semantic search engines in the legal domain, in the non-marketing language it's a re-ranker, and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built over Pytorch and FastAPI....
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    NLP-progress

    NLP-progress

    Repository to track the progress in Natural Language Processing (NLP)

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks. This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets. It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    MatchZoo

    MatchZoo

    Facilitating the design, comparison and sharing of deep text models

    The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With the unified data processing pipeline, simplified model configuration and automatic hyper-parameters tunning features equipped, MatchZoo is flexible and easy to use. Preprocess your input data in three lines of code, keep track parameters to be passed into the model. Make use of MatchZoo...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    SLING

    SLING

    A natural language frame semantics parser

    The aim of the SLING project is to learn to read and understand Wikipedia articles in many languages for the purpose of knowledge base completion, e.g. adding facts mentioned in Wikipedia (and other sources) to the Wikidata knowledge base. We use frame semantics as a common representation for both knowledge representation and document annotation. The SLING parser can be trained to produce frame semantic representations of text directly without any explicit intervening linguistic...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    AI learning

    AI learning

    AiLearning, data analysis plus machine learning practice

    We actively respond to the Research Open Source Initiative (DOCX) . Open source today is not just open source, but datasets, models, tutorials, and experimental records. We are also exploring other categories of open source solutions and protocols. I hope you will understand this initiative, combine this initiative with your own interests, and do what you can. Everyone's tiny contributions, together, are the entire open source ecosystem.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Scattertext 0.2.1

    Scattertext 0.2.1

    Beautiful visualizations of how language differs among document types

    A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding to terms are selectively labeled so that they don't overlap with other labels or points.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Convolution arithmetic

    Convolution arithmetic

    A technical report on convolution arithmetic in deep learning

    A technical report on convolution arithmetic in the context of deep learning. The code and the images of this tutorial are free to use as regulated by the licence and subject to proper attribution. The animations will be output to the gif directory. Individual animation steps will be output in PDF format to the pdf directory and in PNG format to the png directory. We introduce a guide to help deep learning practitioners understand and manipulate convolutional neural network architectures....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    libcrn is document image processing library written in C++11 for Linux, Windows, Mac OsX and Google Android. It is a toolbox that allows to create easily software such as OCRs and layout analysis tools.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    jLDADMM

    A Java package for the LDA and DMM topic models

    The Java package jLDADMM is released to provide alternative choices for topic modeling on normal or short texts. It provides implementations of the Latent Dirichlet Allocation topic model and the one-topic-per-document Dirichlet Multinomial Mixture model (i.e. mixture of unigrams), using collapsed Gibbs sampling. In addition, jLDADMM supplies a document clustering evaluation to compare topic models. See the usage of jLDADMM in its website at http://jldadmm.sourceforge.net/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task topics are tags, keywords, keyphrases, vocabulary terms, descriptors or Wikipedia titles.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB