Showing 347 open source projects for "document"

  • 1
    Haystack

    Haystack is an open source NLP framework to interact with your data

    Apply the latest NLP technology to your own data with Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization, and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve documents ranked by meaning, not just keywords. Use and compare the latest pre-trained transformer-based language models such as OpenAI’s GPT-3, BERT, RoBERTa, DPR, and more. ...
    Downloads: 14 This Week
  • 2
    Llama Cookbook

    Solve end-to-end problems using the Llama model family

    The Llama Cookbook is Meta's official guide to inference, fine‑tuning, RAG, and multi-step use cases with Llama models. It offers recipes, code samples, and integration examples across provider platforms and use cases such as WhatsApp, SQL, and long-context workflows, helping developers put Llama models to work quickly.
    Downloads: 0 This Week
  • 3
    pydna

    Clone with Python! Data structures for double-stranded DNA

    ...Data structures for double-stranded DNA and simulation of homologous recombination, Gibson assembly, and cut-and-paste cloning. Planning genetic constructs with many parts and assembly steps, such as recombinant metabolic pathways, is often difficult to document properly, as the poor state of documentation in the scientific literature shows. The pydna Python package provides a human-readable, formal description of cloning and genetic assembly strategies in Python, which allows for simulation and verification. Pydna can be used as executable documentation for cloning.
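
    The following is a minimal plain-Python sketch of what a double-stranded DNA data structure needs to track. It is not pydna's API. `DsDNA`, `find_site`, and `cut` are hypothetical names. The model covers only linear, blunt-ended molecules cut at palindromic restriction sites, using EcoRV's GATATC as the example.

```python
from dataclasses import dataclass

# Watson-Crick pairing for the four canonical bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class DsDNA:
    """Hypothetical linear, blunt-ended double-stranded DNA molecule.

    Only the top (Watson) strand is stored; the bottom (Crick) strand is
    derived from it, so the two strands can never disagree.
    """

    watson: str

    def __post_init__(self) -> None:
        # Normalize case; frozen dataclasses need object.__setattr__ here.
        object.__setattr__(self, "watson", self.watson.upper())

    @property
    def crick(self) -> str:
        return reverse_complement(self.watson)

    def find_site(self, site: str) -> list[int]:
        """Top-strand start positions where `site` occurs on either strand."""
        hits = set()
        for pattern in {site.upper(), reverse_complement(site.upper())}:
            start = self.watson.find(pattern)
            while start != -1:
                hits.add(start)
                start = self.watson.find(pattern, start + 1)
        return sorted(hits)

    def cut(self, site: str, offset: int) -> list["DsDNA"]:
        """Blunt-cut `offset` bases into each match of a palindromic `site`.

        Example: EcoRV recognizes GATATC and cuts GAT^ATC, so offset=3.
        """
        cuts = [pos + offset for pos in self.find_site(site)]
        bounds = [0, *cuts, len(self.watson)]
        return [DsDNA(self.watson[a:b]) for a, b in zip(bounds, bounds[1:])]


mol = DsDNA("ccGATATCaaGATATCtt")
print([frag.watson for frag in mol.cut("GATATC", 3)])
```

    Storing one strand and deriving the other keeps the molecule consistent by construction. Sticky ends, circular topology, and assembly overlaps would need extra fields in this sketch.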
    Downloads: 0 This Week
  • 4
    MechanicalSoup

    A Python library for automating interaction with websites

    ...Unfortunately, Mechanize was incompatible with Python 3 until 2019, and its development stalled for several years. MechanicalSoup provides a similar API, built on the Python giants Requests (for HTTP sessions) and BeautifulSoup (for document navigation). Since 2017, it has been actively maintained by a small team.
    Downloads: 0 This Week
  • 5
    PySpur

    Visual tool for building, testing, and deploying AI agent workflows

    PySpur is a visual development environment designed to help AI engineers build, test, and iterate on agent-based workflows more efficiently. It provides a structured playground where users can define test cases, construct agents either through Python code or a graphical interface, and continuously refine their behavior. It addresses common challenges in AI agent development such as prompt tuning difficulties and lack of visibility into workflow execution. By offering a visual representation...
    Downloads: 2 This Week
  • 6
    FlagEmbedding

    Retrieval and retrieval-augmented LLMs

    FlagEmbedding is an open-source toolkit for building and deploying high-performance text embedding models used in information retrieval and retrieval-augmented generation systems. The project is part of the BAAI FlagOpen ecosystem and focuses on creating embedding models that transform text into dense vector representations suitable for semantic search and large language model pipelines. FlagEmbedding includes a family of models known as BGE (BAAI General Embedding), which are designed to...
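
    Once an embedding model such as BGE has turned text into dense vectors, semantic search reduces to ranking documents by vector similarity. The sketch below shows only that ranking step, using cosine similarity in plain Python. It does not use FlagEmbedding's API. The 3-dimensional vectors are invented stand-ins for real model outputs, which have far more dimensions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]],
         top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs by cosine similarity, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embedding-model outputs (values are invented).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-an-item": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(rank(query, docs, top_k=2))
```

    With unit-normalized vectors, cosine similarity equals the plain dot product. Larger systems therefore usually normalize once and hand the search to a vector index.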
    Downloads: 3 This Week
  • 7
    BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics

    BERTopic is a topic modeling technique that leverages transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics whilst keeping important words in the topic descriptions. BERTopic supports guided, supervised, semi-supervised, manual, long-document, hierarchical, class-based, dynamic, and online topic modeling. It even supports visualizations similar to LDAvis! Corresponding Medium posts and a paper describe the method in more detail. After training a BERTopic model, you can iterate through hundreds of topics to get a good understanding of the topics that were extracted. ...
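
    c-TF-IDF treats all documents in a cluster as one class document. It scores each term t in class c as W(t, c) = tf(t, c) · log(1 + A / f(t)), where A is the average number of words per class and f(t) is the frequency of t across all classes. The plain-Python sketch below follows that formula as given in the BERTopic documentation, up to normalization details, with a naive whitespace tokenizer. It is an illustration, not BERTopic's implementation.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Class-based TF-IDF: W[c][t] = tf(t, c) * log(1 + A / f(t)).

    tf(t, c): count of term t in the concatenated documents of class c
    f(t):     count of term t across all classes
    A:        average number of words per class
    """
    # Concatenate each class's documents and count terms (naive tokenizer).
    tf = {c: Counter(" ".join(ds).lower().split()) for c, ds in docs_per_class.items()}
    f = Counter()
    for counts in tf.values():
        f.update(counts)
    avg_words = sum(sum(counts.values()) for counts in tf.values()) / len(tf)
    return {
        c: {t: n * math.log(1 + avg_words / f[t]) for t, n in counts.items()}
        for c, counts in tf.items()
    }


def top_terms(weights: dict[str, float], k: int = 3) -> list[str]:
    """The k highest-weighted terms of one class."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


clusters = {
    "sports": ["match goal news", "match striker"],
    "finance": ["stock market news", "stock bond"],
}
w = c_tf_idf(clusters)
print(top_terms(w["sports"]), top_terms(w["finance"]))
```

    Here `news` appears once in each cluster, so it scores below `goal`. `goal` also appears once in the sports cluster but nowhere else. That difference is the inverse-frequency term at work.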
    Downloads: 4 This Week
  • 8
    MedicalGPT

    MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT Training Pipeline

    MedicalGPT trains large medical GPT models with a ChatGPT-style training pipeline, implementing secondary pretraining, supervised fine-tuning, reward modeling, and reinforcement learning.
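
    The reward-modeling stage of ChatGPT-style pipelines is commonly trained with a pairwise (Bradley-Terry) loss, -log σ(r_chosen - r_rejected). This loss pushes the reward of the preferred answer above that of the rejected one. The sketch below assumes MedicalGPT uses this standard formulation and computes the loss from scalar rewards. It does not train a model, and the reward values are made up.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(r_chosen: list[float], r_rejected: list[float]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    -log(sigmoid(m)) equals softplus(-m), which avoids overflow in exp().
    """
    if len(r_chosen) != len(r_rejected) or not r_chosen:
        raise ValueError("need equal-length, non-empty reward lists")
    losses = [softplus(-(c - r)) for c, r in zip(r_chosen, r_rejected)]
    return sum(losses) / len(losses)


# Scores a reward model might assign to (preferred, rejected) answer pairs.
print(pairwise_reward_loss([2.0, 0.5], [0.0, 1.0]))
```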
    Downloads: 8 This Week
  • 9
    InfiAgent

    Build your own Cowork, AI Scientist and other SoTA Agents

    ...Designed as a “Multi-Level Agent” (MLA) system, it externalizes persistent state to the file system so that agents can run for unbounded durations without token-intensive context compression, enabling workflows such as research paper drafting, experiments, coding, and document generation to run reliably. The framework uses a serial multi-agent hierarchy in which specialized agents coordinate along tree-structured paths for clear task delegation and minimal tool conflicts, while batch file operations and persistent workspaces ensure reproducibility and traceability. It aims to solve real-world challenges in long-horizon reasoning and execution, offering configuration-driven customization so that users can define domain-specific agents such as research assistants.
    Downloads: 3 This Week
  • 10
    myGPTReader

    AI Slack bot for reading, summarizing, and chatting with content

    myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use...
    Downloads: 2 This Week
  • 11
    ModernBERT

    Bringing BERT into modernity via both architecture changes and scaling

    ModernBERT is an open-source research project that modernizes the classic BERT encoder architecture by incorporating recent advances in transformer design, training techniques, and efficiency improvements. The goal of the project is to bring BERT-style models up to date with the capabilities of modern large language models while preserving the strengths of bidirectional encoder architectures used for tasks such as classification, retrieval, and semantic search. ModernBERT introduces...
    Downloads: 1 This Week
  • 12
    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting

    Dolphin, maintained by ByteDance, is a multimodal model for parsing document images into structured content. It follows an analyze-then-parse approach. First, it analyzes the page layout to produce a sequence of elements in natural reading order. It then uses those elements as heterogeneous anchors, combined with task-specific prompts, to parse the content of each element (such as text paragraphs, tables, and formulas) in parallel.
    Downloads: 1 This Week
  • 13
    flair

    A very simple framework for state-of-the-art NLP

    ...Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, including named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation, and classification, with support for a rapidly growing number of languages. Flair is also a text embedding library: it has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings and various transformers. Finally, Flair is a PyTorch NLP framework: it builds directly on PyTorch, making it easy to train your own models and experiment with new approaches using Flair embeddings and classes.
    Downloads: 0 This Week
  • 14
    Paper2Slides

    From Paper to Presentation in One Click

    Paper2Slides is an automation tool that converts research papers, reports, and other documents into polished slide decks and posters with minimal manual effort. It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file...
    Downloads: 1 This Week
  • 15
    OpenMed

    Open source healthcare AI

    ...OpenMed can be used in three main ways: as a simple Python API for scripts and notebooks, as a Docker-friendly FastAPI service for backend integration, and as a batch-processing system for multi-document workflows.
    Downloads: 0 This Week
  • 16
    Pathway AI Pipelines

    Ready-to-run cloud templates for RAG

    Pathway AI Pipelines is a collection of ready-to-deploy AI pipeline templates designed to help developers rapidly build production-grade retrieval-augmented generation and enterprise search applications. The project provides end-to-end examples that connect live data sources to LLM workflows, enabling applications to stay synchronized with continuously changing information. It supports numerous connectors including local files, Google Drive, SharePoint, Kafka, PostgreSQL, and real-time APIs,...
    Downloads: 0 This Week
  • 17
    SimpleMem

    SimpleMem: Efficient Lifelong Memory for LLM Agents

    ...Unlike monolithic systems where memory management is ad-hoc, SimpleMem formalizes a memory lifecycle—write, index, retrieve, refine—so applications can handle user history, document collections, or dynamic contextual state systematically. It supports customizable embedding models, efficient vector indexes, and relevance weighting, making it practical for building assistants, personal agents, or domain-specific retrieval systems that need persistent knowledge.
    Downloads: 0 This Week
  • 18
    NeMo Curator

    Scalable data preprocessing and curation toolkit for LLMs

    NeMo Curator is a Python library designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...
    Downloads: 0 This Week
  • 19
    Grab Framework Project

    Web Scraping Framework

    ...With Grab, you can build web scrapers of varying complexity, from simple 5-line scripts to complex asynchronous website crawlers processing millions of web pages. Grab provides an API for performing network requests and for handling the received content, e.g., interacting with the DOM tree of the HTML document. The request/response API for single requests lets you build a network request, perform it, and work with the received content; it is built on top of the urllib3 and lxml libraries. The Spider API lets you build asynchronous web crawlers: you write classes that define handlers for each type of network request, and each handler can spawn new network requests. ...
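
    Two ideas in the description can be sketched with the standard library alone: inspecting the DOM of a received page, and handlers that spawn new requests. This is not Grab's API, which, per the description, builds on urllib3 and lxml. `LinkCollector`, `extract_links`, and `crawl` are hypothetical names. The crawl is synchronous, and a dict mapping URLs to HTML stands in for the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(pages: dict[str, str], start: str, limit: int = 100) -> list[str]:
    """Breadth-first crawl; `pages` (URL -> HTML) stands in for network fetches."""
    seen, queue, visited = {start}, deque([start]), []
    while queue and len(visited) < limit:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # a real crawler would record the failed fetch
        visited.append(url)
        # The "handler" for this page spawns new requests for unseen links.
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


site = {
    "http://example.test/": '<a href="/a">A</a> <a href="b">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "<p>no links</p>",
}
print(crawl(site, "http://example.test/"))
```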
    Downloads: 0 This Week
  • 20
    Lambda Builders

    Python library to compile, build & package AWS Lambda functions

    ...A build action is a module that knows how to build for a particular programming language and framework (e.g., Python+PIP). Build actions can be implemented in Python or in the native programming language. Each build action has its own design document.
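
    Below is a hypothetical sketch of the lookup-and-dispatch pattern this description implies. Each build action is registered under a (language, dependency manager) pair and modeled as an ordered list of steps that share a context. `BuildKey`, `register`, `build`, and the step functions are illustrative names, not the aws-lambda-builders API. The steps only log what a real build would do.

```python
from typing import Callable, NamedTuple

Step = Callable[[dict], None]


class BuildKey(NamedTuple):
    """Hypothetical lookup key: (language, dependency manager)."""
    language: str
    dependency_manager: str


# Hypothetical registry mapping each key to its build action's ordered steps.
_REGISTRY: dict[BuildKey, list[Step]] = {}


def register(language: str, dependency_manager: str, *steps: Step) -> None:
    _REGISTRY[BuildKey(language.lower(), dependency_manager.lower())] = list(steps)


def build(language: str, dependency_manager: str, ctx: dict) -> dict:
    """Run every step of the matching build action against a shared context."""
    key = BuildKey(language.lower(), dependency_manager.lower())
    if key not in _REGISTRY:
        raise ValueError(f"no build action for {language}+{dependency_manager}")
    for step in _REGISTRY[key]:
        step(ctx)
    return ctx


def copy_source(ctx: dict) -> None:
    ctx.setdefault("log", []).append(f"copy {ctx['source_dir']} -> {ctx['artifact_dir']}")


def install_deps(ctx: dict) -> None:
    # Illustrative only: a real step would invoke the package manager here.
    ctx.setdefault("log", []).append(f"pip install -r requirements.txt -t {ctx['artifact_dir']}")


register("Python", "PIP", copy_source, install_deps)
print(build("python", "pip", {"source_dir": "src", "artifact_dir": "build"})["log"])
```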
    Downloads: 0 This Week
  • 21
    Tongyi DeepResearch

    Tongyi Deep Research, the Leading Open-source Deep Research Agent

    ...The model has about 30.5 billion parameters in total, but only about 3.3 billion are active for any given token. It is built with a mix of synthetic data generation, fine-tuning, and reinforcement learning. It targets benchmarks covering web search, document understanding, question answering, and “agentic” tasks, and it provides inference tools, evaluation scripts, and “web agent” style interfaces. The aim is to enable more autonomous, agentic models that can perform sustained knowledge gathering, reasoning, and synthesis across multiple modalities (web, files, etc.).
    Downloads: 1 This Week
  • 22
    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. ...
    Downloads: 0 This Week
  • 23
    SAG

    SQL-Driven RAG Engine

    SAG is an open-source SQL-driven retrieval-augmented generation engine that dynamically constructs knowledge graphs during query processing. Instead of relying on a static knowledge graph prepared in advance, the system automatically builds relational structures between entities while processing user queries. Documents are first decomposed into atomic semantic events, which are then represented using multidimensional natural language vectors. These vectors allow the system to identify...
    Downloads: 0 This Week
  • 24
    AppAgent

    Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

    AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same...
    Downloads: 0 This Week
  • 25
    LLM TLDR

    95% token savings. 155x faster queries. 16 languages

    LLM TLDR is a tool that leverages large language models (LLMs) to generate concise, coherent summaries (TL;DRs) of long documents, articles, or text files, helping users quickly understand large amounts of content without reading every word. It integrates with LLM APIs to handle input texts of varying lengths and complexity, applying techniques like chunking, context management, and multi-pass summarization to preserve accuracy even when the source is very large. The system supports both...
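
    Chunking and multi-pass summarization can be sketched as a map-reduce loop: split the text into overlapping chunks, summarize each, join the summaries, and repeat until the result fits a word budget. `summarize` is a hypothetical placeholder for an LLM API call (it just truncates), and word counts stand in for token counts. This is not LLM TLDR's implementation.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    # Stop once a chunk would contain nothing but overlap from its predecessor.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def summarize(text: str, max_words: int) -> str:
    """Hypothetical stand-in for an LLM call: keeps the first max_words words."""
    return " ".join(text.split()[:max_words])


def tldr(text: str, budget: int = 50, chunk_size: int = 200,
         overlap: int = 20, per_chunk: int = 40) -> str:
    """Map-reduce summarization: summarize chunks, join, repeat until within budget."""
    while len(text.split()) > budget:
        chunks = chunk_words(text, chunk_size, overlap)
        if len(chunks) == 1:
            return summarize(text, budget)  # the remainder fits in one final call
        shorter = " ".join(summarize(c, per_chunk) for c in chunks)
        if len(shorter.split()) >= len(text.split()):
            raise ValueError("no progress; lower per_chunk or raise chunk_size")
        text = shorter
    return text


document = " ".join(f"w{i}" for i in range(1000))
print(len(tldr(document).split()))
```

    The overlap keeps a sentence that straddles a chunk boundary visible to both neighbouring calls. The progress check guards against parameter choices that would never shrink the text.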
    Downloads: 0 This Week