Showing 84 open source projects for "document analysis"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    Paperless-AI

    Paperless-AI

    AI-powered document analysis and tagging for Paperless-ngx

    Paperless-AI is an AI-powered extension designed to enhance document management within Paperless-ngx by automating analysis, classification, and organization tasks. It continuously monitors incoming documents and processes them using various AI backends, enabling automatic assignment of titles, tags, document types, and correspondents. It integrates with multiple OpenAI-compatible services as well as local models, giving users flexibility in how document intelligence is handled. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 2
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    ...The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. The architecture is designed to be lightweight and easily deployable, making it suitable for both local installations and cloud environments.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    deepdoctection

    deepdoctection

    A Repo For Document AI

    DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for fine-tuning, evaluating and running models. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Elasticsearch MCP Server

    Elasticsearch MCP Server

    A Model Context Protocol (MCP) server implementation

    This MCP server implementation provides interaction capabilities with Elasticsearch and OpenSearch, enabling functionalities such as document searching, index analysis, and cluster management through a set of tools. ​
    Downloads: 7 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 5
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 6
    Docling

    Docling

    Get your documents ready for gen AI

    Docling is an open-source document processing toolkit built to prepare diverse content types for modern generative AI and data workflows. The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 7
    tidytext

    tidytext

    Text mining using tidy tools

    tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    ExtractThinker

    ExtractThinker

    ExtractThinker is a Document Intelligence library for LLMs

    ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    Open Semantic Search

    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 10
    DocETL

    DocETL

    A system for agentic LLM-powered data processing and ETL

    ...The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 11
    HiClaw

    HiClaw

    An open source collaborative multi-agent OS

    HiClaw is an AI-powered legal assistant framework developed within the AgentScope ecosystem to support intelligent legal reasoning, document analysis, and workflow automation for legal research and compliance tasks. The project combines large language models with agent orchestration systems to process legal documents, interpret regulations, summarize contracts, and assist with legal knowledge retrieval. It is designed to provide structured, explainable workflows that help legal professionals interact with AI systems more transparently and efficiently. hiclaw emphasizes modularity and integration with external legal databases, retrieval systems, and enterprise tooling, enabling customization for different legal domains and operational requirements. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 12
    NeMo Retriever Library

    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient handling of large datasets. It supports multiple extraction strategies for different document formats, balancing accuracy and throughput depending on the use case. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Generative AI Use Cases (GenU)

    Generative AI Use Cases (GenU)

    Application implementation with business use cases

    ...Each example typically includes infrastructure templates, backend services, and application code that show how to integrate generative AI capabilities with other AWS services. These examples cover tasks such as document analysis, conversational assistants, content generation, and knowledge retrieval systems. The repository is intended to serve as both a learning resource and a starting point for developers who want to deploy generative AI solutions using AWS infrastructure.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    LongBench

    LongBench

    LongBench v2 and LongBench (ACL 25'&24')

    ...LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across multiple tasks. The benchmark includes multiple categories such as single-document question answering, multi-document reasoning, summarization, long dialogue understanding, and code analysis. It supports bilingual evaluation in English and Chinese to assess multilingual capabilities across extended contexts. Newer versions of the benchmark introduce extremely long context windows ranging from thousands to millions of tokens, enabling researchers to test the limits of modern long-context models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    chatd

    chatd

    Chat with your documents using local AI

    chatd is an open-source desktop application that allows users to interact with their documents through a locally running large language model. The software focuses on privacy and security by ensuring that all document processing and inference occur entirely on the user’s computer without sending data to external cloud services. It includes a built-in integration with the Ollama runtime, which provides a cross-platform environment for running large language models locally. The application...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 16
    Lemon AI

    Lemon AI

    Full-stack Open-source Self-Evolving General AI Agent

    LemonAI is an open-source full-stack framework for building autonomous AI agents capable of performing complex tasks such as research, programming, data analysis, and document processing. The platform is designed to run primarily on local infrastructure, providing a privacy-focused alternative to cloud-dependent agent platforms. It integrates with local large language models through tools such as Ollama, vLLM, and other model runtimes while also allowing optional connections to external cloud models. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    QAnything

    QAnything

    Question and Answer based on Anything

    QAnything is a local knowledge-base question-answering system designed to let users ask questions over many kinds of files and databases. It supports offline installation, making it useful for organizations that need private document analysis without sending data to external services. Users can upload local files and receive fast, reliable answers based on the indexed content. The system supports formats such as PDF, Word, PowerPoint, Excel, Markdown, email, text, images, CSV, and web links. Its retrieval process uses a two-stage vector and reranking approach to maintain answer quality as the knowledge base grows. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 18
    Semantra

    Semantra

    Multi-tool for semantic search

    ...The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. The system runs from the command line and automatically launches a local web interface where users can perform interactive searches and examine document passages related to a query. By relying on semantic embeddings and contextual analysis, the tool can identify passages that are relevant even when the query uses different wording than the source documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    GPT Academic

    GPT Academic

    Research-oriented chatbot framework

    GPT Academic is a research-oriented chatbot framework designed to integrate large language models (LLMs) into academic workflows. It provides tools for structured document processing, citation management, and enhanced interaction with research papers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    NativeMind Extension

    NativeMind Extension

    Your fully private, open-source, on-device AI assistant

    NativeMindExtension is an open-source browser extension that provides a private, on-device AI assistant designed to run without cloud dependencies. The project is built around a privacy-first model in which conversations, document analysis, translations, and writing assistance stay on the user’s device rather than being sent to external servers. It integrates with local model back ends such as Ollama and also supports WebLLM for quick in-browser trials, giving users a choice between stronger local setups and lighter no-install demonstrations. The extension is aimed at everyday browser workflows, offering features like multi-tab context awareness, webpage summarization, document understanding, contextual toolbars, and AI-assisted rewriting directly inside the browsing experience. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    PaperAI

    PaperAI

    Semantic search and workflows for medical/scientific papers

    PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 22
    Claude Skills

    Claude Skills

    Public repository for Agent Skills

    ...Rather than relying on handcrafted prompts every time, Skills teach an AI agent procedural knowledge and task-specific workflows so it can apply that expertise reliably, whether the task involves document creation, data analysis, design generation, or technical automation. Each Skill lives in its own directory with a SKILL.md file containing metadata and instructions, and can include supplemental scripts or assets that the agent uses to perform complex operations when relevant.
    Downloads: 61 This Week
    Last Update:
    See Project
  • 23
    DocStrange

    DocStrange

    Extract and convert data from any document, images, pdfs, word doc

    DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services. It is built for developers who need high-quality parsing from scans, photos, PDFs, office files,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    h2oGPT

    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and Mixtral, making it a flexible tool for anyone needing advanced document analysis and AI-driven conversation in a secure, local setup.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Large Language Models (LLMs)

    Large Language Models (LLMs)

    Connect MATLAB to LLM APIs, including OpenAI® Chat Completions

    This repository enables MATLAB to connect with large language models (LLMs) such as OpenAI's ChatGPT, DALL-E, Azure OpenAI, and Ollama, integrating their natural language processing and image generation capabilities directly within MATLAB environments. It facilitates creating chatbots, summarizing text, and image generation, among other tasks.
    Downloads: 10 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
Auth0 Logo