Showing 2020 open source projects for "documents"

  • 1
    Keybase client

    Keybase Go library, client, service, OS X, iOS, Android, Electron

    ...Keybase works for families, roommates, clubs, and groups of friends, too. Keybase connects to public identities, too. You can connect with communities from Twitter, Reddit, and elsewhere. Don’t live dangerously when it comes to documents. Keybase can store your group’s photos, videos, and documents with end-to-end encryption. You can set a timer on your most sensitive messages. This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. ...
    Downloads: 3 This Week
  • 2
    rga

    rga: ripgrep, but also search in PDFs, E-Books, Office documents, etc.

    rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in PDF, docx, sqlite, JPG, movie subtitles (mkv, mp4), etc.
    Downloads: 0 This Week
  • 3
    iText

    iText for Java represents the next level of SDKs for developers

    iText for Java represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit, and enhance PDF documents, iText can be a boon to nearly every workflow. iText Suite refers to the complete line of products comprising the open-source iText Core PDF library and its add-ons. The iText Suite is a fully-featured SDK for PDF development that allows you to seamlessly embed extensive PDF functionality into your software or workflows. The iText Suite builds on over a decade of lessons learned from iText 5 (and iTextSharp) development. ...
    Downloads: 14 This Week
  • 4
    txtai

    Build AI-powered semantic search applications

    ...State-of-the-art machine learning models transform data into vector representations (also known as embeddings) for search. Innovation is happening at a rapid pace: models can understand concepts in documents, audio, images, and more. Machine-learning pipelines run extractive question answering, zero-shot labeling, transcription, translation, summarization, and text extraction. A cloud-native architecture scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extraction that generates structured databases. ...
    Downloads: 2 This Week
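The idea of searching over vector representations, as described in the txtai entry, can be illustrated with a minimal sketch. This is not txtai's API: the helpers `embed`, `cosine`, and `search` are hypothetical, and simple word counts stand in for the learned embeddings a real system would use.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a word-count vector (assumption, not a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents):
    """Return the document whose vector is most similar to the query vector."""
    query_vec = embed(query)
    return max(documents, key=lambda doc: cosine(query_vec, embed(doc)))


docs = [
    "encrypt files before sharing",
    "highlight source code in LaTeX",
    "query values in JSON documents",
]
print(search("JSON query", docs))
```

With learned embeddings the ranking step stays the same, but the vectors would capture meaning rather than exact word overlap.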
  • 5
    minted

    minted is a LaTeX package that provides syntax highlighting

    minted is a LaTeX package that enables advanced syntax highlighting of source code using the Pygments library. It supports customization via LaTeX and Python integration, allowing fine-grained control over code snippets in documents.
    Downloads: 3 This Week
  • 6
    Handcalcs

    Python library for converting Python calculations into rendered LaTeX

    Handcalcs is a Python library that auto-renders calculation code in Jupyter notebooks or LaTeX documents with step-by-step symbolic substitution, giving output a “handwritten” feel. It supports cell magics and auto-LaTeX generation via configurable output options.
    Downloads: 3 This Week
  • 7
    Papermark

    Papermark is the open-source DocSend alternative

    Papermark is an open-source document-sharing platform that serves as an alternative to services like DocSend. It allows users to share documents securely with built-in analytics and custom domain support. Papermark is designed for ease of use and can be self-hosted, providing full control over document distribution and tracking.
    Downloads: 2 This Week
  • 8
    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 2 This Week
  • 9
    libimobiledevice

    A cross-platform protocol library to communicate with iOS devices

    libimobiledevice is a cross-platform software library that implements the protocols used to interact with iOS devices. Unlike other projects, it does not depend on any existing proprietary libraries and does not require jailbreaking. It can access a device's filesystem and the documents of file-sharing apps, retrieve information about a device and modify various settings, and back up and restore the device in a native way compatible with iTunes. It can also manage the arrangement of app icons on the device and install, remove, and list apps. Further features include activating a device using official servers, managing contacts, calendars, notes, and bookmarks, and retrieving and removing crash reports. ...
    Downloads: 44 This Week
  • 10
    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, kg-gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. ...
    Downloads: 1 This Week
  • 11
    Google Workspace MCP Server

    Control Gmail, Google Calendar, Docs, Sheets, Slides, Chat, Forms

    ...By acting as a bridge between AI clients and the Google ecosystem, the server enables automated workflows such as searching emails, creating calendar events, retrieving documents, or editing files without leaving the AI environment. The system is designed to operate as a backend service that integrates with AI applications such as coding agents, automation tools, and conversational assistants. Authentication is handled through OAuth-based flows that allow both single-user and multi-user environments while maintaining access control over Workspace data.
    Downloads: 1 This Week
  • 12
    PageLM

    PageLM is a community-driven version of NotebookLM

    ...It supports uploaded documents including PDF, DOCX, Markdown, and TXT, allowing users to ground questions and generated materials in source content. On the technical side, it supports multiple model providers, multiple embedding back ends, WebSocket streaming for real-time generation, persistent content storage, and structured markdown outputs.
    Downloads: 0 This Week
  • 13
    SemTools

    Semantic search and document parsing tools for the command line

    ...The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead. SemTools can parse documents, build semantic embeddings, and perform similarity searches across datasets, making it useful for research, knowledge management, and AI-assisted coding workflows. The toolkit is designed to work well with modern AI pipelines, particularly those involving large language models that require structured knowledge retrieval.
    Downloads: 0 This Week
  • 14
    GJSON

    Get JSON values quickly, JSON parser for Go

    GJSON is a Go library designed for extremely fast, allocation-free retrieval of values from JSON documents. It enables you to query nested JSON structures using one-liner dot-notation or array-based paths and includes wildcard and comparison operators. The library is optimized for speed and zero allocations, benchmarking significantly faster than Go’s standard encoding/json unmarshal approaches. It supports parsing JSON lines (newline-delimited JSON) as an array for large stream processing. ...
    Downloads: 0 This Week
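The dot-notation path lookup mentioned in the GJSON entry can be sketched in a few lines. This is only an illustration of the path idea, not GJSON itself: the helper `get_path` is hypothetical, it is written in Python rather than Go, and it parses the whole document with the standard `json` module instead of retrieving values without allocation.

```python
import json


def get_path(document, path):
    """Return the value at a dot-separated path, or None if a step is missing."""
    node = json.loads(document)
    for key in path.split("."):
        if isinstance(node, dict) and key in node:
            node = node[key]  # object member lookup
        elif isinstance(node, list) and key.isdigit() and int(key) < len(node):
            node = node[int(key)]  # numeric step indexes into an array
        else:
            return None
    return node


doc = '{"name": {"first": "Ada", "last": "Lovelace"}, "tags": ["math", "code"]}'
print(get_path(doc, "name.last"))
print(get_path(doc, "tags.1"))
```

Wildcards and comparison operators, which the entry also mentions, are left out of this sketch.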
  • 15
    Papers We Love

    Papers from the computer science community to read and discuss

    Papers We Love (PWL) is a global open source community dedicated to reading, discussing, and sharing influential computer science research papers. The repository serves as a curated directory of academic papers that have shaped the field of computing, providing a centralized location for documents that were previously scattered across various online sources. While licensing restrictions prevent hosting all papers directly, PWL offers links to their original sources and clearly marks hosted copies with an emoji. The community encourages participation through local meetups, where members gather to discuss and analyze key works spanning topics such as programming languages, distributed systems, and software engineering. ...
    Downloads: 0 This Week
  • 16
    Hagenberg Thesis Document Collection

    Hagenberg LaTeX Thesis Template

    This is a collection of modern LaTeX classes, style files, and example documents for authoring Bachelor, Master, or Diploma theses and related academic manuscripts in English and German. Pre-configured English and German documents are available, easy to use even for LaTeX beginners, and compatible with LaTeX distributions for Windows, Mac OS, and Linux. The document classes are immediately usable and convenient to customize.
    Downloads: 0 This Week
  • 17
    yq

    Portable command-line YAML processor

    yq is a portable and lightweight command-line YAML processor. It is comparable to jq (a command-line JSON processor) or sed, but for YAML files. It can deep-read a YAML file with a given path expression, deeply compare YAML files, update a YAML file given a path expression or script file, and more. It can also merge several YAML files, with plenty of options for overriding and appending. yq is written in portable...
    Downloads: 32 This Week
  • 18
    FSNotes

    Notes manager for macOS/iOS

    FSNotes is a modern notes manager for macOS and iOS. The app respects open formats like GitHub Flavored Markdown, so you can easily write documents on iPhone and MacBook. It's simple and blazing fast! Sync via iCloud Drive. 3D Touch and configurable keyboard. TextBundle and EncryptedTextBundle containers. Pinned notes are kept in sync with the desktop app. Dynamic fonts (iOS 11+). Night mode by location or screen brightness. Sharing extension. Encrypted note support.
    Downloads: 6 This Week
  • 19
    knitr

    A general-purpose tool for dynamic report generation in R

    knitr is an R package that acts as a literate programming engine, combining code execution and document generation. It executes code embedded in Markdown, LaTeX, or other formats and produces output with results interleaved into final documents. It powers R Markdown and supports caching, chunk options, graphics, and extensibility for reproducible analysis.
    Downloads: 3 This Week
  • 20
    audio_video_streaming

    Compilation of authoritative information on audio and video streaming

    ...The repository includes example implementations like multi-user video chat systems, WebRTC demos, and cross-platform media players to provide hands-on learning opportunities. It also documents widely used technologies such as RTP, RTMP, HLS, and WebRTC, helping users understand the full lifecycle of streaming pipelines from capture to rendering. In addition to educational materials, it references industry tools, libraries, and frameworks, making it a valuable roadmap for both beginners and advanced engineers. The project emphasizes structured learning paths and practical experimentation,
    Downloads: 10 This Week
  • 21
    Concordia

    Crowdsourcing platform for full text transcription and tagging

    Concordia is a platform for crowdsourcing transcription and tagging of text in digitized images. It was developed by the Library of Congress so that volunteers of all backgrounds could transcribe and tag digitized images of manuscripts and typed materials from the Library’s collections that optical character recognition could not otherwise process.
    Downloads: 0 This Week
  • 22
    LLaMA 3

    The official Meta Llama 3 GitHub site

    ...As the Llama stack evolved, Meta consolidated repositories and marked this one deprecated, pointing users to newer, centralized hubs for models, utilities, and docs. Even as a deprecated repo, it documents the transition path and preserves references that clarify how Llama 3 releases map into the current ecosystem. Practically, it functioned as a bridge between Llama 2 and later Llama releases by standardizing distribution and starter code for inference and fine-tuning. Teams still treat it as historical reference material for version lineage and migration notes.
    Downloads: 17 This Week
  • 23
    supabase-swift

    A Swift client for Supabase

    This reference documents every object and method available in Supabase's Swift library, supabase-swift. You can use supabase-swift to interact with your Postgres database, listen to database changes, invoke Deno Edge Functions, build login and user management functionality, and manage large files.
    Downloads: 1 This Week
  • 24
    Talk to Figma MCP

    AI bridge enabling Cursor agents to read and modify Figma designs

    cursor-talk-to-figma-mcp is an open source integration that connects AI coding agents with Figma through the Model Context Protocol (MCP). It allows AI tools such as Cursor or other compatible agents to directly communicate with Figma documents and interact with design elements programmatically. Through this integration, an AI assistant can read the structure of a design, retrieve information about nodes or selections, and perform modifications to the layout or content. cursor-talk-to-figma-mcp includes an MCP server and a Figma plugin that communicate through a WebSocket connection, enabling real-time interaction between the AI environment and the design canvas. ...
    Downloads: 0 This Week
  • 25
    LEANN

    Local RAG engine for private multimodal knowledge search on devices

    ...By recomputing embeddings during queries and using compact graph-based indexing structures, LEANN can maintain high search accuracy while minimizing disk usage. It aims to act as a unified personal knowledge layer that connects different types of data such as documents, code, images, and other local files into a searchable context for language models.
    Downloads: 0 This Week