Showing 127 open source projects for "documents"

  • 1
    PDF Arranger

    Small Python-GTK application to merge or split PDFs

    PDF Arranger is a small Python-GTK application that helps users merge or split PDF documents and rotate, crop, and rearrange their pages through an interactive and intuitive graphical interface. It is a front end for pikepdf. PDF Arranger is a fork of Konstantinos Poulios’s PDF Shuffler (see Savannah or SourceForge) and a humble attempt to make that project a bit more active.
    Downloads: 449 This Week
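PDF Arranger delegates the actual PDF rewriting to pikepdf. As a hedged illustration of the model behind its merge, split, rotate, and rearrange operations, the following standard-library Python sketch treats a document as an ordered list of page references. The `Page` type and the helper functions are hypothetical and are not PDF Arranger's code; the only PDF rule assumed is that a page's rotation must be a multiple of 90 degrees.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    """Hypothetical reference to one page of a source PDF."""
    source: str        # file the page comes from
    index: int         # zero-based page number within that file
    rotation: int = 0  # clockwise rotation in degrees


def merge(*docs):
    """Concatenate several documents (lists of pages) into one."""
    return [page for doc in docs for page in doc]


def split(doc, at):
    """Split a document into two parts just before position `at`."""
    return doc[:at], doc[at:]


def rotate(page, degrees):
    """Return a copy of `page` rotated clockwise by `degrees`."""
    if degrees % 90 != 0:
        raise ValueError("PDF page rotation must be a multiple of 90 degrees")
    return replace(page, rotation=(page.rotation + degrees) % 360)


def rearrange(doc, order):
    """Reorder pages; `order` lists current positions in their new order."""
    return [doc[i] for i in order]


a = [Page("a.pdf", i) for i in range(2)]
b = [Page("b.pdf", 0)]
combined = rearrange(merge(a, b), [2, 0, 1])
first, rest = split(combined, 1)
print([(p.source, p.index) for p in combined])
# prints [('b.pdf', 0), ('a.pdf', 0), ('a.pdf', 1)]
```

Keeping pages as lightweight references means every edit is cheap and reversible until the list is finally written out as a new PDF.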
  • 2
    JSON-java

    A reference implementation of a JSON package in Java

    JSON is a lightweight, language-independent data interchange format. The JSON-Java package is a reference implementation that demonstrates how to parse JSON documents into Java objects and how to generate new JSON documents from Java classes.
    Downloads: 44 This Week
  • 3
    WeasyPrint

    The awesome document factory

    WeasyPrint is a smart solution that helps people create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything else you want, as long as you have some web design skills. Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C, so you can use your usual web tools, languages, and frameworks, but for print.
    Downloads: 7 This Week
  • 4
    QuestPDF

    A library that can help you with generating PDF documents

    Quickly design and generate PDF documents with an open-source, modern, and battle-tested C# library. Forget about limitations, feel confident, enjoy your task and efficiently deliver professional products. QuestPDF is a progressive library that can help you with generating PDF documents in your .NET application by offering a friendly, discoverable and predictable C# fluent API.
    Downloads: 3 This Week
  • 5
    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition, the library can be used to create simple PDF documents containing text and geometric shapes.
    Downloads: 10 This Week
  • 6
    nbdime

    Tools for diffing and merging of Jupyter notebooks

    nbdime provides tools for diffing and merging Jupyter notebooks. Jupyter notebooks are useful, rich-media documents stored in a plain-text JSON format, which is relatively easy to parse. However, primitive line-based diff and merge tools do not handle the logical structure of notebook documents well. nbdime, by contrast, provides “content-aware” diffing and merging of Jupyter notebooks because it understands the structure of notebook documents.
    Downloads: 6 This Week
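As a rough illustration of why structure-aware diffing helps (this is not nbdime's algorithm, which also diffs outputs and metadata and supports merging), the hypothetical Python sketch below parses the notebook JSON and compares the sequence of cells instead of raw text lines. A changed execution count, which a line-based diff would report, produces no change here because the sketch deliberately looks only at each cell's type and source.

```python
import difflib
import json


def cell_keys(notebook_json):
    """Parse notebook JSON and reduce each cell to (cell_type, source)."""
    notebook = json.loads(notebook_json)
    keys = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # nbformat stores a cell's source either as one string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        keys.append((cell["cell_type"], source))
    return keys


def diff_cells(old_json, new_json):
    """Diff two notebooks cell by cell, ignoring outputs and metadata."""
    old, new = cell_keys(old_json), cell_keys(new_json)
    # Cells (hashable tuples) are the units of comparison, not text lines
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes
```

Each reported change is an insert, delete, or replace of whole cells, which is much closer to how a person thinks about editing a notebook.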
  • 7
    Asciidoc Editor based on JavaFX 20

    Asciidoc Editor and Toolchain written with JavaFX 19

    AsciidocFX is a WYSIWYG editor for the AsciiDoc markup language. You can use it to build PDF, EPUB, and HTML books, documents, and slides. The project's “Supported Operating Systems and Builds” section lists the available builds. AsciidocFX converts documents via the AsciidoctorJ library.
    Downloads: 11 This Week
  • 8
    Symfony DomCrawler

    Eases DOM navigation for HTML and XML documents

    Symfony DomCrawler is a PHP component that provides powerful tools for navigating and extracting data from HTML and XML documents. It allows developers to parse, filter, and manipulate web pages using CSS selectors and XPath expressions. DomCrawler is widely used for web scraping, testing, and processing structured content, and integrates well with other Symfony components like BrowserKit.
    Downloads: 1 This Week
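DomCrawler's own API is PHP. To keep one language across the examples in this list, here is a rough Python analogue of its filter-by-XPath workflow using the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup (real-world HTML would need an HTML parser instead). The markup and the `extract_links` function are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative, well-formed XHTML; ElementTree cannot parse arbitrary tag soup.
PAGE = """<html><body>
  <ul>
    <li class="item"><a href="/a">First</a></li>
    <li class="item"><a href="/b">Second</a></li>
    <li class="ad"><a href="/promo">Sponsored</a></li>
  </ul>
</body></html>"""


def extract_links(markup, item_class):
    """Return (text, href) pairs for links inside <li class=item_class>."""
    root = ET.fromstring(markup)
    links = []
    # ElementTree's XPath subset supports attribute predicates like [@class='x']
    for item in root.findall(f".//li[@class='{item_class}']"):
        anchor = item.find("a")
        if anchor is not None and anchor.get("href"):
            links.append((anchor.text, anchor.get("href")))
    return links


print(extract_links(PAGE, "item"))  # [('First', '/a'), ('Second', '/b')]
```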
  • 9
    DocTR

    Library for OCR-related tasks powered by Deep Learning

    DocTR provides an easy and powerful way to extract valuable information from your documents. Seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors that parse textual information (localizing and identifying each word) in your documents. The predictors are robust two-stage (detection + recognition) OCR models with pretrained parameters, and the library is user-friendly: three lines of code are enough to load a document and extract its text with a predictor.
    Downloads: 4 This Week
  • 10
    iText Core/Community

    iText for .NET is the .NET version of the iText library

    iText Core/Community (previously known as iTextSharp) is a high-performance, battle-tested library for creating, adapting, inspecting, and maintaining PDF documents, which lets you add PDF functionality to your software projects with ease. It is also available for Java. For more advanced examples, refer to our Knowledge Base or the main Examples repository. C# equivalents of the Java signing examples are also available, though the Java code is very similar since both versions share the same API. Some of the output PDF files are displayed incorrectly by the GitHub previewer, so download them to see the correct results. ...
    Downloads: 2 This Week
  • 11
    DevOps Basics

    A place to practice and document the DevOps toolchain

    Are you new to DevOps, wanting to learn some DevOps tools, or already a DevOps engineer looking for DevOps documents and a place to practice with DevOps tools? This repository will help you enhance your DevOps skills and can serve as a bookmark for documents related to DevOps.
    Downloads: 4 This Week
  • 12
    Papis

    Powerful and highly extensible command-line based document and bibliography manager

    Papis is a powerful and highly extensible CLI document and bibliography manager. With Papis, you can search your library for books and papers, add documents and notes, import and export to and from other formats, and much more. Papis uses a human-readable and easily hackable .yaml file to store each entry's bibliographical data. It strives to be easy to use while providing a wide range of features. And for those who still want more, Papis makes it easy to write scripts that extend its features even further.
    Downloads: 0 This Week
  • 13
    D3.js

    A JavaScript library for visualizing data using web standards

    D3.js (or D3, for Data-Driven Documents) is a JavaScript library for producing dynamic, interactive data visualizations in web browsers. With D3 you can bring data to life using SVG, Canvas, and HTML. Its powerful visualization and interaction techniques, combined with a data-driven approach to DOM manipulation, give you greater design freedom and control over the final result.
    Downloads: 33 This Week
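At the heart of D3's data-driven approach is the data join: `selection.data(data, key)` matches each datum to an existing element by key, and the enter, update, and exit selections (or `selection.join` in recent versions) then create, update, and remove elements. The Python sketch below mimics only that key-matching step, with a plain dict standing in for the DOM; it illustrates the concept and is not D3's implementation, and `data_join` and `render` are hypothetical names.

```python
def data_join(elements, data, key):
    """Split `data` into enter/update groups and list element keys to exit.

    `elements` maps key -> existing element (a stand-in for DOM nodes).
    """
    incoming = {key(d) for d in data}
    enter = [d for d in data if key(d) not in elements]     # need new elements
    update = [d for d in data if key(d) in elements]        # reuse existing ones
    exit_keys = [k for k in elements if k not in incoming]  # no datum left
    return enter, update, exit_keys


def render(elements, data, key):
    """Apply one join to the dict 'DOM' in place and return it."""
    enter, update, exit_keys = data_join(elements, data, key)
    for k in exit_keys:
        del elements[k]
    for d in enter + update:
        elements[key(d)] = f"<rect width={d['value']}>"
    return elements
```

Matching by key rather than by position is what lets existing elements be reused (and, in D3, animated) when the data changes.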
  • 14
    iText

    iText for Java represents the next level of SDKs for developers

    iText for Java represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit, and enhance PDF documents, iText can be a boon to nearly every workflow. iText Suite refers to the complete line of products comprising the open-source iText Core PDF library and its add-ons. The iText Suite is a fully-featured SDK for PDF development that allows you to seamlessly embed extensive PDF functionality into your software or workflows. The iText Suite builds on over a decade of lessons learned from iText 5 (and iTextSharp) development. ...
    Downloads: 35 This Week
  • 15
    OpenAPI Generator

    OpenAPI Generator allows generation of API client libraries

    ...Some generators support Inversion of Control, allowing you to iterate on design via your OpenAPI document without worrying about blowing away your entire domain layer when you regenerate code. Ever wanted to iteratively design a MySQL database, but writing table declarations was too tedious? OpenAPI documents allow you to convert the metadata about your API into some other format.
    Downloads: 6 This Week
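To make "convert the metadata about your API into some other format" concrete, here is a hypothetical Python sketch that turns one OpenAPI object schema (already parsed into a dict) into a MySQL `CREATE TABLE` statement. The type mapping is an assumption chosen for illustration; this is not the output of any OpenAPI Generator generator.

```python
# Assumed OpenAPI-type -> MySQL-type mapping, for illustration only.
TYPE_MAP = {
    "integer": "INT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
    "string": "VARCHAR(255)",
}


def schema_to_ddl(table, schema):
    """Render an OpenAPI object schema as a MySQL CREATE TABLE statement."""
    required = set(schema.get("required", []))
    columns = []
    for name, prop in schema.get("properties", {}).items():
        sql_type = TYPE_MAP.get(prop.get("type"), "TEXT")  # fallback for other types
        not_null = " NOT NULL" if name in required else ""
        columns.append(f"  `{name}` {sql_type}{not_null}")
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(columns) + "\n);"


pet = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "weight": {"type": "number"},
    },
}
print(schema_to_ddl("Pet", pet))
```

Because the table is derived from the schema, editing the API document and regenerating keeps the two in sync without hand-writing declarations.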
  • 16
    Keybase client

    Keybase Go library, client, service, OS X, iOS, Android, Electron

    ...Keybase works for families, roommates, clubs, and groups of friends, too. Keybase connects to public identities, too. You can connect with communities from Twitter, Reddit, and elsewhere. Don’t live dangerously when it comes to documents. Keybase can store your group’s photos, videos, and documents with end-to-end encryption. You can set a timer on your most sensitive messages. This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. ...
    Downloads: 6 This Week
  • 17
    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily, with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The command-line tool is a stand-alone program that can be run from the command line.
    Downloads: 8 This Week
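xhtml2pdf relies on ReportLab for layout. The hypothetical Python sketch below illustrates only the "keeping text together" idea: blocks are placed greedily, and a block that does not fit in the space left on the current page moves whole to the next page instead of being split. It is not xhtml2pdf's algorithm; real layout also has to split blocks taller than a page, which this sketch simply rejects.

```python
def paginate(blocks, page_height):
    """Greedily assign (name, height) blocks to pages without splitting them."""
    pages, current, used = [], [], 0
    for name, height in blocks:
        if height > page_height:
            raise ValueError(f"block {name!r} is taller than a page")
        if used + height > page_height:  # block would straddle a page break
            pages.append(current)
            current, used = [], 0
        current.append(name)
        used += height
    if current:
        pages.append(current)
    return pages


layout = paginate([("title", 20), ("para1", 50), ("table", 40), ("para2", 30)], 100)
print(layout)  # [['title', 'para1'], ['table', 'para2']]
```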
  • 18
    libimobiledevice

    A cross-platform protocol library to communicate with iOS devices

    libimobiledevice is a cross-platform software library that speaks the protocols used to interact with iOS devices. Unlike other projects, it does not depend on any existing proprietary libraries and does not require jailbreaking. It can access a device's filesystem and the documents of file-sharing apps, retrieve information about a device and modify various settings, and back up and restore the device natively in a way compatible with iTunes. It can also manage the arrangement of app icons on the device; install, remove, list, and otherwise manage apps; activate a device using official servers; manage contacts, calendars, notes, and bookmarks; and retrieve and remove crash reports. ...
    Downloads: 61 This Week
  • 19
    unioffice

    Pure Go library for creating and processing Office Word documents

    unioffice is a library for creating Office Open XML documents (.docx, .xlsx, and .pptx). Its goal is to be the most compatible and highest-performance Go library for creating and editing docx/xlsx/pptx files. Every release of our libraries is automatically tested against known vulnerabilities and does not pass unless everything is remediated. All changes are carefully reviewed by our team.
    Downloads: 0 This Week
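Office Open XML files are ZIP packages of XML parts; in a .docx, the main text lives in `word/document.xml`, with paragraphs in `w:p` elements and text in `w:t` elements. unioffice exposes this through a Go API. To stay in one language across these examples, here is a rough Python standard-library sketch that reads paragraph text straight from that part. It ignores styles, table layout, and every other part of the package, and `docx_paragraphs` is a hypothetical helper, not unioffice's API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(data: bytes):
    """Return the plain text of each w:p paragraph in a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join all w:t fragments
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


# Minimal in-memory archive containing only the main part (not a complete .docx)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Second</w:t></w:r></w:p>'
        "</w:body></w:document>",
    )
print(docx_paragraphs(buf.getvalue()))  # ['Hello world', 'Second']
```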
  • 20
    Morphia

    MongoDB object-document mapper in Java

    MongoDB Object Document Mapping for the JVM. Bidirectional mapping to and from the database. Transparently map your Java entities to MongoDB documents and back.
    Downloads: 1 This Week
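In Morphia the mapping is configured with annotations on Java classes. The underlying idea of a bidirectional object-document mapping can be sketched in Python with dataclasses: an entity becomes a dict shaped like a MongoDB document (MongoDB keeps the primary key in the `_id` field) and is rebuilt from one. The `Author` class and both functions are hypothetical and are not Morphia's API.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Author:
    """Hypothetical entity; `id` is stored as MongoDB's `_id` field."""
    id: str
    name: str
    tags: list


def to_document(entity):
    """Map an entity to a MongoDB-style document (a plain dict)."""
    document = asdict(entity)
    document["_id"] = document.pop("id")
    return document


def from_document(cls, document):
    """Map a document back to an entity, ignoring unknown fields."""
    data = dict(document)
    data["id"] = data.pop("_id")
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


doc = to_document(Author("a1", "Ada", ["math"]))
print(doc)  # {'name': 'Ada', 'tags': ['math'], '_id': 'a1'}
```

Ignoring unknown keys on the way back is one simple policy for documents that carry fields the entity does not declare.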
  • 21
    LangExtract

    A Python library for extracting structured information

    ...LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. The system excels at handling long documents using optimized chunking, multi-pass extraction, and parallel processing to ensure both high recall and structured consistency.
    Downloads: 0 This Week
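The sketch below is a generic illustration of overlapping chunking, not LangExtract's implementation: a hypothetical Python function splits text into fixed-size windows that overlap, so content near a boundary appears whole in at least one chunk, and it keeps each window's character offset so anything found in a chunk can be mapped back to its position in the original text. Chunks produced this way can then be sent to a model in parallel and over several passes.

```python
def chunk_text(text, max_chars, overlap):
    """Split `text` into (offset, chunk) windows that overlap by `overlap` chars."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + max_chars]))
        if start + max_chars >= len(text):  # last window reached the end
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", 4, 1))  # [(0, 'abcd'), (3, 'defg'), (6, 'ghij')]
```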
  • 22
    LaTeX Examples

    Examples for the usage of LaTeX

    LaTeX-examples is a repository collecting a variety of example documents and snippets demonstrating LaTeX features, usage patterns, and common templates. It acts as a playground for learning LaTeX syntax, macros, formatting tricks, and document structuring practices. Files include sample articles, reports, book chapters, presentations (using Beamer), tables, mathematical typesetting examples (equations, aligned systems, integrals, matrices), custom macros, and styling.
    Downloads: 0 This Week
  • 23
    PHPWord

    PHP library for reading and writing word processing documents

    PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats. The current version of PHPWord supports Microsoft Office Open XML (OOXML or OpenXML), OASIS Open Document Format for Office Applications (OpenDocument or ODF), and Rich Text Format (RTF). PHPWord is an open source project licensed under the terms of LGPL version 3. PHPWord aims to be a high-quality software product by incorporating continuous integration and...
    Downloads: 14 This Week
  • 24
    KnpSnappyBundle

    Easily create PDF and images in Symfony by converting HTML using WebKit

    Easily create PDF and image files in Symfony by converting HTML using WebKit. Snappy is a PHP wrapper for the wkhtmltopdf conversion utility. It allows you to generate either PDF or image files from your HTML documents using the WebKit engine. KnpSnappyBundle provides a simple integration for your Symfony project. If you need to change the binaries, change the instance options, or even disable one or both services, you can do so through the configuration. You can render a PDF document that references relative URLs, such as CSS files, and render a PDF document as a response from a controller. ...
    Downloads: 6 This Week
  • 25
    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a pure Python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries, and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be on supporting common use cases rather than rare ones.
    Downloads: 3 This Week
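borb represents a PDF as nested lists, dictionaries, and primitives. The reverse direction, writing such a structure in PDF's object syntax (dictionaries as `<< >>`, arrays as `[ ]`, names as `/Name`, literal strings in parentheses, plus `true`, `false`, and `null`), is small enough to sketch in Python. This hypothetical function is not borb's API, and it uses its own convention that Python strings starting with "/" become PDF names.

```python
def to_pdf_object(value):
    """Serialize a JSON-like Python value into PDF object syntax."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is an int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # PDF reals have no exponent form; trim trailing zeros
        text = f"{value:.4f}".rstrip("0").rstrip(".")
        return "0" if text == "-0" else text
    if isinstance(value, str):
        if value.startswith("/"):
            return value  # hypothetical convention: "/Name" is a PDF name
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, list):
        return "[" + " ".join(to_pdf_object(v) for v in value) + "]"
    if isinstance(value, dict):
        items = " ".join(f"/{k.lstrip('/')} {to_pdf_object(v)}" for k, v in value.items())
        return f"<< {items} >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


page = {"Type": "/Page", "MediaBox": [0, 0, 612, 792], "Rotate": 0}
print(to_pdf_object(page))  # << /Type /Page /MediaBox [0 0 612 792] /Rotate 0 >>
```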