Showing 284 open source projects for "documents"

  • 1
    e-Dokyumento

    e-Dokyumento is a web-based Document Management System (DMS)

    e-Dokyumento is an open-source, web-based Document Management System (DMS) that automates the basic office document workflow (receiving, filing, routing, and approving) by capturing (scanning), digitizing (OCR), storing, tagging, and electronically routing and approving (e-signature) electronic documents.
    Demo: https://e-dokyumento.herokuapp.com/ and https://edokyu.seillig.com/ (refer to Readme.md for the accounts)
    Docker Hub: https://hub.docker.com/r/nelsonmaligro/edokyumento
    Install using the ISO:
    1. Download: https://sourceforge.net/projects/e-dokyumento/files/Releases/e-DokyuV3.iso/download
    2. ...
    Downloads: 3 This Week
  • 2
    OOoPy is a library in Python for inspecting, creating or modifying OpenOffice.org documents. It uses the existing ElementTree XML library by Fredrik Lundh for manipulation of the OOo XML.
    Downloads: 0 This Week
  • 3

    MITRE Annotation Toolkit

    A toolkit for managing and manipulating text annotations

    The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g., named entity identification, de-identification of medical records). ...
    Downloads: 0 This Week
  • 4
    Paperless-ng

    A supercharged version of paperless: scan, index and archive docs

    ...I feed documents right from the post box into the scanner and then shred them. Perhaps you might find it useful too. Paperless-ng is a fork of the original paperless project. It changes many things both on the surface and under the hood. Paperless-ng was created because I feel that these changes are too big to be pushed into the main repository right away.
    Downloads: 0 This Week
  • 5
    LP CSIC/UAB Apps and Code

    Software and Code from Laboratori de Proteòmica CSIC/UAB

    Software, Code and Documents from Laboratori de Proteòmica CSIC/UAB ( LP-CSIC/UAB: http://proteomica.uab.cat )
    Downloads: 0 This Week
  • 6
    DrQA

    Reading Wikipedia to Answer Open-Domain Questions

    ...It follows a two-stage pipeline: a fast document retriever first narrows down candidate articles, and a neural machine reader then predicts the exact answer span from those passages. The retriever relies on classic IR features (like TF-IDF and n-gram statistics) to remain lightweight and scalable to millions of documents. The reader is a neural model trained on supervised QA data to estimate start and end positions within a paragraph, and it can be adapted to new domains through fine-tuning or distant supervision. The repository includes scripts to build the Wikipedia index, train the reader, and evaluate end-to-end performance. DrQA popularized a practical recipe for combining IR and neural reading, and it remains a strong baseline for open-domain QA research and production prototypes.
    Downloads: 0 This Week
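
The retrieval stage described for DrQA can be illustrated with a small TF-IDF ranker. This is only a sketch under stated assumptions: it uses plain unigram counts and cosine similarity from the Python standard library, and the function names (`tokenize`, `build_index`, `retrieve`) are hypothetical and not taken from the DrQA codebase.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def build_index(docs):
    """Return one sparse TF-IDF vector per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, k=1):
    """Return the indices of the k documents most similar to the query."""
    vectors, idf = build_index(docs)
    q_tf = Counter(tokenize(query))
    # Terms that never occur in the corpus carry no weight.
    q_vec = {t: q_tf[t] * idf[t] for t in q_tf if t in idf}
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(q_vec, vectors[i]),
                    reverse=True)
    return ranked[:k]
```

In a two-stage pipeline of the kind described, the indices returned by `retrieve` would select the passages handed to the reader model.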
  • 7
    Liferay Portal

    The world's leading open source portal

    Liferay Portal is the world's leading enterprise open source portal framework, offering integrated Web publishing and content management, an enterprise service bus and service-oriented architecture, and compatibility with all major IT infrastructure. Check GitHub for our latest releases: https://github.com/liferay/liferay-portal/releases https://github.com/liferay/liferay-ide/releases
    Downloads: 156 This Week
  • 8
    Vector AI

    A platform for building vector-based applications

    Vector AI is a framework designed to make building production-grade vector-based applications as quick and easy as possible. Create, store, manipulate, search and analyze vectors alongside JSON documents to power applications such as neural search, semantic search, and personalized recommendations. Image2Vec, Audio2Vec, etc. (any data can be turned into vectors through machine learning). Store your vectors alongside documents without having to do a db lookup for metadata about the vectors. Enable searching of vectors and rich multimedia with vector similarity search. ...
    Downloads: 1 This Week
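
The idea of storing vectors alongside documents and searching them by similarity can be sketched in a few lines. This is an illustration only, not the Vector AI API: the in-memory `store`, the `search` function, and the `"vector"` field name are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(store, query_vector, top_k=2):
    """Rank documents by similarity of their stored vector to the query.

    Each document is a dict that carries its own metadata next to the
    vector, so a hit needs no second lookup to fetch the metadata.
    """
    ranked = sorted(
        store,
        key=lambda doc: cosine_similarity(query_vector, doc["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical documents: JSON-like metadata plus a 2-dimensional vector.
store = [
    {"title": "cats", "vector": [0.9, 0.1]},
    {"title": "dogs", "vector": [0.1, 0.9]},
    {"title": "pets", "vector": [0.7, 0.7]},
]
```

A linear scan like this is only practical for small collections; larger ones normally rely on an approximate nearest-neighbour index.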
  • 9
    CC-Net

    Tools to download and cleanup Common Crawl data

    ...It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or updated with new crawls. The repository documents practical concerns like HTTP failures, snapshot differences, and stats JSONs, reflecting community use across many languages. While powerful, the repo has been archived and is read-only, so users should expect to run it as-is or fork for maintenance. Even in archived state, issues and releases pages remain useful references for implementation details and dataset lineage.
    Downloads: 0 This Week
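
The de-duplication step mentioned in the CC-Net description can be illustrated with a line-level filter that hashes normalized text and keeps only the first occurrence. This is a minimal sketch under assumptions: the normalization rule, the use of SHA-1, and the function names are chosen for illustration and are not taken from the CC-Net repository.

```python
import hashlib


def normalize(line):
    """Lowercase and collapse whitespace so trivial variants hash equally."""
    return " ".join(line.lower().split())


def dedup_lines(documents):
    """Drop every line whose normalized form was already seen in the corpus.

    `documents` is a list of strings; the result has the same length, with
    repeated lines (for example shared boilerplate) removed from later
    documents.
    """
    seen = set()
    result = []
    for doc in documents:
        kept = []
        for line in doc.splitlines():
            key = hashlib.sha1(normalize(line).encode("utf-8")).digest()
            if key in seen:
                continue  # duplicate of an earlier line
            seen.add(key)
            kept.append(line)
        result.append("\n".join(kept))
    return result
```

Storing fixed-size digests rather than the lines themselves keeps the memory cost per unique line constant, which matters at crawl scale.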
  • 10
    DocBook to LaTeX Publishing transforms your SGML/XML DocBook documents to DVI, PostScript or PDF by first translating them into pure LaTeX. MathML 2.0 markup is also supported. It started as a clone of DB2LaTeX.
    Downloads: 97 This Week
  • 11
    TimothyDocs

    Timothy is a cloud-based storage system designed to document your work

    Timothy is a cloud-based documentation system. Timothy will document any endeavor because it stores not only the documents created during the project but also information about those files. Like most storage schemes, Timothy creates a hierarchy of categories through which one may browse. Timothy displays information about the document or category as well as its name. This use of metadata explains the structure and content of the project to the user as they browse. ...
    Downloads: 1 This Week
  • 12
    TexSoup

    Fault-tolerant Python3 package for searching LaTeX documents

    Navigate, Search, and Modify LaTeX Documents in Python. Easy and reliable: No C extensions, no installation dependencies, and 100% test coverage. TexSoup is a fault-tolerant, Python3 package for searching, navigating, and modifying LaTeX documents. You can skip installation and try TexSoup directly, using the pytwiddle demo.
    Downloads: 0 This Week
  • 13
    GluonNLP

    NLP made easy

    GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that help you load the text data, process the text data, and train models. To support both engineers and researchers, we provide command-line toolkits for downloading and processing the NLP datasets. GluonNLP makes it easy to evaluate and train word embeddings. Here are examples to evaluate the pre-trained embeddings included in the GluonNLP toolkit as well as example scripts for training...
    Downloads: 0 This Week
  • 14
    Clean Thesis

    Clean Thesis is a clean, simple, and elegant LaTeX style (or template)

    Clean, Simple, Elegant. Clean Thesis is a LaTeX style for thesis documents, developed for my diploma thesis (Diplomarbeit). The style can be understood as my personal compromise: a typical clean-looking scientific document, polished with minor beautification. The design of this Clean Thesis style is inspired by user guide documents from Apple Inc.
    Downloads: 0 This Week
  • 15
    Unified Sessions Manager

    Pioneering Private and Public Cloud Management since 2008

    The UnifiedSessionsManager supports the integrated management of user sessions within Private-Clouds, comprising heterogeneous IT landscapes of various physical and virtual machines, hypervisor management, and virtual user sessions with remote desktops. For the extracted documents, see https://sourceforge.net/projects/ctys-doc.
    Downloads: 0 This Week
  • 16
    Retro

    Retro Games in Gym

    RETRO (Retrieval-Enhanced Transformer) is a large language model architecture developed by DeepMind that augments transformer models with a retrieval mechanism. Instead of relying solely on learned parameters, RETRO retrieves relevant documents from a large external database during inference, allowing it to ground responses in external knowledge. This design improves factual accuracy, reduces hallucinations, and enables smaller models to perform comparably to much larger ones by leveraging retrieval. The repository provides code and resources for training and evaluating RETRO models, along with infrastructure for integrating retrieval into the transformer pipeline. ...
    Downloads: 2 This Week
  • 17

    TestRest

    TestRest is a full QA management tool

    TestRest is a test management tool that offers test case authoring, reusable test cases, test execution, and reporting. It supports statistical and graphical reports with a simple, modern UI.
    Downloads: 0 This Week
  • 18
    pysourceinfo

    RTTI for Python Source and Binary Files

    ...The covered objects include packages, modules, functions, methods, scripts, and classes, presented in two views:
    - File System View: packages, modules, and line numbers, based on files and paths
    - Runtime Object View: callables, classes, and containers, based on in-memory RTTI / introspection
    The supported platforms are:
    - Linux, BSD, Unix, OS-X, Cygwin, and Windows
    - Python2, Python3
    - CPython, PyPy
    Object addresses within modules (Object Identifier, OID) and the display of the runtime call flow are supported by 'PyStackInfo'.
    Online documents: https://pysourceinfo.sourceforge.io/
    Downloads: 0 This Week
  • 19
    hug

    Embrace the APIs of the future. For developing APIs

    ...As a result, it drastically simplifies Python API development. Make developing a Python-driven API as succinct as a written definition. The framework should encourage code that self-documents. It should be fast. A developer should never feel the need to look somewhere else for performance reasons. Writing tests for APIs written on-top of hug should be easy and intuitive. Magic done once, in an API framework, is better than pushing the problem set to the user of the API framework. Be the basis for next-generation Python APIs, embracing the latest technology.
    Downloads: 3 This Week
  • 20
    pybag

    Portable cross-platform file synchronization and backup tool.

    PYBAG implements a portable bag and is intended for fast synchronization and backup. It lets you use a portable digital storage device to carry your electronic documents, much as you would use a bag to carry paper documents. You can synchronize the bag with your original files easily. If a synchronization conflict occurs, it will be reported. You can specify rules for automatic conflict resolution. With PYBAG, you can back up files and synchronize any changes made to the original files with the bag. ...
    Downloads: 0 This Week
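
One common way to detect the kind of synchronization conflict mentioned in the PYBAG description is a three-way comparison against the state recorded at the last sync. The sketch below assumes that approach; it is not PYBAG's actual algorithm, and the function name and return labels are hypothetical.

```python
def classify(base, original, bag):
    """Decide what to do with one file, given three content fingerprints.

    base     -- fingerprint recorded at the last successful sync
    original -- current fingerprint of the file in the original location
    bag      -- current fingerprint of the copy carried in the bag
    """
    if original == bag:
        return "in_sync"               # nothing to do
    if original == base:
        return "copy_bag_to_original"  # only the bag copy changed
    if bag == base:
        return "copy_original_to_bag"  # only the original changed
    return "conflict"                  # both changed, in different ways
```

Only the last case needs to be reported to the user or passed to an automatic resolution rule; the other three can be applied without intervention.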
  • 21
    GRAMD® Personal Signature

    Digital Signature for PDF documents in Spanish

    Digital signature, in Spanish, of electronic PDF documents in PAdES format, using X.509 digital certificates with smartcards and cryptographic tokens, for Windows (8 and 10).
    Downloads: 0 This Week
  • 22

    Safe Harbor Deidentification

    Safe Harbor Deidentification for medical documents

    Phalanx - Deidentify. The Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators, which write text offsets as output. It uses the Safe Harbor deidentification method.
    Downloads: 0 This Week
  • 23

    pytkapp

    Python package for developing SDI/MDI applications, plus a set of widgets.

    pytkapp is a Python package for developing applications that provide a multi-document or single-document interface (MDI/SDI) using the tkinter library, plus a set of additional tkinter widgets.
    Available demos:
    - pytkapp/demo/run_ptaoptionsdemo.py: GUI demo for the options container (available widgets, rules)
    - pytkapp/demo/run_ptamdidemo.py: demo of an MDI application
    - pytkapp/demo/run_ptasdidemo.py: demo of an SDI application
    - pytkapp/demo/run_tkwbasicdemo.py: demo of basic widgets
    - pytkapp/demo/run_tkwtldemo.py: demo of tablelist-based widgets
    - pytkapp/demo/run_diademo.py: demo of dialog widgets (selector, xmessage)
    Notes:
    1) The PyTkApp package was tested on Python 2.7 and 3.1.
    2) If you plan to use tablelist-based widgets, you need to download the Tcl Tablelist package from http://www.nemethi.de/
    Downloads: 0 This Week
  • 24

    Arabic Corpus

    Text categorization, Arabic language processing, language modeling

    The Arabic Corpus, compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ). The corpus Khaleej-2004 contains 5690 documents, divided into 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192. (2) For the Khaleej-2004 corpus: M. ...
    Downloads: 3 This Week
  • 25
    Finetune Transformer LM

    Code for "Improving Language Understanding by Generative Pre-Training"

    finetune-transformer-lm is a research codebase that accompanies the paper “Improving Language Understanding by Generative Pre-Training,” providing a minimal implementation focused on fine-tuning a transformer language model for evaluation tasks. The repository centers on reproducing the ROCStories Cloze Test result and includes a single-command training workflow to run the experiment end to end. It documents that runs are non-deterministic due to certain GPU operations and reports a median accuracy over multiple trials that is slightly below the single-run result in the paper, reflecting expected variance in practice. The project ships lightweight training, data, and analysis scripts, keeping the footprint small while making the experimental pipeline transparent. ...
    Downloads: 0 This Week