Showing 346 open source projects for "document"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    footswitch3

    footswitch3

    Audio Transcription software for Linux (Gstreamer) with a foot pedal

    Footswitch 3 is a media player for transcribers on Linux. Written in python using the python bindings for Gstreamer it allows a transcriber to control the audio or video with a foot pedal, and includes a set of macros that integrate into LibreOffice. This allows the transcriber to control the media player from within Libreoffice as well, making it useful for those who do not yet own a foot pedal/foot switch. Control of the media player from LibreOffice can be via Hotkeys or an integrated...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    footswitch2basic

    footswitch2basic

    Audio Transcription software for Linux (Vlc) with a foot pedal

    Footswitch 2 (Basic) is a media player for transcribers on Linux. This version is a stripped down version of Footswitch2, containing only the absolute essentials for transcription. Written in python and using the python bindings for VLC it allows a transcriber to control the audio or video with a footpedal, and includes a set of macros that integrate into LibreOffice. This allows the transcriber to control the media player from within Libreoffice as well, making it useful for those who do...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    footswitch3basic

    footswitch3basic

    Audio Transcription software for Linux (Gstreamer) with a foot pedal

    Footswitch3basic is a media player for transcribers on Linux. Written in python using the bindings for Gstreamer it allows a transcriber to control the audio or video with a foot pedal, and includes a set of macros that integrate into LibreOffice. This allows the transcriber to control the media player from within Libreoffice as well, making it useful for those who do not yet own a foot pedal/foot switch. Control of the media player from LibreOffice can be via Hotkeys or an integrated...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4

    OpenOffice.org Utility Library

    Library modules for creating ODF documents.

    OpenOffice.org Utility Library modules for creating Open Document Format (ODF) documents which can be read by Office Suites including OpenOffice.org, LibreOffice.org, and Microsoft Office. Currently, ooolib-python can create Calc spreadsheet ODS documents. These documents include many features including: - Create multiple table spreadsheets - Cells with text, numbers, dates, formulas - Ability to use built-in styles - Ability to create automatic styles (ie. bold, italics, underline, font size, font color, background color, etc...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    ...Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper can process new pages and return the extracted data in structured formats such as dictionaries or lists. This approach simplifies web scraping tasks by shifting the focus from rule-writing to example-based training. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6

    SimpleTextFormatter

    STF automatically generates documentation

    STF is a system of automatically generating documentation under control of a program or a script. It is frequently used to automatically generate test reports. STF is also used to clean up the output of a process and turn it into a nice looking report.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7

    openpublipostage

    Édition automatisée de conventions de stage.

    Logiciel en Python permettant l'automatisations de l'édition de conventions de stage à partir d'un document en .doc ou .docx et d'une base de données en .xlsx.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    LaTeX Cookbook

    LaTeX Cookbook

    A comprehensive LaTeX template with examples for theses, books, etc.

    This repo contains a LaTeX document, usable as a cookbook (different "recipes" to achieve various things in LaTeX) as well as a template. The resulting PDF covers LaTeX-specific topics and instructions on compiling the LaTeX source.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    LayoutParser

    LayoutParser

    A Unified Toolkit for Deep Learning Based Document Image Analysis

    With the help of state-of-the-art deep learning models, Layout Parser enables extracting complicated document structures using only several lines of code. This method is also more robust and generalizable as no sophisticated rules are involved in this process. A complete instruction for installing the main Layout Parser library and auxiliary components. Learn how to load DL Layout models and use them for layout detection. The full list of layout models currently available in Layout Parser. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 10
    e-Dokyumento

    e-Dokyumento

    e-Dokyumento is web-based Document Management System (DMS)

    e-Dokyumento is opensource web-based Document Management System (DMS) A Document Management which automates the basic office document workflow such as receiving, filing, routing, and approving through capturing (scanning), digitizing (OCR Reading), storing, tagging, and electronically routing and approving (e-signature) of electronic documents. # Demo : https://e-dokyumento.herokuapp.com/ https://edokyu.seillig.com/ (refer to Readme.md for the accounts) #Dockerhub: https://hub.docker.com/r/nelsonmaligro/edokyumento # Install using the ISO: 1. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Rank-BM25

    Rank-BM25

    A Collection of BM25 Algorithms in Python

    A collection of algorithms for querying a set of documents and returning the ones most relevant to the query. The most common use case for these algorithms is, as you might have guessed, to create search engines.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Spyne

    Spyne

    A transport agnostic sync/async RPC library

    ...In other words, Spyne is a framework for building distributed solutions that strictly follow the MVC pattern, where Model = spyne.model, View = spyne.protocol and Controller = user code. Spyne comes with the implementations of popular transport, protocol and interface document standards along with a well-defined API that lets you build on existing functionality.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Hugging Face Transformer

    Hugging Face Transformer

    CPU/GPU inference server for Hugging Face transformer models

    Optimize and deploy in production Hugging Face Transformer models in a single command line. At Lefebvre Dalloz we run in-production semantic search engines in the legal domain, in the non-marketing language it's a re-ranker, and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built over Pytorch and FastAPI....
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Interpret-Text

    Interpret-Text

    State-of-the-art explainers for text-based machine learning models

    ...Users can run their experiments across multiple state-of-the-art explainers and easily perform comparative analysis on them. Using these tools, users will be able to explain their machine-learning models globally on each label or locally for each document.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    itamm

    itamm

    Tool to design and share enterprise solutions, services and processes

    The tool is for people who design, analyze, optimize and develop processes, services and solution architectures. IT(A)-MM is a tool to design models of solutions, services and enterprise processes. It allows you to visualize data using popular BPMN and ArchiMate visualization notation. It also has its own extensible notation for visualizing enterprise environment objects. IT(A)-MM is easy to use and allows you to use it wherever you are. Using IT(A)-MM can be the first step towards deploy...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    Paperless-ng

    Paperless-ng

    A supercharged version of paperless, scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    DrQA

    DrQA

    Reading Wikipedia to Answer Open-Domain Questions

    DrQA is an open-domain question answering system that reads large text corpora—famously Wikipedia—to answer natural language questions with extractive spans. It follows a two-stage pipeline: a fast document retriever first narrows down candidate articles, and a neural machine reader then predicts the exact answer span from those passages. The retriever relies on classic IR features (like TF-IDF and n-gram statistics) to remain lightweight and scalable to millions of documents. The reader is a neural model trained on supervised QA data to estimate start and end positions within a paragraph, and it can be adapted to new domains through fine-tuning or distant supervision. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    nbdev template

    nbdev template

    Template for nbdev projects

    Write, test, document, and distribute software packages and technical articles, all in one place, your notebook. Traditional programming environments throw away the result of your exploration in REPLs or notebooks. nbdev makes exploration an integral part of your workflow, all while promoting software engineering best practices.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Vector AI

    Vector AI

    A platform for building vector based applications

    Vector AI is a framework designed to make the process of building production-grade vector-based applications as quick and easily as possible. Create, store, manipulate, search and analyze vectors alongside json documents to power applications such as neural search, semantic search, personalized recommendations etc. Image2Vec, Audio2Vec, etc (Any data can be turned into vectors through machine learning). Store your vectors alongside documents without having to do a db lookup for metadata...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    Synonyms

    Synonyms

    Chinese synonyms, chat robot, intelligent question and answer toolkit

    Chinese Synonyms for natural language processing and understanding. Better Chinese synonyms, chatbot, intelligent question and answer toolkit. synonymsCan be used for many tasks in natural language understanding, text alignment, recommendation algorithms, similarity calculation, semantic shifting, keyword extraction, concept extraction, automatic summarization, search engines, etc. Print synonyms in a friendly way for easy debugging. "Synonyms Cilin" was compiled by Mei Jiaju and others in...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    TimothyDocs

    TimothyDocs

    Timothy is a cloud base storage system designed to document your work

    ...Users make better informed decisions about where to file a document when this information is available to them. It also suggests where to look for a long lost document.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Clean Thesis

    Clean Thesis

    Clean Thesis is a clean, simple, and elegant LaTeX style (or template)

    Clean, Simple, Elegant Clean Thesis is a LaTeX style for thesis documents, developed for my diploma thesis (Diplomarbeit). The style can be understood as my personal compromise — a typical clean-looking scientific document combined and polished with minor beautification. The design of this Clean Thesis style is inspired by user guide documents from Apple Inc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Retro

    Retro

    Retro Games in Gym

    RETRO (Retrieval-Enhanced Transformer) is a large language model architecture developed by OpenAI that augments transformer models with a retrieval mechanism. Instead of relying solely on learned parameters, RETRO retrieves relevant documents from a large external database during inference, allowing it to ground responses in external knowledge. This design improves factual accuracy, reduces hallucinations, and enables smaller models to perform comparably to much larger ones by leveraging...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    NLP-progress

    NLP-progress

    Repository to track the progress in Natural Language Processing (NLP)

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks. This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets. It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    jieba

    jieba

    Stuttering Chinese word segmentation

    "Jaba" Chinese word segmentation, do the best Python Chinese word segmentation component. Four word segmentation modes are supported. Precise mode, which tries to cut the sentence most precisely, suitable for text analysis. Full mode, scans all the words that can be formed into words in the sentence, the speed is very fast, but the ambiguity cannot be resolved. The search engine mode, on the basis of the precise mode, divides the long words again to improve the recall rate, which is suitable...
    Downloads: 8 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB