Search Results for "documents" - Page 2

Showing 285 open source projects for "documents"

1

DeepSeek-OCR

Contexts Optical Compression

...It supports local deployment, enabling organizations concerned about privacy or latency to run the pipeline on-premises rather than send sensitive documents to third-party cloud services. The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.

Downloads: 4 This Week

Last Update: 2025-10-25
See Project
2

RAG API

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.

Downloads: 8 This Week

Last Update: 2025-12-11
See Project
3

Qwen3

Qwen3 is the large language model series developed by the Qwen team

Qwen3 is a cutting-edge large language model (LLM) series developed by the Qwen team at Alibaba Cloud. The latest updated version, Qwen3-235B-A22B-Instruct-2507, features significant improvements in instruction-following, reasoning, knowledge coverage, and long-context understanding up to 256K tokens. It delivers higher quality and more helpful text generation across multiple languages and domains, including mathematics, coding, science, and tool usage. Various quantized versions,...

1 Review

Downloads: 69 This Week

Last Update: 2026-01-09
See Project
4

yq JSON

Command-line YAML, XML, TOML processor

Before using yq, you also have to install its dependency, jq. See the jq installation instructions for details and directions specific to your platform. On macOS, yq is also available through Homebrew: use brew install python-yq.

Downloads: 7 This Week

Last Update: 2024-04-27
See Project
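The jq dependency noted in the yq entry above reflects the usual description of the tool as a wrapper that transcodes a document to JSON and hands the result to jq for filtering. A minimal sketch of that transcoding idea for XML, using only the Python standard library, might look like this. The `element_to_dict` and `xml_to_json` helpers, and the `@` attribute prefix and `#text` key, are assumptions chosen for illustration; this is not yq's own code.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element into nested dicts, lists and strings."""
    # Attributes are stored under "@name" keys (an illustrative convention).
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tags are collected into a list.
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = [existing]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf element with no attributes or children: keep just the text.
        return text
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    """Transcode an XML string to a JSON string, keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)


sample = '<doc id="7"><title>Report</title><tag>a</tag><tag>b</tag></doc>'
converted = json.loads(xml_to_json(sample))
print(converted["doc"]["tag"])
```

Once the document is plain JSON, any JSON filter can be applied to it, which is why a JSON processor such as jq is a natural back end for this kind of tool.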
5

txtai

Build AI-powered semantic search applications

...Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid pace, and models can understand concepts in documents, audio, images and more. Machine-learning pipelines run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. A cloud-native architecture scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions that generate structured databases. ...

Downloads: 2 This Week

Last Update: 2025-12-15
See Project
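The txtai entry above describes search over vector representations. The mechanics can be sketched with the Python standard library alone, assuming a toy word-count vector in place of a learned embedding model. The `embed`, `cosine` and `search` names are hypothetical and are not txtai's API; a real embedding model would capture meaning, whereas this stand-in only matches shared words.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a sparse word-count vector (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents, limit=1):
    """Return (index, score) pairs for the documents most similar to the query."""
    query_vec = embed(query)
    scored = [(i, cosine(query_vec, embed(doc))) for i, doc in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


documents = [
    "Convert HTML pages into PDF files",
    "Check links in web documents",
    "Transcribe audio recordings to text",
]
results = search("find broken links in web pages", documents)
print(results[0][0])
```

The ranking step stays the same when the toy vectors are replaced by model-generated embeddings: only `embed` changes.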
6

Jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts

Have you always wished Jupyter notebooks were plain text documents? Wished you could edit them in your favorite IDE? And get clear and meaningful diffs when doing version control? Then, Jupytext may well be the tool you’re looking for. In the text representation, only the notebook inputs (and optionally, the metadata) are included. Text notebooks are well suited for version control. You can also edit or refactor them in an IDE: a .py notebook is a regular Python file.

Downloads: 0 This Week

Last Update: 11 hours ago
See Project
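To make the "notebook as a regular Python file" idea in the Jupytext entry concrete, here is a minimal sketch that renders a notebook dictionary as a script using `# %%` cell markers, the convention of Jupytext's percent format. The `notebook_to_percent` helper is hypothetical and deliberately simplified (no metadata header, no raw cells); it is not how Jupytext itself is implemented.

```python
def notebook_to_percent(notebook):
    """Render a notebook dict (nbformat 4 layout) as a percent-format script."""
    blocks = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # nbformat allows the source to be a string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        lines = text.splitlines()
        if cell["cell_type"] == "markdown":
            # Markdown cells become comment lines under a [markdown] marker.
            body = ["# %% [markdown]"] + [f"# {line}".rstrip() for line in lines]
        elif cell["cell_type"] == "code":
            # Outputs are intentionally dropped; only inputs are kept.
            body = ["# %%"] + lines
        else:
            continue
        blocks.append("\n".join(body))
    return "\n\n".join(blocks) + "\n"


notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo\n", "A short note."]},
        {
            "cell_type": "code",
            "source": "x = 1 + 1\nprint(x)",
            "outputs": [{"output_type": "stream", "text": "2\n"}],
        },
    ]
}
script = notebook_to_percent(notebook)
print(script)
```

Because the result contains only inputs, a diff between two versions shows edits to code and prose rather than changes in stored outputs.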
7

LinkChecker

Check links in web documents or full websites

LinkChecker is a free, GPL-licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries or dependencies, you can use the Docker image published on GitHub Packages.

Downloads: 0 This Week

Last Update: 2025-07-28
See Project
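The first stage of any link check is finding the links in a document. A minimal sketch of that stage with the Python standard library follows; the `LinkCollector` class and `collect_links` function are hypothetical names, and the step that actually requests each URL is left out so the example runs offline. It illustrates the general technique and is not LinkChecker's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href/src attributes in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return the http/https links found in the HTML, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if urlparse(url).scheme in ("http", "https")]


page = '<a href="/docs/">Docs</a> <img src="logo.png"> <a href="mailto:a@example.org">Mail</a>'
links = collect_links(page, "https://example.org/index.html")
print(links)
```

A checker would then request each collected URL and report the ones that fail, and for a full website it would repeat the process on every page it reaches.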
8

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

...Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. The project provides code, pretrained weights, and inference instructions, making it feasible to deploy locally or on a server, and to integrate with applications.

Downloads: 4 This Week

Last Update: 6 days ago
See Project
9

xhtml2pdf

A library for converting HTML into PDFs using ReportLab

xhtml2pdf enables users to generate PDF documents from HTML content easily, with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The command-line tool is a stand-alone program.

Downloads: 7 This Week

Last Update: 2025-02-23
See Project
10

Zerox OCR

PDF to Markdown with vision models

A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all, with weird layouts, tables, charts, etc., so vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.

Downloads: 5 This Week

Last Update: 2024-12-18
See Project
11

RSS to Telegram Bot

A Telegram RSS bot that cares about your reading experience

Downloads: 8 This Week

Last Update: 2025-03-22
See Project
12

PDFMathTranslate

PDF scientific paper translation with preserved formats

PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.

Downloads: 6 This Week

Last Update: 2025-07-11
See Project
13

Handcalcs

Python library for converting Python calculations into rendered LaTeX

Handcalcs is a Python library that auto-renders calculation code in Jupyter notebooks or LaTeX documents with step-by-step symbolic substitution, giving output a “handwritten” feel. It supports cell magics and auto-LaTeX generation via configurable output options.

Downloads: 3 This Week

Last Update: 2025-11-03
See Project
14

py-pdf-parser

A Python tool to help extract information from structured PDFs

py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.

Downloads: 4 This Week

Last Update: 2025-04-28
See Project
15

GLM-4.6V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

...Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and can output or act via tools seamlessly, bridging perception and execution. Its architecture supports a very large context window (on the order of 128K tokens during training), which lets it handle complex multimodal inputs like long documents, multi-page reports, or video transcripts, while maintaining coherence across extended content. In benchmarks and internal evaluations, GLM-4.6V achieves state-of-the-art (SoTA) performance among models of comparable parameter scale on multimodal reasoning.

Downloads: 3 This Week

Last Update: 2025-12-18
See Project
16

Concordia

Crowdsourcing platform for full text transcription and tagging

Concordia is a platform for crowdsourcing transcription and tagging of text in digitized images. It was developed by the Library of Congress so that volunteers of all backgrounds could transcribe and tag digitized images of manuscripts and typed materials from the Library’s collections that could not otherwise be processed by optical character recognition.

Downloads: 0 This Week

Last Update: 2025-12-30
See Project
17

Tongyi DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

DeepResearch (Tongyi DeepResearch) is an open-source “deep research agent” developed by Alibaba’s Tongyi Lab, designed for long-horizon, information-seeking tasks. It’s built to act like a research agent: synthesizing, reasoning, retrieving information via the web and documents, and backing its outputs with evidence. The model has about 30.5 billion parameters, though only about 3.3 billion are active for any given token. It uses a mix of synthetic data generation, fine-tuning and reinforcement learning; supports benchmarks covering web search, document understanding, question answering, and “agentic” tasks; and provides inference tools, evaluation scripts, and “web agent” style interfaces. ...

Downloads: 1 This Week

Last Update: 7 days ago
See Project
18

LangExtract

A Python library for extracting structured information

...LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. The system excels at handling long documents using optimized chunking, multi-pass extraction, and parallel processing to ensure both high recall and structured consistency.

Downloads: 0 This Week

Last Update: 2025-11-27
See Project
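The long-document handling mentioned in the LangExtract entry (chunking plus parallel processing) can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: the `chunk_text`, `extract_years` and `extract_all` names, the character-window chunking, and the regex extractor that stands in for a model call. It is not LangExtract's API.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text, max_chars=40, overlap=10):
    """Split text into overlapping character windows as (offset, piece) pairs."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    step = max_chars - overlap
    starts = range(0, max(len(text) - overlap, 1), step)
    return [(start, text[start:start + max_chars]) for start in starts]


def extract_years(chunk):
    """Hypothetical extractor: four-digit years, reported at document offsets."""
    start, piece = chunk
    return [(start + m.start(), m.group()) for m in re.finditer(r"\b\d{4}\b", piece)]


def extract_all(text, **chunk_options):
    """Run the extractor over all chunks in parallel and merge the results."""
    chunks = chunk_text(text, **chunk_options)
    with ThreadPoolExecutor() as pool:
        per_chunk = list(pool.map(extract_years, chunks))
    # Overlapping windows can report the same match twice; dedupe by offset.
    return sorted({hit for hits in per_chunk for hit in hits})


text = "The survey began in 1998 and was repeated in 2004, then published in 2011 with corrections."
found = extract_all(text)
print([year for _, year in found])
```

The overlap means a value that straddles one window boundary still appears whole in the neighbouring window, and recording document-level offsets lets duplicate hits from overlapping windows be merged.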
19

Remarkable for Linux

The Markdown Editor for Linux

...With MathJax support you can render beautiful, rich documents with advanced formatting. Keyboard shortcuts enable maximum productivity.

Downloads: 0 This Week

Last Update: 2024-09-22
See Project
20

Krixik

Documentation for the Krixik Python client

Small/specialized AI models are an oft-necessary complement—or alternative—to "big AI" offerings. However, infrastructure for small AI tends to be underwhelming, so building with specialized AI can be difficult, time-consuming, and even expensive. Iterating with different models, and particularly with different combinations of these models, can thus be rendered unfeasible.

Downloads: 0 This Week

Last Update: 2024-11-05
See Project
21

Onyx

Gen-AI Chat for Teams

Onyx is an AI platform designed to integrate seamlessly with your company's documents, applications, and team members. It offers a feature-rich chat interface and supports integration with various Large Language Models (LLMs). Onyx ensures synchronized knowledge and access controls across over 40 connectors, including Google Drive, Slack, Confluence, and Salesforce. Users can create custom AI agents with unique prompts and actions, and deploy Onyx securely on various platforms, from laptops to cloud services.

Downloads: 2 This Week

Last Update: 6 days ago
See Project
22

malware-samples

A collection of malware samples and relevant dissection information

This repo is a public collection of malware samples and related dissection/analysis information, maintained by InQuest. It gathers various kinds of malicious artifacts (executables, scripts, macros, obfuscated documents, etc.) with metadata (e.g., VirusTotal reports), file carriers, and sample hashes. It’s intended to help malware analysts and researchers study how malware works, how it is delivered, and how it evolves.

Downloads: 10 This Week

Last Update: 3 days ago
See Project
23

borb

borb is a library for reading, creating and manipulating PDF files

borb is a pure Python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be on supporting the more common use cases over the rare ones.

Downloads: 4 This Week

Last Update: 2026-01-11
See Project
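To picture the "JSON-like data structure" in the borb entry, the sketch below models a small PDF page tree as nested dictionaries and lists and walks it to find the pages. The key names (`Root`, `Pages`, `Kids`, `Count`, `MediaBox`) follow the PDF specification, but the exact layout and the `iter_pages` helper are assumptions for illustration and are not borb's API.

```python
def iter_pages(node):
    """Yield every leaf page dictionary beneath a page-tree node."""
    if node.get("Type") == "Page":
        yield node
        return
    # Intermediate "Pages" nodes list their children under "Kids".
    for kid in node.get("Kids", []):
        yield from iter_pages(kid)


# Hypothetical document: a catalog whose page tree has one nested level.
document = {
    "Trailer": {
        "Root": {
            "Type": "Catalog",
            "Pages": {
                "Type": "Pages",
                "Count": 3,
                "Kids": [
                    {"Type": "Page", "MediaBox": [0, 0, 595, 842]},
                    {
                        "Type": "Pages",
                        "Count": 2,
                        "Kids": [
                            {"Type": "Page", "MediaBox": [0, 0, 595, 842]},
                            {"Type": "Page", "MediaBox": [0, 0, 842, 595]},
                        ],
                    },
                ],
            },
        }
    }
}
pages = list(iter_pages(document["Trailer"]["Root"]["Pages"]))
print(len(pages))
```

With a structure like this, reading or editing a document reduces to ordinary dictionary and list operations.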
24

ArangoDB-Community/pyArango

Python Driver for ArangoDB with built-in validation

PyArango is a Python driver for ArangoDB, a multi-model NoSQL database. It provides a Pythonic way to interact with ArangoDB, allowing developers to manage collections, execute AQL queries, and integrate ArangoDB's document, graph, and key-value storage models into Python applications.

Downloads: 2 This Week

Last Update: 2025-02-22
See Project
25

Umi-OCR

OCR software, free and offline

...It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 8 This Week

Last Update: 3 days ago
See Project

SourceForge

© 2026 Slashdot Media. All Rights Reserved.
