Search Results for "open document" - Page 4


Showing 135 open source projects for "open document"


Active filters: Artificial Intelligence, Python

1

AppAgent

Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same...

Downloads: 1 This Week

Last Update: 2026-03-04
2

Tongyi DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

DeepResearch (Tongyi DeepResearch) is an open-source “deep research agent” developed by Alibaba’s Tongyi Lab designed for long-horizon, information-seeking tasks. It’s built to act like a research agent: synthesizing, reasoning, retrieving information via the web and documents, and backing its outputs with evidence. The model is about 30.5 billion parameters in size, though at any given token only ~3.3B parameters are active.

Downloads: 2 This Week

Last Update: 2026-02-27
3

Haystack

Haystack is an open source NLP framework to interact with your data

Apply the latest NLP technology to your own data with the use of Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning,...

Downloads: 6 This Week

Last Update: 2026-04-20
4

Paper2Slides

From Paper to Presentation in One Click

Paper2Slides is an automation tool that converts research papers, reports, and other documents into polished slide decks and posters with minimal manual effort. It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file...

Downloads: 2 This Week

Last Update: 2026-03-15
5

Cleanlab

The standard data-centric AI package for data quality and ML

cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...

Downloads: 2 This Week

Last Update: 2026-01-13
6

myGPTReader

AI Slack bot for reading, summarizing, and chatting with content

myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use...

Downloads: 1 This Week

Last Update: 3 days ago
7

Pathway AI Pipelines

Ready-to-run cloud templates for RAG

Pathway AI Pipelines is a collection of ready-to-deploy AI pipeline templates designed to help developers rapidly build production-grade retrieval-augmented generation and enterprise search applications. The project provides end-to-end examples that connect live data sources to LLM workflows, enabling applications to stay synchronized with continuously changing information. It supports numerous connectors including local files, Google Drive, SharePoint, Kafka, PostgreSQL, and real-time APIs,...

Downloads: 0 This Week

Last Update: 2026-03-02
8

NeMo Curator

Scalable data preprocessing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...

Downloads: 0 This Week

Last Update: 2026-02-23
9

Dolphin

Document Image Parsing via Heterogeneous Anchor Prompting

Dolphin, maintained by ByteDance, is a document image parsing model that, as its name states, uses heterogeneous anchor prompting. It follows an analyze-then-parse approach: it first analyzes the page layout to produce a sequence of elements in reading order, then parses those elements (such as text paragraphs, tables, and formulas) in parallel using element-specific prompts.

Downloads: 0 This Week

Last Update: 2026-03-25
10

FlexLLMGen

Running large language models on a single GPU

FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. ...

Downloads: 0 This Week

Last Update: 2026-03-10
11

SAG

SQL-Driven RAG Engine

SAG is an open-source SQL-driven retrieval-augmented generation engine that dynamically constructs knowledge graphs during query processing. Instead of relying on a static knowledge graph prepared in advance, the system automatically builds relational structures between entities while processing user queries. Documents are first decomposed into atomic semantic events, which are then represented using multidimensional natural language vectors. These vectors allow the system to identify...

Downloads: 0 This Week

Last Update: 2026-03-09
12

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

ModernBERT is an open-source research project that modernizes the classic BERT encoder architecture by incorporating recent advances in transformer design, training techniques, and efficiency improvements. The goal of the project is to bring BERT-style models up to date with the capabilities of modern large language models while preserving the strengths of bidirectional encoder architectures used for tasks such as classification, retrieval, and semantic search. ModernBERT introduces...

Downloads: 0 This Week

Last Update: 2026-03-06
13

marqo

Tensor search for humans

A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...

Downloads: 1 This Week

Last Update: 2026-04-02
14

LLM TLDR

95% token savings. 155x faster queries. 16 languages

LLM TLDR is a tool that leverages large language models (LLMs) to generate concise, coherent summaries (TL;DRs) of long documents, articles, or text files, helping users quickly understand large amounts of content without reading every word. It integrates with LLM APIs to handle input texts of varying lengths and complexity, applying techniques like chunking, context management, and multi-pass summarization to preserve accuracy even when the source is very large. The system supports both...

Downloads: 0 This Week

Last Update: 2026-01-27
15

ChatGPT Academic

ChatGPT extension for scientific research work

A ChatGPT extension for scientific research work, optimized for polishing academic papers. It supports custom shortcut buttons and custom function plug-ins, Markdown table display, dual display of TeX formulas, and complete code display. Newer features include project tree analysis and source-code self-translation for local Python/C++/Go projects, batch summarization of PDF and Word documents, and full-text translation of PDF papers. All buttons are...

Downloads: 0 This Week

Last Update: 2024-12-19
16

GLM-4.1V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

GLM-4.1V, often referred to as a smaller, lighter version of the GLM-V family, offers a more resource-efficient option for users who want multimodal capabilities without requiring large compute resources. Though smaller in scale, GLM-4.1V remains competitive for its size: on a number of multimodal reasoning and vision-language tasks it outperforms some much larger models from other families. It represents a...

Downloads: 0 This Week

Last Update: 2026-04-06
17

Mini Agent

A minimal yet professional single agent demo project

Mini-Agent is a minimal yet production-minded demo project that shows how to build a serious command-line AI agent around the MiniMax-M2 model. It is designed both as a reference implementation and as a usable agent, demonstrating a full execution loop that includes planning, tool calls, and iterative refinement. The project exposes an Anthropic-compatible API interface and fully supports interleaved thinking, letting the agent alternate between reasoning steps and tool invocations during...

Downloads: 0 This Week

Last Update: 2026-02-14
18

MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training

MedicalGPT trains large medical GPT models using a ChatGPT-style training pipeline. It implements pretraining (including secondary pretraining), supervised fine-tuning, reward modeling, and reinforcement learning.

Downloads: 0 This Week

Last Update: 2026-04-20
19

LangChain Extract

Did you say you like data?

LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents.

Downloads: 2 This Week

Last Update: 2026-03-09
20

ChatGPT Retrieval Plugin

The ChatGPT Retrieval Plugin lets you easily find personal documents

The chatgpt-retrieval-plugin repository implements a semantic retrieval backend that lets ChatGPT (or GPT-powered tools) access private or organizational documents in natural language by combining vector search, embedding models, and plugin infrastructure. It can serve as a custom GPT plugin or function-calling backend so that a chat session can “look up” relevant documents based on user queries, inject those results into context, and respond more knowledgeably about a private knowledge...

Downloads: 0 This Week

Last Update: 2025-10-02
21

Canopy

Retrieval Augmented Generation (RAG) framework

Canopy is an open-source retrieval-augmented generation (RAG) framework developed by Pinecone to simplify the process of building applications that combine large language models with external knowledge sources. The system provides a complete pipeline for transforming raw text data into searchable embeddings, storing them in a vector database, and retrieving relevant context for language model responses.

Downloads: 2 This Week

Last Update: 2026-03-10
22

bitfarm-Archiv Document Management - DMS

bitfarm-Archiv is a powerful Document Management System (DMS), Enterprise Content Management (ECM) and Knowledge Management System (KMS) with workflow components. Help us! In the internet age, the best way to help is to write a short statement about your scenario and your use of the DMS, along with your experiences, and post it on your own website, blog, or a forum. It would help us most if you also add a hyperlink to our site, http://www.bitfarm-archiv.com. By this...

11 Reviews

Downloads: 4 This Week

Last Update: 2026-04-15
23

LangChain-ChatGLM-Webui

Automatic question answering for local knowledge bases based on LLM

LangChain-ChatGLM-Webui is an open-source web interface that integrates the ChatGLM large language model with the LangChain framework to create an interactive conversational AI platform. The project provides a graphical interface that allows users to interact with language models through chat sessions while also connecting those models to external knowledge sources.

Downloads: 0 This Week

Last Update: 2026-03-05
24

RAG-Retrieval

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding

RAG-Retrieval is an open-source framework for building and training retrieval systems used in retrieval-augmented generation pipelines. Retrieval-augmented generation combines large language models with external knowledge retrieval to improve factual accuracy and domain-specific reasoning. This repository provides end-to-end infrastructure for training retrieval models, performing inference, and distilling embedding models for improved performance. It includes implementations of modern...

Downloads: 0 This Week

Last Update: 2026-03-15
25

PyTextRank

Python implementation of TextRank algorithms

PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work -- and related knowledge graph practices.

Downloads: 0 This Week

Last Update: 2024-08-09
