
Search Results for "documents"


OS

Linux 15
Windows 15
Mac 14
BSD 4
ChromeOS 4

Category

Artificial Intelligence 16
Software Development 1

License

OSI-Approved Open Source 15

Programming Language

Python 16

Showing 16 open source projects for "documents"


Active filters: Large Language Models (LLM), Python

1

Khoj

An AI personal assistant for your digital brain

Get more done with your open-source AI personal assistant. Khoj is a desktop application to search and chat with your notes, documents, and images. It is an offline-first, open-source AI personal assistant that is accessible from Emacs, Obsidian or your Web browser. Khoj is a thinking tool that is transparent, fun, and easy to engage with. You can build faster and better by using Khoj to search and reason across all your data sources. Khoj learns from your notes and documents to function as an extension of your brain. ...

Downloads: 3 This Week

Last Update: 2026-01-02
2

PrivateGPT

Interact with your documents using the power of GPT

PrivateGPT is a production-ready, privacy-first AI system that allows querying of uploaded documents using LLMs, operating completely offline in your own environment. It provides contextual generative AI capabilities without sending data externally. Now maintained under Zylon.ai with enterprise deployment options (air-gapped, cloud, or on-prem).

Downloads: 9 This Week

Last Update: 2025-07-29
3

Unstructured.IO

Open source libraries and APIs to build custom preprocessing pipelines

The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. Its use cases center on streamlining and optimizing the data processing workflow for LLMs. Its modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, adapts to different platforms, and efficiently transforms unstructured data into structured outputs.

Downloads: 4 This Week

Last Update: 2026-01-09
4

MegaParse

File Parser optimised for LLM Ingestion with no loss

MegaParse is a file parser optimized for Large Language Model (LLM) ingestion, ensuring no loss of information. It efficiently parses various document formats, such as PDFs, DOCX, and PPTX, converting them into formats ideal for processing by LLMs. This tool is essential for applications that require accurate and comprehensive data extraction from diverse document types.

Downloads: 0 This Week

Last Update: 2025-02-14
5

Qwen3

Qwen3 is the large language model series developed by the Qwen team

Qwen3 is a cutting-edge large language model (LLM) series developed by the Qwen team at Alibaba Cloud. The latest updated version, Qwen3-235B-A22B-Instruct-2507, features significant improvements in instruction-following, reasoning, knowledge coverage, and long-context understanding up to 256K tokens. It delivers higher quality and more helpful text generation across multiple languages and domains, including mathematics, coding, science, and tool usage. Various quantized versions,...

1 Review

Downloads: 70 This Week

Last Update: 2026-01-09
6

Tongyi DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

DeepResearch (Tongyi DeepResearch) is an open-source “deep research agent” developed by Alibaba’s Tongyi Lab for long-horizon, information-seeking tasks. It is built to act like a research agent: retrieving information from the web and from documents, reasoning over and synthesizing it, and backing its outputs with evidence. The model has about 30.5 billion parameters in total, though only ~3.3B of them are active for any given token. It is trained with a mix of synthetic data generation, fine-tuning, and reinforcement learning; it targets benchmarks covering web search, document understanding, question answering, and “agentic” tasks; and it provides inference tools, evaluation scripts, and “web agent” style interfaces. ...

Downloads: 3 This Week

Last Update: 2026-01-12
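A gap between total and active parameters like the one described for Tongyi DeepResearch is characteristic of sparse Mixture-of-Experts (MoE) models: a router scores the experts for each token and only the top few actually run. The toy sketch below illustrates generic top-k routing in plain Python. The expert count, `k=2`, the router weights, and the names `top_k_moe` and `make_expert` are illustrative assumptions, not the model's real configuration or code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_moe(token, experts, gate_weights, k=2):
    """Route one token through the k highest-scoring experts (toy sketch).

    token: list[float] of length d
    experts: callables mapping list[float] -> list[float]
    gate_weights: one router weight vector (length d) per expert
    Returns (output vector, indices of the experts that ran).
    """
    # Router logits: one dot product per expert.
    logits = [sum(w * x for w, x in zip(wv, token)) for wv in gate_weights]
    # Keep only the k best experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate scores over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for g, i in zip(gates, chosen):
        y = experts[i](token)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen

# Usage with 8 hypothetical toy experts: expert i scales its input by i + 1.
calls = []

def make_expert(i):
    def expert(x):
        calls.append(i)  # record which experts actually run
        return [(i + 1) * v for v in x]
    return expert

experts = [make_expert(i) for i in range(8)]
gate_weights = [[float(i), -float(i)] for i in range(8)]  # illustrative router weights
out, chosen = top_k_moe([1.0, 0.5], experts, gate_weights, k=2)
print(chosen, out)  # only 2 of the 8 experts were evaluated
```

Per token, compute scales with the parameters of the chosen experts rather than all of them, which is why such models are described by both a total and an active parameter count.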
7

Controllable-RAG-Agent

This repository provides an advanced RAG

...A key focus is hallucination control: each answer is verified against retrieved context, and responses are reworked when they are not sufficiently grounded in the source documents.

Downloads: 0 This Week

Last Update: 2025-11-13
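The verify-then-rework loop described above can be sketched as follows. The listing does not say how Controllable-RAG-Agent checks grounding, so this sketch substitutes a crude lexical-overlap score as a stand-in for an LLM- or NLI-based check. The function names, the 0.8 threshold, the retry limit, and the `fake_generate` stub are all hypothetical.

```python
import re

def tokens(text):
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def grounding_score(answer, context_chunks):
    """Fraction of answer tokens that appear somewhere in the retrieved context.

    A deliberately crude stand-in for a real grounding/entailment check.
    """
    ans = tokens(answer)
    if not ans:
        return 0.0
    ctx = set().union(*(tokens(c) for c in context_chunks))
    return len(ans & ctx) / len(ans)

def answer_with_verification(question, context_chunks, generate, threshold=0.8, max_attempts=3):
    """Generate, verify against the context, and regenerate with feedback until grounded."""
    feedback = None
    score = 0.0
    for attempt in range(max_attempts):
        answer = generate(question, context_chunks, feedback)
        score = grounding_score(answer, context_chunks)
        if score >= threshold:
            return answer, score, attempt + 1
        feedback = f"Only {score:.0%} of the answer is supported by the context; use only the provided facts."
    return None, score, max_attempts

# Usage with a hypothetical stub in place of an LLM call.
context = ["The Eiffel Tower is in Paris.", "It was completed in 1889."]

def fake_generate(question, chunks, feedback):
    # Hallucinates on the first try, then sticks to the context once given feedback.
    if feedback is None:
        return "The Eiffel Tower was completed in 1925 by Napoleon."
    return "The Eiffel Tower was completed in 1889."

answer, score, attempts = answer_with_verification(
    "When was the Eiffel Tower completed?", context, fake_generate)
print(answer, score, attempts)
```

The first draft fails the check because "1925" and "Napoleon" do not appear in the retrieved chunks, so the loop feeds that back and accepts the second, fully supported answer.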
8

Qwen-2.5-VL

Qwen2.5-VL is the multimodal large language model series

Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...

Downloads: 15 This Week

Last Update: 2026-01-04
9

GLM-V

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning

GLM-V is an open-source vision-language model (VLM) series from ZhipuAI that extends the GLM foundation models into multimodal reasoning and perception. The repository provides both GLM-4.5V and GLM-4.1V models, designed to advance beyond basic perception toward higher-level reasoning, long-context understanding, and agent-based applications. GLM-4.5V builds on the flagship GLM-4.5-Air foundation (106B parameters, 12B active), achieving state-of-the-art results on 42 benchmarks across image,...

Downloads: 3 This Week

Last Update: 5 days ago
10

CodeLlama

Inference code for CodeLlama models

Code Llama is a family of Llama-based code models optimized for programming tasks such as code generation, completion, and repair, with variants specialized for base coding, Python, and instruction following. The repo documents the sizes and capabilities (e.g., 7B, 13B, 34B) and highlights features like infilling and large input context to support real IDE workflows. It targets both general software synthesis and language-specific productivity, offering strong performance among open models at release time. Typical usage includes prompt-driven generation, function or class completion, and zero-shot adherence to natural-language instructions about code changes. ...

Downloads: 2 This Week

Last Update: 2025-10-08
11

MetaGPT

The Multi-Agent Framework

...Assign different roles to GPTs to form a collaborative software entity for complex tasks. MetaGPT takes a one-line requirement as input and outputs user stories, competitive analysis, requirements, data structures, APIs, documents, etc. Internally, MetaGPT includes roles such as product managers, architects, project managers, and engineers. It models the entire process of a software company, with carefully orchestrated SOPs (standard operating procedures).

Downloads: 2 This Week

Last Update: 2025-03-02
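The role-based SOP idea can be illustrated with a small pipeline in which each role reads the artifacts produced earlier and writes a new one. This is a hypothetical sketch, not MetaGPT's API: the `Role` dataclass, the `run_sop` function, the artifact names, and the lambdas standing in for LLM calls are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Role:
    """A hypothetical role: reads named artifacts, writes one new artifact."""
    name: str
    consumes: List[str]
    produces: str
    act: Callable[[Dict[str, str]], str]

def run_sop(requirement: str, roles: List[Role]) -> Dict[str, str]:
    """Run roles in SOP order, passing each one the artifacts it declares as inputs."""
    artifacts = {"requirement": requirement}
    for role in roles:
        missing = [k for k in role.consumes if k not in artifacts]
        if missing:
            raise ValueError(f"{role.name} is missing inputs: {missing}")
        inputs = {k: artifacts[k] for k in role.consumes}
        artifacts[role.produces] = role.act(inputs)
    return artifacts

# Usage: stand-in lambdas where a real multi-agent system would call an LLM.
roles = [
    Role("ProductManager", ["requirement"], "prd", lambda a: f"PRD for: {a['requirement']}"),
    Role("Architect", ["prd"], "design", lambda a: f"Design based on {a['prd']}"),
    Role("Engineer", ["design"], "code", lambda a: f"# code implementing {a['design']}"),
]
artifacts = run_sop("a command-line snake game", roles)
print(artifacts["code"])
```

Declaring each role's inputs explicitly makes the SOP order checkable: a role placed before the role that produces its inputs fails fast instead of working from missing context.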
12

LLaMA 3

The official Meta Llama 3 GitHub site

...As the Llama stack evolved, Meta consolidated repositories and marked this one deprecated, pointing users to newer, centralized hubs for models, utilities, and docs. Even as a deprecated repo, it documents the transition path and preserves references that clarify how Llama 3 releases map into the current ecosystem. Practically, it functioned as a bridge between Llama 2 and later Llama releases by standardizing distribution and starter code for inference and fine-tuning. Teams still treat it as historical reference material for version lineage and migration notes.

Downloads: 1 This Week

Last Update: 2025-10-08
13

BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics

BERTopic is a topic modeling technique that leverages transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics while keeping important words in the topic descriptions. BERTopic supports guided, supervised, semi-supervised, manual, long-document, hierarchical, class-based, dynamic, and online topic modeling. It even supports visualizations similar to LDAvis. For a more detailed overview, you can...

Downloads: 1 This Week

Last Update: 2025-12-03
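c-TF-IDF (class-based TF-IDF) treats all documents assigned to one topic as a single document and scores words per topic rather than per document. The plain-Python sketch below assumes the formulation given in BERTopic's documentation, W(x, c) = ||tf(x, c)|| · log(1 + A / f(x)), where tf(x, c) is the count of word x in class c (L1-normalized within the class), f(x) is the count of x across all classes, and A is the average number of words per class. It omits the embedding and clustering steps that produce the topics in the first place, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter

def c_tf_idf(docs_per_topic):
    """Class-based TF-IDF: all documents of a topic are merged into one class document.

    docs_per_topic: dict topic -> list of documents (strings)
    Returns dict topic -> dict word -> weight.
    """
    # tf(x, c): word counts per class.
    tf = {c: Counter(w for d in docs for w in d.lower().split())
          for c, docs in docs_per_topic.items()}
    # f(x): frequency of each word across all classes.
    f = Counter()
    for counts in tf.values():
        f.update(counts)
    # A: average number of words per class.
    A = sum(sum(counts.values()) for counts in tf.values()) / len(tf)
    weights = {}
    for c, counts in tf.items():
        total = sum(counts.values())  # L1 normalization within the class
        weights[c] = {x: (n / total) * math.log(1 + A / f[x]) for x, n in counts.items()}
    return weights

def top_words(weights, topic, n=3):
    """Highest-weighted words for one topic."""
    ranked = sorted(weights[topic].items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:n]]

# Usage on a hypothetical two-topic corpus; "news" occurs in both topics.
topics = {
    "sports": ["team wins match news", "team scores goal"],
    "finance": ["market falls news", "bank raises rate"],
}
weights = c_tf_idf(topics)
print(top_words(weights, "sports"), top_words(weights, "finance"))
```

A word shared across topics, such as "news" here, gets a smaller idf factor and therefore ranks below words of equal frequency that are specific to one topic.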
14

marqo

Tensor search for humans

...Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and text-to-image search and analytics. Marqo adapts and stores your data in a fully schemaless manner. It combines tensor search with a query DSL that provides efficient pre-filtering. ...

Downloads: 1 This Week

Last Update: 2026-01-06
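Pre-filtering, as mentioned for Marqo, means narrowing the candidate set by metadata before ranking by vector similarity, so filtered-out documents can never crowd out valid matches. The brute-force sketch below shows the idea; it is not Marqo's API or internals. The index layout, the `search` function, the `where` predicate, and the hand-made 2-D vectors (standing in for embeddings from a model such as CLIP) are assumptions, and a production engine would use an approximate nearest-neighbour index instead of scoring every document.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query_vec, where=None, k=2):
    """Pre-filter on metadata, then rank the survivors by cosine similarity."""
    candidates = [doc for doc in index if where is None or where(doc["meta"])]
    scored = [(cosine(query_vec, doc["vec"]), doc["id"]) for doc in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Usage with a hypothetical three-document index.
index = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"type": "image"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"type": "text"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"type": "image"}},
]
print(search(index, [1.0, 0.0], k=2))
print(search(index, [1.0, 0.0], where=lambda m: m["type"] == "image", k=2))
```

Without the filter the two nearest documents are "a" and "b"; restricting to images removes "b" before ranking, so the top two become "a" and "c".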
15

MiniMax-01

Large-language-model & vision-language-model based on Linear Attention

MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...

Downloads: 0 This Week

Last Update: 2025-12-01
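Linear attention replaces the softmax in attention with a positive feature map φ, so that a causal layer can keep running sums S = Σ φ(k)vᵀ and z = Σ φ(k) and produce each output in time independent of sequence length. The sketch below uses the generic elu(x) + 1 feature map from Katharopoulos et al. (2020) and checks the O(n) recurrent form against an O(n²) pairwise reference. It is not MiniMax's Lightning Attention kernel: it omits multiple heads, the blockwise tiling an efficient kernel would use, and the softmax-attention layers MiniMax combines it with. The function names and toy inputs are assumptions.

```python
import math

def phi(v):
    """Positive feature map elu(x) + 1 (one common choice; an assumption here)."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in v]

def linear_attention_causal(Q, K, V):
    """O(n) causal linear attention via running sums S = sum phi(k) v^T and z = sum phi(k)."""
    d_k, d_v = len(Q[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = sum(fq[i] * z[i] for i in range(d_k))
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom for j in range(d_v)])
    return out

def linear_attention_quadratic(Q, K, V):
    """Reference: the same causal weights computed pairwise, O(n^2)."""
    out = []
    for t, q in enumerate(Q):
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(K[s]))) for s in range(t + 1)]
        total = sum(w)
        out.append([sum(w[s] * V[s][j] for s in range(t + 1)) / total for j in range(len(V[0]))])
    return out

# Usage on a toy sequence of three tokens.
Q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
K = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.1]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = linear_attention_causal(Q, K, V)
slow = linear_attention_quadratic(Q, K, V)
print(fast)
```

Both functions compute the same normalized weights φ(q)·φ(k); the recurrent version simply reorders the sums so that past keys and values are folded into a fixed-size state.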
16

LangChain Apps on Production with Jina

Langchain Apps on Production with Jina & FastAPI

Jina is an open-source framework for building scalable multi-modal AI apps in production. LangChain is another open-source framework for building applications powered by LLMs. langchain-serve helps you deploy your LangChain apps on Jina AI Cloud in a matter of seconds. You can benefit from the scalability and serverless architecture of the cloud without sacrificing the ease and convenience of local development. And if you prefer, you can also deploy your LangChain apps on your own...

Downloads: 0 This Week

Last Update: 2023-08-25


Related Searches

obsidian

local ai

unstructured data

ai

pdf ai

pc autonomous ai

offline artificial intelligence

offline artificial intelligence assistant

ai chat

offline ai image

Related Categories

Artificial Intelligence

Software Development


© 2026 Slashdot Media. All Rights Reserved.
