Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
Large Language Models (LLM)
Search Results

Search Results for "extraction"

x

Sort By:

Relevance

Clear All Filters

OS

Linux 25
Mac 25
Windows 25
More...
BSD 19
ChromeOS 19

Category

Artificial Intelligence 25
Business 1

License

OSI-Approved Open Source 25

Programming Language

Python 16
TypeScript 5
JavaScript 2
Rust 1

Showing 25 open source projects for "extraction"

View related business solutions

Large Language Models (LLM) Linux Clear Filters & Widen Search

Enterprise-grade ITSM, for every business
Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.

Try it Free
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
1

Sparrow

Structured data extraction and instruction calling with ML, LLM

...It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to identify and extract meaningful data fields from heterogeneous document layouts. The architecture is modular, allowing developers to build customizable processing pipelines that integrate with external tools and data extraction frameworks. Sparrow also includes workflow orchestration tools that allow multiple extraction tasks to be combined into automated pipelines for large-scale document processing.

Downloads: 5 This Week

Last Update: 2026-03-04
See Project
2

Extractous

Fast and efficient unstructured data extraction

Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. ...

Downloads: 1 This Week

Last Update: 2026-03-06
See Project
3

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

...The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...

Downloads: 5 This Week

Last Update: 2026-03-05
See Project
4

DocStrange

Extract and convert data from any document, images, pdfs, word doc

DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
5

MiroFish

A Simple and Universal Swarm Intelligence Engine

MiroFish is a next-generation artificial intelligence prediction engine that leverages multi-agent technology and swarm-intelligence simulation to model, simulate, and forecast complex real-world scenarios. The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions...

Downloads: 1,395 This Week

Last Update: 2026-03-05
See Project
6

Bespoke Curator

Synthetic data curation for post-training and data extraction

...It supports workflows where models are used to produce synthetic examples that can later be refined into reliable training datasets for reasoning, question answering, or structured information extraction tasks. Curator includes tools for monitoring data generation processes and managing dataset quality while large batches of examples are being created. The framework also integrates with multiple inference systems and APIs, allowing users to generate data using different model providers or open-source inference engines.

Downloads: 6 This Week

Last Update: 2026-03-14
See Project
7

LangChain Extract

Did you say you like data?

LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents. Built using FastAPI and the LangChain framework, the application exposes a REST API that can process documents and return structured outputs that match user-defined JSON schemas. Developers can create reusable “extractors” that define what type of information should be pulled from a document, along with example prompts that improve extraction quality through in-context learning.

Downloads: 1 This Week

Last Update: 2026-03-09
See Project
8

Wiseflow

Enhance any agent's browser use skill

Wiseflow is an open-source information extraction and knowledge discovery system designed to collect, filter, and organize valuable information from large volumes of online content. The platform continuously monitors specified sources such as websites, social platforms, and other digital channels to identify relevant data according to user-defined interests or topics. By combining web crawling, content parsing, and large language model analysis, the system extracts concise insights from raw information streams and converts them into structured data that can be stored or analyzed. ...

Downloads: 3 This Week

Last Update: 2026-04-05
See Project
9

NLP-Knowledge-Graph

Research and application of technologies such as nl processing

...The project aims to help researchers and developers understand how structured knowledge representations can enhance language processing systems. It includes curated materials covering key topics such as knowledge graph construction, entity recognition, relation extraction, graph embeddings, and semantic reasoning. By combining NLP techniques with graph-based data models, knowledge graphs allow systems to represent complex relationships between entities and improve tasks such as question answering, information retrieval, and recommendation systems. The repository aggregates research papers, technical articles, tutorials, and open-source tools related to these areas.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
10

VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs)

...VLMEvalKit supports generation-based evaluation methods, allowing models to produce textual responses to visual inputs while measuring performance through techniques such as exact matching or language-model-assisted answer extraction.

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
11

kg-gen

Knowledge Graph Generation from Any Text

...The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. ...

Downloads: 1 This Week

Last Update: 2026-03-09
See Project
12

Paper2Slides

From Paper to Presentation in One Click

...It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file type. It uses an extraction approach intended to capture critical insights comprehensively, including important visuals and data points that often get missed in naive summarization. A major focus is traceability: generated slide content is designed to remain linked back to the source material so you can verify accuracy and reduce information drift. It also offers styling flexibility, letting you use built-in themes or describe a custom design direction in natural language for themed outputs.

Downloads: 1 This Week

Last Update: 2026-03-15
See Project
13

Read Frog

Open Source Immersive Translate

Read Frog is an open-source browser extension designed to transform everyday web reading into an immersive language learning experience powered by artificial intelligence. The tool integrates translation, contextual explanations, and content analysis directly into the browsing workflow so users can learn languages naturally while reading authentic online content. Instead of forcing learners to switch between translation tools and the original text, the extension displays translations...

Downloads: 16 This Week

Last Update: 2 days ago
See Project
14

MarkPDFDown

A high-quality PDF to Markdown tool based on large language model

MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists. By producing Markdown rather than raw text,...

Downloads: 10 This Week

Last Update: 2026-03-06
See Project
15

MegaParse

File Parser optimised for LLM Ingestion with no loss

...It efficiently parses various document formats, such as PDFs, DOCX, and PPTX, converting them into formats ideal for processing by LLMs. This tool is essential for applications that require accurate and comprehensive data extraction from diverse document types.

Downloads: 1 This Week

Last Update: 2025-02-14
See Project
16

spacy-llm

Integrating LLMs into structured NLP pipelines

...With only a few (and sometimes no) examples, an LLM can be prompted to perform custom NLP tasks such as text categorization, named entity recognition, coreference resolution, information extraction and more. This package integrates Large Language Models (LLMs) into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks, no training data required.

Downloads: 3 This Week

Last Update: 2026-03-24
See Project
17

FinGPT

Open-Source Financial Large Language Models

...The platform typically includes tools for fine-tuning, context engineering, and prompt templating, enabling users to build specialized assistants for tasks like sentiment analysis, earnings summary generation, risk profiling, trading signal interpretation, and document extraction from financial reports.

Downloads: 7 This Week

Last Update: 2026-04-03
See Project
18

Kor

LLM

This is a half-baked prototype that “helps” you extract structured data from text using LLMs. Specify the schema of what should be extracted and provide some examples. Kor will generate a prompt, send it to the specified LLM and parse out the output. You might even get results back.

Downloads: 0 This Week

Last Update: 2024-07-20
See Project
19

HyperAgent

AI Browser Automation

...This approach reduces the brittleness commonly associated with traditional automation scripts that break when the DOM structure changes. HyperAgent includes APIs such as page.ai() and page.extract() that allow structured data extraction and dynamic task execution through AI reasoning.

Downloads: 1 This Week

Last Update: 2026-03-09
See Project
20

AI Powered Knowledge Graph Generator

AI Powered Knowledge Graph Generator

AI-Powered Knowledge Graph is an open-source project focused on building knowledge graph systems that integrate artificial intelligence and machine learning to represent complex relationships between data entities. Knowledge graphs organize information as networks of nodes and relationships, allowing applications to analyze connections between concepts, datasets, or real-world entities. By incorporating AI techniques such as natural language processing and semantic reasoning, the project...

Downloads: 1 This Week

Last Update: 2026-03-06
See Project
21

BrowserNode

Make websites accessible for AI agents. Automate tasks online

Browsernode is an open-source TypeScript framework that allows AI agents to interact directly with web browsers in order to automate tasks and gather information from websites. The project acts as a bridge between AI models and browser automation tools, enabling language models to control web pages programmatically. Built as an implementation compatible with the Browser-use ecosystem, Browsernode allows agents to perform actions such as navigating pages, extracting information, filling...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
22

browserable

Open source and self-hostable browser automation library for AI agents

Browserable is an open-source browser automation framework designed specifically for AI agents that need to interact with web interfaces in a human-like way. The project provides tools that allow automated agents to navigate websites, click buttons, fill out forms, and extract information from pages without manual scripting of each step. Built primarily in JavaScript, the framework offers both a developer-friendly SDK and a REST API that allow integration with AI applications and automation...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
23

AudioMuse-AI

AudioMuse-AI is an Open Source Dockerized environment

AudioMuse-AI is an open-source system designed to automatically generate playlists and analyze music libraries using artificial intelligence and audio signal processing techniques. The platform runs locally in a Dockerized environment and performs detailed sonic analysis on audio files to understand characteristics such as tempo, mood, and acoustic similarity. By analyzing the underlying audio content rather than relying on external metadata services, the system can organize large personal...

Downloads: 1 This Week

Last Update: 2026-04-06
See Project
24

NeMo Curator

Scalable data pre processing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use-cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and paramter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...

Downloads: 0 This Week

Last Update: 2026-02-23
See Project
25

Reader LLM

Convert any URL to an LLM-friendly input with a simple prefix

Reader LLM is an open-source tool designed to convert web content into formats that are easier for large language models to process. The system works by transforming a webpage into a clean text or Markdown representation that removes unnecessary formatting and highlights the core information within the page. Developers can use a simple URL prefix to retrieve a version of a webpage that has been optimized for machine consumption, making it suitable for use in AI agents or retrieval-augmented...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project

Previous
You're on page 1
Next

Related Searches

face recognition attendance system

finance

face recognition

trading

fingpt

cash management

algorithmic trading software

ai financial data analysis market trading

ai agentic model for automation of email, call

Related Categories

Artificial Intelligence

Business

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise