Page 2 | unstructured data free download

Search-Index

A persistent, network-resilient, full-text search library

Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.

Downloads: 12 This Week

Last Update: 2025-03-12

Diffgram

Training data (data labeling, annotation, workflow) for all data types

...Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases, including computer vision, natural language processing, and speech recognition.

Downloads: 9 This Week

Last Update: 2024-10-14

Unstract

No-code LLM Platform to launch APIs and ETL Pipelines

Unstract is a powerful open-source, no-code platform built to automate the extraction and structuring of unstructured documents using large language models and flexible workflows, enabling developers and data teams to turn messy files into organized JSON content without complex coding. It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to refine prompts and extraction logic before deploying at scale. ...

Downloads: 0 This Week

Last Update: 3 days ago

AceBase realtime database

A fast, low-memory, transactional, index- and query-enabled NoSQL database

A fast, low-memory, transactional, index- and query-enabled NoSQL database engine and server for Node.js and the browser, with real-time data change notifications. Supports storing JSON objects, arrays, numbers, strings, booleans, dates, bigints, and binary (ArrayBuffer) data. Inspired by (and largely compatible with) the Firebase real-time database, with additional functionality and less data sharding/duplication. Capable of storing up to 2^48 (281 trillion) object nodes in a binary database file...

Downloads: 8 This Week

Last Update: 1 day ago

Airweave

Airweave lets agents search any app

Airweave is an open-source platform that enables agents to semantically search across various applications, databases, and APIs. By transforming disparate data sources into a unified, searchable knowledge base, Airweave facilitates intelligent information retrieval through REST APIs or the MCP protocol. It's particularly useful for building AI agents that require access to structured and unstructured data across multiple platforms.

Downloads: 5 This Week

Last Update: 4 days ago

Pachyderm

Data-Centric Pipelines and Data Versioning

...Automatic and intelligent versioning of even the largest data sets of unstructured and structured data. Git-like structure enables effective team collaboration. Full versioning for metadata including all analysis, parameters, artifacts, models, and intermediate results. Automatically produces an immutable record for all activities and assets. Pachyderm is used across a variety of industries and use cases.

Downloads: 1 This Week

Last Update: 2025-01-15

Pimcore

Open Source Data & Experience Management Platform

Whether you're dealing with unstructured web documents or structured data for MDM/PIM, you define the UI design (web documents by a template and structured data with an intuitive graphical editor), and Pimcore persists the data efficiently, optimized for fast access. Thanks to its framework approach, Pimcore is very flexible and adapts to your needs.

Downloads: 1 This Week

Last Update: 6 days ago

cognee

Deterministic LLM Outputs for AI Applications and AI Agents

...Any kind of data works; unstructured text or raw media files, PDFs, tables, presentations, JSON files, and so many more. Add small or large files, or many files at once. We map out a knowledge graph from all the facts and relationships we extract from your data. Then, we establish graph topology and connect related knowledge clusters, enabling the LLM to "understand" the data.

Downloads: 9 This Week

Last Update: 3 days ago

XTDB

General-purpose bitemporal database for SQL, Datalog & graph queries

...Both structured and unstructured data are at home in XTDB. Legal regulations like GDPR often pose a challenge when designing systems around immutable data.

Downloads: 2 This Week

Last Update: 2025-12-01

Gridap.jl

Grid-based approximation of partial differential equations in Julia

Gridap provides a set of tools for the grid-based approximation of partial differential equations (PDEs) written in the Julia programming language. The library currently supports linear and nonlinear PDE systems for scalar and vector fields, single and multi-field problems, conforming and nonconforming finite element (FE) discretizations, on structured and unstructured meshes of simplices and n-cubes. It also provides methods for time integration. Gridap is extensible and modular. One can...

Downloads: 7 This Week

Last Update: 5 days ago

MyScaleDB

A ClickHouse fork that supports high-performance vector search

...The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. MyScaleDB enables developers to perform vector similarity searches using standard SQL syntax, eliminating the need to learn specialized vector database query languages. The database is optimized for high performance and scalability, allowing it to handle extremely large datasets and high query loads typical of production AI applications.

Downloads: 0 This Week

Last Update: 2026-03-10

Quivr

Your Second Brain supercharged by Generative AI

Quivr, your second brain, uses the power of generative AI to store and retrieve unstructured information. Think of it as Obsidian, but turbocharged with AI capabilities.

Downloads: 0 This Week

Last Update: 2025-02-04

skycaiji

Open source web scraping system for automated data collection tasks

SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets.

Downloads: 2 This Week

Last Update: 20 hours ago

LinearSolve.jl

High-Performance Unified Interface for Linear Solvers in Julia

LinearSolve.jl is a unified interface for the linear solving packages of Julia. It interfaces with other packages of the Julia ecosystem to make it easy to test alternative solver packages and pass small types to control algorithm swapping. It also interfaces with the ModelingToolkit.jl world of symbolic modeling to allow for automatically generating high-performance code. Performance is key: the current methods are made to be highly performant on scalar and statically sized small problems,...

Downloads: 3 This Week

Last Update: 18 hours ago

Sparrow

Structured data extraction and instruction calling with ML, LLM

Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to identify and extract meaningful data fields from heterogeneous document layouts. ...

Downloads: 6 This Week

Last Update: 2026-03-04

OpenWPM

A web privacy measurement framework

OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. Although we...

Downloads: 8 This Week

Last Update: 2026-03-28

GeoStats.jl

An extensible framework for geospatial data science

...Users can represent georeferenced tables (points + attributes), define domains (grids, meshes, structured/unstructured), and then apply geostatistical operations such as kriging, interpolation, simulation, variogram estimation, and learning-based prediction. Visualization is supported via integration with Makie.jl to produce spatial renderings, mesh visualizations, and variable overlays.

Downloads: 9 This Week

Last Update: 5 days ago

MindsDB

Making Enterprise Data Intelligent and Responsive for AI

MindsDB is an AI data solution that enables humans, AI, agents, and applications to query data in natural language and SQL, and get highly accurate answers across disparate data sources and types. MindsDB connects to diverse data sources and applications, and unifies petabyte-scale structured and unstructured data. Powered by an industry-first cognitive engine that can operate anywhere (on-prem, VPC, serverless), it empowers both humans and AI with highly informed decision-making capabilities. ...

Downloads: 7 This Week

Last Update: 2026-03-03

refinery

Open-source choice to scale, assess and maintain natural language data

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact. You are one of the people we've built refinery for. refinery helps you to build better NLP models in a data-centric approach. Semi-automate your labeling, find low-quality subsets in your training data, and monitor your data in one place. refinery doesn't get rid of manual labeling, but it makes sure that your valuable time is spent well. Also, the makers...

Downloads: 1 This Week

Last Update: 2024-06-13

HASH

The best way to use and work with blocks

This is HASH's public monorepo, which contains our public code, docs, and other key resources. HASH is a platform for decision-making that helps you integrate, understand, and use data in a variety of ways. HASH does this by combining several powerful tools into one simple interface. These range from data pipelines and a graph database through to an all-in-one workspace, no-code tool builder, and agent-based simulation engine. These exist at varying stages of...

Downloads: 1 This Week

Last Update: 2026-03-28

LangKit

An open-source toolkit for monitoring Large Language Models (LLMs)

LangKit is an open-source text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library whylogs. Productionizing language models, including LLMs, comes with a range of risks because the number of possible input combinations is effectively unbounded, and so is the number of outputs they can elicit. The unstructured nature of text poses a challenge in the ML observability space, and it is a challenge worth solving, since a lack of visibility into the model's behavior can have serious consequences.

Downloads: 6 This Week

Last Update: 2024-11-06

AgentForge

Extensible AGI Framework

AgentForge is a framework for creating and deploying AI agents that can perform autonomous decision-making and task execution. It enables developers to define agent behaviors, train models, and integrate AI-powered automation into various applications.

Downloads: 5 This Week

Last Update: 2025-02-25

FinGPT

Open-Source Financial Large Language Models

FinGPT is an open-source, finance-specialized large language model framework that blends the capabilities of general LLMs with real-time financial data feeds, domain-specific knowledge bases, and task-oriented agents to support market analysis, research automation, and decision support. It extends traditional GPT-style models by connecting them to live or historical financial datasets, news APIs, and economic indicators so that outputs are grounded in relevant and recent market conditions...

Downloads: 10 This Week

Last Update: 2026-04-03

AI Powered Knowledge Graph Generator

AI-Powered Knowledge Graph is an open-source project focused on building knowledge graph systems that integrate artificial intelligence and machine learning to represent complex relationships between data entities. Knowledge graphs organize information as networks of nodes and relationships, allowing applications to analyze connections between concepts, datasets, or real-world entities. By incorporating AI techniques such as natural language processing and semantic reasoning, the project...

Downloads: 1 This Week

Last Update: 2026-03-06

Beads

A memory upgrade for your coding agent

Beads is an open-source project providing a distributed, structured memory system for AI coding agents, replacing ad-hoc text plans with a git-backed graph that represents tasks, dependencies, and progress in a persistent, queryable format. Instead of storing plans as unstructured Markdown or ephemeral notes, Beads organizes agent state, task artifacts, and relationships as nodes and edges in a version-controlled graph so that long-horizon projects don’t lose context or coherence as the agent proceeds. This approach helps coding agents and their human collaborators track which tasks depend on others, what has been done, and where workflows branch or rejoin, without losing important data.

Downloads: 40 This Week

Last Update: 2026-04-03

Search Results for "unstructured data" - Page 2

Showing 89 open source projects for "unstructured data"


