unstructured data free download

Showing 19 open source projects for "unstructured data"

1

Milvus

Vector database for scalable similarity search and AI applications

Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible and provides a consistent user experience regardless of the deployment environment. Milvus 2.0 is a cloud-native vector database with storage and computation separated by design. All components in this refactored version of Milvus are stateless to enhance elasticity and flexibility. Average latency is measured in milliseconds, even on trillion-vector datasets. ...
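To make "embedding similarity search" concrete, here is a minimal sketch of exact top-k search by cosine similarity over a tiny in-memory collection. It is not the Milvus API: the names `cosine_similarity`, `top_k`, and the toy vectors are hypothetical, and a vector database like Milvus relies on indexes and a distributed architecture rather than the exhaustive scan shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, collection, k):
    """Hypothetical helper: ids of the k vectors most similar to query.

    Exhaustive (brute-force) scan; real vector databases use indexes instead.
    """
    scored = [(cosine_similarity(query, vec), vid) for vid, vec in collection.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [vid for _, vid in scored[:k]]

# Toy 3-dimensional "embeddings"; real embeddings come from a model.
collection = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], collection, 2))  # ['doc_a', 'doc_b']
```

The exhaustive scan costs one similarity computation per stored vector, which is exactly the cost that approximate indexes in a vector database are meant to avoid at large scale.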

Downloads: 10 This Week

Last Update: 2026-04-07
2

Scanopy

Clean network diagrams, one-time setup, zero upkeep

Scanopy is a multi-modal data capture and analysis toolkit that lets users collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks (such as OCR, document analysis, audio transcription, network data capture, and image extraction) while providing unified APIs and workflows for managing heterogeneous data sources. ...

Downloads: 23 This Week

Last Update: 2026-04-02
3

Gretel Synthetics

Synthetic data generators for structured and unstructured text

Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as a service. Synthesize data that is as good as or better than your original dataset while maintaining relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports....

Downloads: 9 This Week

Last Update: 2025-03-17
4

DataChain

AI-data warehouse to enrich, transform and analyze unstructured data

DataChain enables multimodal API calls and local AI inferences to run in parallel over many samples as chained operations. The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. DataChain can persist features of Python objects returned by AI models and enables vectorized analytical operations over them. Typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. DataChain...

Downloads: 5 This Week

Last Update: 1 day ago
5

Diffgram

Training data (data labeling, annotation, workflow) for all data types

...Training data is the art of supervising machines through data. This includes the activity of annotation, which produces structured data ready to be consumed by a machine learning model. Annotation is required because raw media is considered unstructured and is not usable without it. That’s why training data is required for many modern machine learning use cases, including computer vision, natural language processing, and speech recognition.

Downloads: 9 This Week

Last Update: 2024-10-14
6

Pimcore

Open Source Data & Experience Management Platform

Whether you're dealing with unstructured web documents or structured data for MDM/PIM, you define the UI design (web documents with a template, structured data with an intuitive graphical editor), and Pimcore persists the data efficiently in a form optimized for fast access. Thanks to its framework approach, Pimcore is very flexible and adapts to your needs.

Downloads: 1 This Week

Last Update: 6 days ago
7

Gridap.jl

Grid-based approximation of partial differential equations in Julia

Gridap provides a set of tools for the grid-based approximation of partial differential equations (PDEs) written in the Julia programming language. The library currently supports linear and nonlinear PDE systems for scalar and vector fields, single and multi-field problems, conforming and nonconforming finite element (FE) discretizations, on structured and unstructured meshes of simplices and n-cubes. It also provides methods for time integration. Gridap is extensible and modular. One can...
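As an illustration of what "grid-based approximation of PDEs" with conforming finite elements involves, here is a minimal sketch in Python (not Julia, and not Gridap's API). It assumes the simplest possible setting: the 1D Poisson problem -u'' = f on (0, 1) with u(0) = u(1) = 0, a uniform mesh, linear (hat) elements, and a constant load f. The function name `solve_poisson_1d` is hypothetical. It assembles the tridiagonal stiffness matrix and load vector, then solves the system with the Thomas algorithm.

```python
def solve_poisson_1d(n_elems, f=1.0):
    """Linear finite elements for -u'' = f on (0, 1), u(0) = u(1) = 0.

    Hypothetical helper: uniform mesh of n_elems (>= 2) elements, constant load f.
    Returns the nodal values at x_i = i / n_elems, boundaries included.
    """
    h = 1.0 / n_elems
    n = n_elems - 1  # number of interior (unknown) nodes
    # Assembled stiffness matrix is tridiagonal with rows (1/h) * [-1, 2, -1].
    sub = [-1.0 / h] * n   # sub[0] is unused
    diag = [2.0 / h] * n
    sup = [-1.0 / h] * n   # sup[n - 1] is unused
    # Load vector: integral of f * phi_i over its support is f * h for hat functions.
    rhs = [f * h] * n
    # Thomas algorithm (tridiagonal Gaussian elimination): forward sweep.
    for i in range(1, n):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]

# Compare with the exact solution u(x) = x (1 - x) / 2 for f = 1.
u = solve_poisson_1d(8)
exact = [(i / 8) * (1 - i / 8) / 2 for i in range(9)]
print(max(abs(a - b) for a, b in zip(u, exact)))  # at rounding-error level
```

In this particular 1D case the linear finite element solution happens to match the exact solution at the nodes; in general the discretization error shrinks as the mesh is refined. A library such as Gridap generalizes the same assemble-then-solve pattern to multiple dimensions, unstructured meshes, and other element types.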

Downloads: 7 This Week

Last Update: 5 days ago
8

LinearSolve.jl

High-Performance Unified Interface for Linear Solvers in Julia

LinearSolve.jl is a unified interface for the linear solving packages of Julia. It interfaces with other packages of the Julia ecosystem to make it easy to test alternative solver packages and pass small types to control algorithm swapping. It also interfaces with the ModelingToolkit.jl world of symbolic modeling to allow for automatically generating high-performance code. Performance is key: the current methods are made to be highly performant on scalar and statically sized small problems,...

Downloads: 3 This Week

Last Update: 23 hours ago
9

FinGPT

Open-Source Financial Large Language Models

FinGPT is an open-source, finance-specialized large language model framework that blends the capabilities of general LLMs with real-time financial data feeds, domain-specific knowledge bases, and task-oriented agents to support market analysis, research automation, and decision support. It extends traditional GPT-style models by connecting them to live or historical financial datasets, news APIs, and economic indicators so that outputs are grounded in relevant and recent market conditions...

Downloads: 10 This Week

Last Update: 2026-04-03
10

Relaticle

The Next-Generation Open-Source CRM Platform written with Laravel

...The interface lets you write plain text notes and tag or connect them dynamically, making it easier to uncover patterns and connections over time instead of losing insights in a long, unstructured list. Because it’s built with productivity and exploration in mind, Relaticle offers fast search, semantic context awareness, and the ability to zoom from high-level overviews down to specific node details. It also supports self-hosting so users retain full control over their data without relying on third-party servers or cloud subscriptions.

Downloads: 4 This Week

Last Update: 3 days ago
11

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout AI-driven data processing tool written in C++20, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 8 This Week

Last Update: 2026-03-27
12

BDS

Blockchain data parsing and persisting results

JD Cloud Blockchain Data Service (BDS) is a real-time data aggregation, analysis, and visualization service for chain-like unstructured data from all kinds of third-party blockchains. Splitter is the key module of BDS and provides its data analysis capability. Splitter is responsible for consuming blockchain data from a message queue (Kafka) and inserting the data into persistent data storage services (relational databases, data warehouses, etc.) for further processing. ...
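The consume-and-persist role described for Splitter can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BDS code: a `queue.Queue` stands in for the Kafka topic, an in-memory SQLite database stands in for the relational store, and the JSON message shape (`height`, `hash`, `txs`) and the `blocks` table are hypothetical.

```python
import json
import queue
import sqlite3

def run_splitter(messages, conn):
    """Hypothetical consumer: drain JSON block messages and persist them.

    messages: queue.Queue of JSON strings (stand-in for a Kafka topic).
    conn: sqlite3 connection (stand-in for a database or data warehouse).
    Returns the number of messages written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "height INTEGER PRIMARY KEY, hash TEXT NOT NULL, tx_count INTEGER)"
    )
    inserted = 0
    while True:
        try:
            raw = messages.get_nowait()
        except queue.Empty:
            break  # a real consumer would keep polling instead of stopping
        block = json.loads(raw)  # parse the loosely structured payload
        # INSERT OR REPLACE keeps a replayed message from creating duplicate rows.
        conn.execute(
            "INSERT OR REPLACE INTO blocks (height, hash, tx_count) VALUES (?, ?, ?)",
            (block["height"], block["hash"], len(block.get("txs", []))),
        )
        inserted += 1
    conn.commit()
    return inserted

q = queue.Queue()
q.put(json.dumps({"height": 1, "hash": "0xaaa", "txs": ["t1", "t2"]}))
q.put(json.dumps({"height": 2, "hash": "0xbbb", "txs": []}))
conn = sqlite3.connect(":memory:")
print(run_splitter(q, conn))  # 2
print(conn.execute("SELECT height, tx_count FROM blocks ORDER BY height").fetchall())
# [(1, 2), (2, 0)]
```

Keying rows on block height and using an upsert is one way to make reprocessing idempotent, which matters when a queue consumer may see the same message more than once.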

Downloads: 1 This Week

Last Update: 2022-04-18
13

TEXT2DATA

Text Analytics Platform

Bring a text analytics platform that uses NLP (natural language processing) and machine learning to your work environment. Extract essential information from your text documents and let artificial intelligence save you time. Get detailed and agile reports on your unstructured data.

Downloads: 0 This Week

Last Update: 2019-07-17
14

iCubing

Several OLAP algorithms, data structures and HPC OLAP versions

OLAP technology is very useful for decision makers and for data mining tools working with big data. To that end, the iCubing project implements several multidimensional data cube approaches for cube indexing, querying, updating, and mining. It also supports several cube types, namely alphanumeric cubes, text cubes over unstructured data, and geo cubes with geographic types, dimensions, measures, and hierarchies. OLAP remains a hard challenge more than 20 years after the seminal paper by Jim Gray et al. in 1997. ...
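The data cube from Gray et al. aggregates a measure over every subset of the dimensions (2^d group-bys for d dimensions), using a special ALL value for dimensions that have been rolled up. The sketch below computes a cube naively by scanning the rows once per group-by; it is an illustration of the concept, not one of iCubing's algorithms, and the `compute_cube` name and sample sales data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for an aggregated-away dimension (assumes no real value equals "ALL")

def compute_cube(rows, dims, measure):
    """Hypothetical helper: sum `measure` for every subset of `dims` (2^d group-bys)."""
    cube = defaultdict(float)
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            for row in rows:
                # Keep the value for grouped dimensions, ALL for rolled-up ones.
                key = tuple(row[d] if d in group else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)

sales = [
    {"region": "east", "product": "A", "amount": 10.0},
    {"region": "east", "product": "B", "amount": 5.0},
    {"region": "west", "product": "A", "amount": 7.0},
]
cube = compute_cube(sales, ["region", "product"], "amount")
print(cube[("east", ALL)])  # 15.0
print(cube[(ALL, "A")])     # 17.0
print(cube[(ALL, ALL)])     # 22.0
```

This naive version rescans the base rows for each of the 2^d group-bys; practical cubing algorithms instead derive coarser aggregates from finer ones and add indexing, which is the kind of problem the approaches listed above address.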

Downloads: 0 This Week

Last Update: 2016-08-25
15

CloverETL

Moved to sf.net/projects/cloveretl/. CloverETL is a Java ETL framework that transforms structured or unstructured data. It works as a standalone application or can be embedded in other applications as a library of data transformation functions.

Downloads: 24 This Week

Last Update: 2014-06-09
16

Simplexo Enterprise Search

Single-click, real-time searching of both structured and unstructured data and information. Simultaneously searches structured sources (databases) and unstructured sources (documents) from within a web browser, a desktop application, and application plugins.

Downloads: 0 This Week

Last Update: 2015-06-23
17

Hardware Assisted Visibility Sorting

The Hardware Assisted Visibility Sorting (HAVS) algorithm is a GPU-based direct volume renderer for unstructured grids. The algorithm operates in both object space and image space and includes a sample-based, dynamic level-of-detail algorithm.
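Visibility sorting means ordering the cell faces of an unstructured grid so they can be composited back to front. The sketch below shows only a coarse object-space step, assumed here for illustration: sorting faces by the distance of their centroids from the eye. It does not model the image-space part of HAVS or its GPU implementation, and the helper names are hypothetical.

```python
import math

def centroid(face):
    """Average of a face's vertices; each vertex is an (x, y, z) tuple."""
    n = len(face)
    return tuple(sum(v[i] for v in face) / n for i in range(3))

def sort_faces_back_to_front(faces, eye):
    """Hypothetical helper: approximate order with the farthest centroid first.

    Centroid ordering can be wrong for large or intersecting faces, so a
    renderer needs a further correction step; none is performed here.
    """
    return sorted(faces, key=lambda f: math.dist(centroid(f), eye), reverse=True)

near = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
far = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
order = sort_faces_back_to_front([near, far], eye=(0.0, 0.0, 0.0))
print(order[0] is far)  # True: the farther face is composited first
```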

Downloads: 0 This Week

Last Update: 2013-03-08
18

bitHull

bitHull is a simple store-and-share mechanism for unstructured data. It is part experimental graph-based task/note/idea management system and part data aggregator.

Downloads: 0 This Week

Last Update: 2013-03-22
19

Enterprise Knowledge Base

The Enterprise Knowledge Base (EKB) from ModelDriven.org is a repository for enterprise knowledge: metadata, planning and governance. The EKB can manage and transform both structured and unstructured data as files, Eclipse-EMF or RDF ontologies.

Downloads: 0 This Week

Last Update: 2014-07-14