Search Results for "unstructured data" - Page 2

Showing 89 open source projects for "unstructured data"

  • 1
    Search-Index

    A persistent, network resilient, full text search library

    Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.
    Downloads: 12 This Week
  • 2
    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ...Training Data is the art of supervising machines through data. This includes annotation, which produces structured data ready to be consumed by a machine learning model. Annotation is required because raw media is considered unstructured and is not usable without it. That’s why training data is required for many modern machine learning use cases, including computer vision, natural language processing, and speech recognition.
    Downloads: 9 This Week
  • 3
    Unstract

    No-code LLM Platform to launch APIs and ETL Pipelines

    Unstract is a powerful open-source, no-code platform built to automate the extraction and structuring of unstructured documents using large language models and flexible workflows, enabling developers and data teams to turn messy files into organized JSON content without complex coding. It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to refine prompts and extraction logic before deploying at scale. ...
    Downloads: 0 This Week
  • 4
    AceBase realtime database

    A fast, low-memory, transactional, index- and query-enabled NoSQL database

    A fast, low-memory, transactional, index- and query-enabled NoSQL database engine and server for Node.js and the browser, with real-time data change notifications. Supports storing JSON objects, arrays, numbers, strings, booleans, dates, bigints, and binary (ArrayBuffer) data. Inspired by (and largely compatible with) the Firebase real-time database, with additional functionality and less data sharding/duplication. Capable of storing up to 2^48 (281 trillion) object nodes in a binary database file...
    Downloads: 8 This Week
  • 5
    Airweave

    Airweave lets agents search any app

    Airweave is an open-source platform that enables agents to semantically search across various applications, databases, and APIs. By transforming disparate data sources into a unified, searchable knowledge base, Airweave facilitates intelligent information retrieval through REST APIs or the MCP protocol. It's particularly useful for building AI agents that require access to structured and unstructured data across multiple platforms.
    Downloads: 5 This Week
  • 6
    Pachyderm

    Data-Centric Pipelines and Data Versioning

    ...Automatic and intelligent versioning of even the largest data sets of unstructured and structured data. Git-like structure enables effective team collaboration. Full versioning for metadata including all analysis, parameters, artifacts, models, and intermediate results. Automatically produces an immutable record for all activities and assets. Pachyderm is used across a variety of industries and use cases.
    Downloads: 1 This Week
  • 7
    Pimcore

    Open Source Data & Experience Management Platform

    Whether you're dealing with unstructured web documents or structured data for MDM/PIM, you define the UI design (web documents with a template, structured data with an intuitive graphical editor), and Pimcore persists the data efficiently, optimized for fast access. Thanks to its framework approach, Pimcore is very flexible and adapts to your needs.
    Downloads: 1 This Week
  • 8
    cognee

    Deterministic LLM Outputs for AI Applications and AI Agents

    ...Any kind of data works: unstructured text or raw media files, PDFs, tables, presentations, JSON files, and many more. Add small or large files, or many files at once. We map out a knowledge graph from all the facts and relationships we extract from your data. Then, we establish graph topology and connect related knowledge clusters, enabling the LLM to "understand" the data.
    Downloads: 9 This Week
  • 9
    XTDB

    General-purpose bitemporal database for SQL, Datalog & graph queries

    ...Both structured and unstructured data are at home in XTDB. Legal regulations like GDPR often pose a challenge when designing systems around immutable data.
    Downloads: 2 This Week
  • 10
    Gridap.jl

    Grid-based approximation of partial differential equations in Julia

    Gridap provides a set of tools for the grid-based approximation of partial differential equations (PDEs) written in the Julia programming language. The library currently supports linear and nonlinear PDE systems for scalar and vector fields, single and multi-field problems, conforming and nonconforming finite element (FE) discretizations, on structured and unstructured meshes of simplices and n-cubes. It also provides methods for time integration. Gridap is extensible and modular. One can...
    Downloads: 7 This Week
  • 11
    MyScaleDB

    A ClickHouse fork that supports high-performance vector search

    ...The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. MyScaleDB enables developers to perform vector similarity searches using standard SQL syntax, eliminating the need to learn specialized vector database query languages. The database is optimized for high performance and scalability, allowing it to handle extremely large datasets and high query loads typical of production AI applications.
    Downloads: 0 This Week
  • 12
    Quivr

    Your Second Brain supercharged by Generative AI

    Quivr, your second brain, uses generative AI to store and retrieve unstructured information. Think of it as Obsidian, but turbocharged with AI capabilities.
    Downloads: 0 This Week
  • 13
    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets.
    Downloads: 2 This Week
  • 14
    LinearSolve.jl

    High-Performance Unified Interface for Linear Solvers in Julia

    LinearSolve.jl is a unified interface for the linear solving packages of Julia. It interfaces with other packages of the Julia ecosystem to make it easy to test alternative solver packages and pass small types to control algorithm swapping. It also interfaces with the ModelingToolkit.jl world of symbolic modeling to allow for automatically generating high-performance code. Performance is key: the current methods are made to be highly performant on scalar and statically sized small problems,...
    Downloads: 3 This Week
  • 15
    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to identify and extract meaningful data fields from heterogeneous document layouts. ...
    Downloads: 6 This Week
  • 16
    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. Although we...
    Downloads: 8 This Week
  • 17
    GeoStats.jl

    An extensible framework for geospatial data science

    ...Users can represent georeferenced tables (points + attributes), define domains (grids, meshes, structured/unstructured), and then apply geostatistical operations such as kriging, interpolation, simulation, variogram estimation, and learning-based prediction. Visualization is supported via integration with Makie.jl to produce spatial renderings, mesh visualizations, and variable overlays.
    Downloads: 9 This Week
  • 18
    MindsDB

    Making Enterprise Data Intelligent and Responsive for AI

    MindsDB is an AI data solution that enables humans, AI, agents, and applications to query data in natural language and SQL, and get highly accurate answers across disparate data sources and types. MindsDB connects to diverse data sources and applications, and unifies petabyte-scale structured and unstructured data. Powered by an industry-first cognitive engine that can operate anywhere (on-prem, VPC, serverless), it empowers both humans and AI with highly informed decision-making capabilities. ...
    Downloads: 7 This Week
  • 19
    refinery

    Open-source choice to scale, assess and maintain natural language data

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact. You are one of the people we've built refinery for. refinery helps you to build better NLP models in a data-centric approach. Semi-automate your labeling, find low-quality subsets in your training data, and monitor your data in one place. refinery doesn't get rid of manual labeling, but it makes sure that your valuable time is spent well. Also, the makers...
    Downloads: 1 This Week
  • 20
    HASH

    The best way to use and work with blocks

    This is HASH's public monorepo, which contains our public code, docs, and other key resources. HASH is a platform for decision-making that helps you integrate, understand, and use data in a variety of ways. HASH does this by combining several powerful tools into one simple interface. These range from data pipelines and a graph database to an all-in-one workspace, no-code tool builder, and agent-based simulation engine. These exist at varying stages of...
    Downloads: 1 This Week
  • 21
    LangKit

    An open-source toolkit for monitoring Large Language Models (LLMs)

    LangKit is an open-source text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library whylogs. Productionizing language models, including LLMs, comes with a range of risks due to the infinite number of input combinations, which can elicit an infinite number of outputs. The unstructured nature of text poses a challenge in the ML observability space, and it is a challenge worth solving, since the lack of visibility into the model's behavior can have serious consequences.
    Downloads: 6 This Week
  • 22
    AgentForge

    Extensible AGI Framework

    AgentForge is a framework for creating and deploying AI agents that can perform autonomous decision-making and task execution. It enables developers to define agent behaviors, train models, and integrate AI-powered automation into various applications.
    Downloads: 5 This Week
  • 23
    FinGPT

    Open-Source Financial Large Language Models

    FinGPT is an open-source, finance-specialized large language model framework that blends the capabilities of general LLMs with real-time financial data feeds, domain-specific knowledge bases, and task-oriented agents to support market analysis, research automation, and decision support. It extends traditional GPT-style models by connecting them to live or historical financial datasets, news APIs, and economic indicators so that outputs are grounded in relevant and recent market conditions...
    Downloads: 10 This Week
  • 24
    AI Powered Knowledge Graph Generator

    AI Powered Knowledge Graph Generator

    AI-Powered Knowledge Graph is an open-source project focused on building knowledge graph systems that integrate artificial intelligence and machine learning to represent complex relationships between data entities. Knowledge graphs organize information as networks of nodes and relationships, allowing applications to analyze connections between concepts, datasets, or real-world entities. By incorporating AI techniques such as natural language processing and semantic reasoning, the project...
    Downloads: 1 This Week
  • 25
    Beads

    A memory upgrade for your coding agent

    Beads is an open-source project providing a distributed, structured memory system for AI coding agents, replacing ad-hoc text plans with a git-backed graph that represents tasks, dependencies, and progress in a persistent, queryable format. Instead of storing plans as unstructured Markdown or ephemeral notes, Beads organizes agent state, task artifacts, and relationships as nodes and edges in a version-controlled graph so that long-horizon projects don’t lose context or coherence as the agent proceeds. This approach helps coding agents — and human collaborators — track which tasks depend on others, what has been done, and where workflows branch or reunify without losing important data.
    Downloads: 40 This Week