Showing 15 open source projects for "unstructured data"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Thulite

    Thulite

    Web framework designed for speed, security, and SEO

    Thulite is an AI-powered search and recommendation engine that enhances search functionality in applications. It provides intelligent query processing, result ranking, and personalized recommendations.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    Parsera

    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    Gretel Synthetics

    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that are as good or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports....
    Downloads: 9 This Week
    Last Update:
    See Project
  • 4
    Pachyderm

    Pachyderm

    Data-Centric Pipelines and Data Versioning

    ...Automatic and intelligent versioning of even the largest data sets of unstructured and structured data. Git-like structure enables effective team collaboration. Full versioning for metadata including all analysis, parameters, artifacts, models, and intermediate results. Automatically produces an immutable record for all activities and assets. Pachyderm is used across a variety of industries and use cases.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    XTDB

    XTDB

    General-purpose bitemporal database for SQL, Datalog & graph queries

    ...Both structured and unstructured data are at home in XTDB. Legal regulations like GDPR often pose a challenge when designing systems around immutable data.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    OpenWPM

    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. Although we...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 7
    GeoStats.jl

    GeoStats.jl

    An extensible framework for geospatial data science

    ...Users can represent georeferenced tables (points + attributes), define domains (grids, meshes, structured/unstructured), and then apply geostatistical operations such as kriging, interpolation, simulation, variogram estimation, and learning-based prediction. Visualization is supported via integration with Makie.jl to produce spatial renderings, mesh visualizations, and variable overlays.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    DocArray

    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Text Search Engine

    Text Search Engine

    A text search engine that supports mixed Chinese and English search

    Text-Search-Engine is a JavaScript-based lightweight search engine that enables full-text search functionality. It allows developers to implement fast search indexing and retrieval in web applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    LangExtract

    LangExtract

    A Python library for extracting structured information

    LangExtract is a Python library developed by Google that leverages large language models (LLMs) to extract structured information from unstructured text—such as clinical notes, research papers, or literary works—based on user-defined instructions. It is designed to transform free-form text into reliable, schema-constrained data while maintaining traceability back to the source material. Each extracted entity is precisely grounded in its original context, allowing visual inspection and validation via automatically generated interactive HTML visualizations. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 11
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model inference, making your pipeline execution 10x faster. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 13
    Pytorch Points 3D

    Pytorch Points 3D

    Pytorch framework for doing deep learning on point clouds

    Torch Points 3D is a framework for developing and testing common deep learning models to solve tasks related to unstructured 3D spatial data i.e. Point Clouds. The framework currently integrates some of the best-published architectures and it integrates the most common public datasets for ease of reproducibility. It heavily relies on Pytorch Geometric and Facebook Hydra library thanks for the great work! We aim to build a tool that can be used for benchmarking SOTA models, while also allowing practitioners to efficiently pursue research into point cloud analysis, with the end goal of building models which can be applied to real-life applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    Advanced Numerical Instruments 2D

    Advanced numerical instruments: adaptive meshing, FE methods, solvers

    Ani2D provides portable libraries for each step in the numerical solution of systems of PDEs with variable tensorial coefficients: (1) unstructured adaptive mesh generation, (2) metric-based mesh adaptation, (3) finite element discretization and interpolation, (4) algebraic solvers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    WOOKI is a peer-to-peer wiki. It based on unstructured p2P network with data replication. WOOT framework synchronizes data. SWOOKI is a semantic extenion of WOOKI. It is implemented as a plugin of WOOKI. SWOOKI is peer-to-peer semantic wiki.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB