Showing 46 open source projects for "unstructured data"

View related business solutions
  • Automated RMM Tools | RMM Software Icon
    Automated RMM Tools | RMM Software

    Proactively monitor, manage, and support client networks with ConnectWise Automate

    Out-of-the-box scripts. Around-the-clock monitoring. Unmatched automation capabilities. Start doing more with less and exceed service delivery expectations.
  • Control remote support software for remote workers and IT teams Icon
    Control remote support software for remote workers and IT teams

    Raise the bar for remote support and reduce customer downtime.

    ConnectWise ScreenConnect, formerly ConnectWise Control, is a remote support solution for Managed Service Providers (MSP), Value Added Resellers (VAR), internal IT teams, and managed security providers. Fast, reliable, secure, and simple to use, ConnectWise ScreenConnect helps businesses solve their customers' issues faster from any location. The platform features remote support, remote access, remote meeting, customization, and integrations with leading business tools.
  • 1
    Unstructured.IO

    Unstructured.IO

    Open source libraries and APIs to build custom preprocessing pipelines

    The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and is efficient in transforming unstructured data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    MeshLab

    MeshLab

    The open source mesh processing system

    MeshLab is an open-source, portable, and extensible system for the processing and editing of unstructured large 3D triangular meshes. It is aimed to help the processing of the typical not-so-small unstructured models arising in 3D scanning, providing a set of tools for editing, cleaning, healing, inspecting, rendering and converting this kind of meshes. MeshLab is mostly based on the open source C++ mesh processing library VCGlib developed at the Visual Computing Lab of ISTI - CNR. VCG can...
    Downloads: 30 This Week
    Last Update:
    See Project
  • 3
    OpenWPM

    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. Although we don't...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    Pimcore

    Pimcore

    Open Source Data & Experience Management Platform

    No matter if you're dealing with unstructured web documents or structured data for MDM/PIM, you define the UI design (web documents by a template and structured data with an intuitive graphical editor), Pimcore knows how to persist the data efficiently and optimized for fast access. Due to the framework approach, Pimcore is very flexible and adapts perfectly to your needs. Built on top of the well-known Symfony Framework you have a solid and modern foundation for your project. Benefit from all...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Find out just how much your login box can do for your customer | Auth0 Icon
    Find out just how much your login box can do for your customer | Auth0

    With over 53 social login options, you can fast-track the signup and login experience for users.

    From improving customer experience through seamless sign-on to making MFA as easy as a click of a button – your login box must find the right balance between user convenience, privacy and security.
  • 5
    RAGFlow

    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Quivr

    Quivr

    Your Second Brain supercharged by Generative AI

    Quivr, your second brain, utilizes the power of GenerativeAI to store and retrieve unstructured information. Think of it as Obsidian, but turbocharged with AI capabilities.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    Milvus

    Milvus

    Vector database for scalable similarity search and AI applications

    Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment. Milvus 2.0 is a cloud-native vector database with storage and computation separated by design. All components in this refactored version of Milvus are stateless to enhance elasticity and flexibility. Average latency measured in milliseconds on trillion...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    GraphRAG

    GraphRAG

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Free CRM Software With Something for Everyone Icon
    Free CRM Software With Something for Everyone

    216,000+ customers in over 135 countries grow their businesses with HubSpot

    Think CRM software is just about contact management? Think again. HubSpot CRM has free tools for everyone on your team, and it’s 100% free. Here’s how our free CRM solution makes your job easier.
  • 10
    LlamaIndex

    LlamaIndex

    Central interface to connect your LLM's with external data

    LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Gretel Synthetics

    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that are as good or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ... the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    DocArray

    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Pachyderm

    Pachyderm

    Data-Centric Pipelines and Data Versioning

    Data-driven pipelines automatically trigger based on detecting data changes. Automatic immutable data lineage and data versioning of all data types. Autoscaling and parallel processing built on Kubernetes for resource orchestration. Uses standard object stores for data storage with automatic deduplication. Runs across all major cloud providers and on-premises installations. Automatic and intelligent versioning of even the largest data sets of unstructured and structured data. Git-like...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    spacy-llm

    spacy-llm

    Integrating LLMs into structured NLP pipelines

    Large Language Models (LLMs) feature powerful natural language understanding capabilities. With only a few (and sometimes no) examples, an LLM can be prompted to perform custom NLP tasks such as text categorization, named entity recognition, coreference resolution, information extraction and more. This package integrates Large Language Models (LLMs) into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    LangKit

    LangKit

    An open-source toolkit for monitoring Language Learning Models (LLMs)

    LangKit is an open-source text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library whylogs. Productionizing language models, including LLMs, comes with a range of risks due to the infinite amount of input combinations, which can elicit an infinite amount of outputs. The unstructured nature of text poses a challenge in the ML observability...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    refinery

    refinery

    Open-source choice to scale, assess and maintain natural language data

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact. You are one of the people we've built refinery for. refinery helps you to build better NLP models in a data-centric approach. Semi-automate your labeling, find low-quality subsets in your training data, and monitor your data in one place. refinery doesn't get rid of manual labeling, but it makes sure that your valuable time is spent well. Also, the makers...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    cognee

    cognee

    Deterministic LLMs Outputs for AI Applications and AI Agents

    ... works; unstructured text or raw media files, PDFs, tables, presentations, JSON files, and so many more. Add small or large files, or many files at once. We map out a knowledge graph from all the facts and relationships we extract from your data. Then, we establish graph topology and connect related knowledge clusters, enabling the LLM to "understand" the data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    HASH

    HASH

    The best way to use and work with blocks

    This is HASH's public monorepo which contains our public code, docs, and other key resources. HASH is a platform for decision-making, which helps you integrate, understand and use data in a variety of different ways. HASH does this by combining various different powerful tools together into one simple interface. These range from data pipelines and a graph database, through to an all-in-one workspace, no-code tool builder, and agent-based simulation engine. These exist at varying stages...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    XTDB

    XTDB

    General-purpose bitemporal database for SQL, Datalog & graph queries

    ... the relational model with SQL and the graph model with Datalog, over the same data. Explore all your data. Across all implicit relationships, across all time. As a document-oriented database, XTDB makes your data immediately available without the need for an upfront schema. Both structured and unstructured data are at home in XTDB. Legal regulations like GDPR often pose a challenge when designing systems around immutable data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    LinearSolve.jl

    LinearSolve.jl

    High-Performance Unified Interface for Linear Solvers in Julia

    LinearSolve.jl is a unified interface for the linear solving packages of Julia. It interfaces with other packages of the Julia ecosystem to make it easy to test alternative solver packages and pass small types to control algorithm swapping. It also interfaces with the ModelingToolkit.jl world of symbolic modeling to allow for automatically generating high-performance code. Performance is key: the current methods are made to be highly performant on scalar and statically sized small problems,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Gridap.jl

    Gridap.jl

    Grid-based approximation of partial differential equations in Julia

    Gridap provides a set of tools for the grid-based approximation of partial differential equations (PDEs) written in the Julia programming language. The library currently supports linear and nonlinear PDE systems for scalar and vector fields, single and multi-field problems, conforming and nonconforming finite element (FE) discretizations, on structured and unstructured meshes of simplices and n-cubes. It also provides methods for time integration. Gridap is extensible and modular. One can...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    marqo

    marqo

    Tensor search for humans

    ... and text-to-image search and analytics. Marqo adapts and stores your data in a fully schemaless manner. It combines tensor search with a query DSL that provides efficient pre-filtering. Tensor search allows you to go beyond keyword matching and search based on the meaning of text, images and other unstructured data. Be a part of the tribe and help us revolutionize the future of search. Whether you are a contributor, a user, or simply have questions about Marqo, we got your back.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing in C++17/20

    DocWire SDK, a standout C++17/20 data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. The upcoming integration of C++17 and C++20 will bring advanced functionalities, particularly in areas like HTTP capabilities and web data extraction. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 25
    Pytorch Points 3D

    Pytorch Points 3D

    Pytorch framework for doing deep learning on point clouds

    Torch Points 3D is a framework for developing and testing common deep learning models to solve tasks related to unstructured 3D spatial data i.e. Point Clouds. The framework currently integrates some of the best-published architectures and it integrates the most common public datasets for ease of reproducibility. It heavily relies on Pytorch Geometric and Facebook Hydra library thanks for the great work! We aim to build a tool that can be used for benchmarking SOTA models, while also allowing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next