Showing 212 open source projects for "data integration"

  • 1
    Superlinked

    Superlinked is a Python framework for AI Engineers

    Superlinked is a Python framework designed for AI engineers to build high-performance search and recommendation applications that combine structured and unstructured data.
    Downloads: 0 This Week
  • 2
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a...
    Downloads: 2 This Week
  • 3
    Legion MCP

    A server that helps people access and query data in databases

    The Legion MCP Server is designed to help users access and query data in databases using the Legion Query Runner, integrated with the Model Context Protocol (MCP) Python SDK. It facilitates efficient data retrieval and analysis through standardized interfaces.
    Downloads: 0 This Week
  • 4
    SleepFM-Clinical

    Improve human sleep through scientifically

    SleepFM-Clinical is a specialized version of SleepFM designed for clinical and research environments, offering an adaptive audio modulation system aimed at improving human sleep through scientifically guided soundscapes. Rather than simply playing static white noise or ambient tracks, it uses a closed-loop, frequency-modulated framework that responds to user-specific sleep patterns and physiological signals to tailor sound in ways that can enhance sleep onset and depth. The clinical release...
    Downloads: 1 This Week
  • 5
    Chroma MCP

    A Model Context Protocol (MCP) server implementation

    Chroma MCP Server is an implementation of the Model Context Protocol (MCP) designed to integrate large language model (LLM) applications with external data sources or tools. It offers a standardized framework to seamlessly provide LLMs with the context they require for effective operation.
    Downloads: 0 This Week
  • 6
    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...
    Downloads: 39 This Week
  • 7
    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    Wan2.2 is a major upgrade to the Wan series of open and advanced large-scale video generative models, incorporating cutting-edge innovations to boost video generation quality and efficiency. It introduces a Mixture-of-Experts (MoE) architecture that splits the denoising process across specialized expert models, increasing total model capacity without raising computational costs. Wan2.2 integrates meticulously curated cinematic aesthetic data, enabling precise control over lighting,...
    Downloads: 94 This Week
  • 8
    Liger Kernel

    Efficient Triton Kernels for LLM Training

    Liger Kernel is a collection of Triton kernels developed by LinkedIn for efficient LLM training. It provides optimized implementations of common training operations, such as RMSNorm, RoPE, SwiGLU, and cross-entropy, that aim to increase training throughput and reduce GPU memory usage, and it can be applied to Hugging Face models.
    Downloads: 0 This Week
  • 9
    DATAGEN

    AI-driven multi-agent research assistant automating hypothesis

    DATAGEN is an AI-driven multi-agent research and data analysis platform designed to automate complex analytical workflows. The system coordinates multiple specialized AI agents that collaborate to perform tasks such as hypothesis generation, data collection, analysis, visualization, and report creation. Instead of requiring users to manually orchestrate each stage of a research process, the platform allows these agents to coordinate automatically and handle the workflow end-to-end. The...
    Downloads: 0 This Week
  • 10
    nesa

    Run AI models end-to-end encrypted

    nesa is an open-source initiative focused on building decentralized AI infrastructure that enables secure, verifiable, and privacy-preserving machine learning and inference across distributed environments. The project aims to address key challenges in modern AI systems, such as data privacy, trust, and centralization, by leveraging cryptographic techniques and decentralized architectures. NESA allows developers to run AI computations in a way that ensures data integrity and confidentiality, making it particularly relevant for applications involving sensitive or regulated data. It integrates mechanisms for verifiable computation, enabling users to confirm that AI outputs were generated correctly without exposing underlying data or models. ...
    Downloads: 0 This Week
  • 11
    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data...
    Downloads: 0 This Week
  • 12
    NVIDIA Earth2Studio

    Open-source deep-learning framework

    NVIDIA Earth2Studio is an open-source Python package and framework designed to accelerate the development and deployment of AI-driven weather and climate science workflows. It provides a unified API that lets researchers, data scientists, and engineers build complex forecasting and analysis pipelines by combining modular prognostic and diagnostic AI models with a diverse range of real-world data sources such as global forecast systems, reanalysis datasets, and satellite feeds. The toolkit...
    Downloads: 4 This Week
  • 13
    Apache Hamilton

    Helps data scientists define testable self-documenting dataflows

    Apache Hamilton is an open-source Python framework designed to simplify the creation and management of dataflows used in analytics, machine learning pipelines, and data engineering workflows. The framework enables developers to define data transformations as simple Python functions, where each function represents a node in a dataflow graph and its parameters define dependencies on other nodes. Hamilton automatically analyzes these functions and constructs a directed acyclic graph...
    Downloads: 0 This Week
  • 14
    Bespoke Curator

    Synthetic data curation for post-training and data extraction

    Curator is an open-source Python library designed to build synthetic data pipelines for training and evaluating machine learning models, particularly large language models. The system helps developers generate, transform, and curate high-quality datasets by combining automated generation with structured validation and filtering. It supports workflows where models are used to produce synthetic examples that can later be refined into reliable training datasets for reasoning, question...
    Downloads: 0 This Week
  • 15
    FL4Health

    Library to facilitate federated learning research

    FL4Health is a Vector Institute toolkit for building modular, clinically focused federated learning (FL) pipelines. Tailored for healthcare, it supports privacy-preserving FL, heterogeneous data settings, integrated reporting, and clear API design.
    Downloads: 0 This Week
  • 16
    LLM Vision

    Visual intelligence for your home.

    ...The system can process events from surveillance platforms such as Frigate and convert them into meaningful summaries, notifications, or structured data for automation workflows. It also maintains a timeline of analyzed camera events that can be displayed in dashboards or queried through the assistant interface.
    Downloads: 4 This Week
  • 17
    FinGPT

    Open-Source Financial Large Language Models

    FinGPT is an open-source, finance-specialized large language model framework that blends the capabilities of general LLMs with real-time financial data feeds, domain-specific knowledge bases, and task-oriented agents to support market analysis, research automation, and decision support. It extends traditional GPT-style models by connecting them to live or historical financial datasets, news APIs, and economic indicators so that outputs are grounded in relevant and recent market conditions...
    Downloads: 5 This Week
  • 18
    Integuru v0

    The first AI agent that builds permissionless integrations

    ...The project is designed as a research platform for exploring AI-driven automation and integration generation.
    Downloads: 0 This Week
  • 19
    Datapizza AI

    Build reliable Gen AI solutions without overhead

    ...It provides a flexible architecture where individual agents can be assigned specialized roles, such as web search, reasoning, or domain-specific expertise, and can communicate with each other to complete tasks collaboratively. The framework supports integration with external APIs and tools, allowing agents to perform actions like retrieving data, executing functions, or interacting with external services. It is particularly well-suited for building retrieval-augmented generation pipelines, automation systems, and experimental AI applications that require coordination between multiple components.
    Downloads: 0 This Week
  • 20
    LOTUS

    AI-Powered Data Processing: Use LOTUS to process all of your datasets

    LOTUS is an open-source framework and query engine designed to enable efficient processing of structured and unstructured datasets using large language models. The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis...
    Downloads: 0 This Week
  • 21
    ChatTTS

    A generative speech model for daily dialogue

    ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.
    Downloads: 3 This Week
  • 22
    OpenViking

    Context database designed specifically for AI Agents

    OpenViking is an open-source context database engineered for efficient indexing and retrieval of large amounts of unstructured or semi-structured context data used by AI applications. It’s primarily designed to serve as a high-performance, scalable backend for storing app context, embeddings, conversational histories, and other textual artifacts that need rapid lookup and semantic search, which makes it especially useful for systems like chatbots or memory-augmented agents. The project is...
    Downloads: 2 This Week
  • 23
    TrustGraph

    Deploy reasoning AI agents powered by agentic graph RAG in minutes

    TrustGraph is an open-source framework for deploying reasoning AI agents powered by agentic graph RAG, in which agents retrieve context from a knowledge graph built over the user's data rather than from flat text chunks alone.
    Downloads: 6 This Week
  • 24
    Matrix

    Multi-Agent daTa geneRation Infra and eXperimentation framework

    ...That design makes Matrix particularly well-suited for large-batch inference, model benchmarking, data curation, augmentation, or generation — whether for language, code, dialogue, or multimodal tasks. It supports both open-source LLMs and proprietary models (via integration with model backends), and works with containerized or sandboxed environments for safe tool execution or external code runs.
    Downloads: 0 This Week
  • 25
    E2M

    E2M converts various file types (doc, docx, epub, html, htm, url

    E2M is a SourceForge mirror of the e2m open-source project, a tool that converts a range of input types (including doc, docx, epub, html, htm, and url sources) into Markdown. The mirrored repository allows users to access the project’s codebase independently from its original hosting platform while...
    Downloads: 0 This Week