Showing 396 open source projects for "data integration"

  • 1
    webclaw

    Fast, local-first web content extraction for LLMs

    ...It supports multiple modes of operation, including CLI usage, REST API access, and an MCP server for direct integration with agent-based systems. Webclaw also provides advanced capabilities such as recursive crawling, structured JSON extraction, summarization, and content comparison, making it suitable for research and data pipelines. Its local-first architecture ensures privacy and eliminates the need for API keys.
    Downloads: 1 This Week
  • 2
    Grok CLI

    An open-source AI agent that brings the power of Grok

    Grok CLI is a command-line interface built around the Grok AI model that brings programmatic and conversational AI capabilities directly to developer terminals. It lets you run Grok queries from your shell, scripting environment, or automation workflows without switching to a browser, which makes it useful for scripting, quick data exploration, code generation, and assistant-guided tasks directly where you write code. The CLI supports streaming responses, so outputs appear in real time as the Grok...
    Downloads: 12 This Week
  • 3
    Clay Foundation Model

    The Clay Foundation Model: an open-source AI model and interface

    The Clay Foundation Model is an open-source AI model and interface designed to provide comprehensive data and insights about Earth. It aims to serve as a foundational tool for environmental monitoring, research, and decision-making by integrating various data sources and offering an accessible platform for analysis.
    Downloads: 0 This Week
  • 4
    MCP Go

    A Go implementation of the Model Context Protocol (MCP)

    mcp-go is a Go implementation of the Model Context Protocol (MCP), designed to enable seamless integration between Large Language Model (LLM) applications and external data sources and tools. It abstracts the complexities of the protocol and server management, allowing developers to focus on building robust tools. The library is high-level and user-friendly, facilitating the development of MCP servers in Go.
    Downloads: 0 This Week
  • 5
    MCP BigQuery Server

    A Model Context Protocol (MCP) server that provides secure

    This MCP server provides secure, read-only access to BigQuery datasets, enabling large language models (LLMs) to safely query and analyze data through a standardized interface.
    Downloads: 0 This Week
  • 6
    BambooAI

    A Python library powered by Large Language Models (LLMs)

    BambooAI is a Python library powered by large language models (LLMs) for conversational data discovery and analysis, allowing users to interact with data through natural language.
    Downloads: 0 This Week
  • 7
    Paperclip

    Open-source orchestration for zero-human companies

    ...Instead of requiring separate APIs and authentication flows for each service, Paperclip provides unified search and retrieval capabilities that simplify integration into AI workflows.
    Downloads: 18 This Week
  • 8
    GeoAI

    GeoAI: Artificial Intelligence for Geospatial Data

    GeoAI is a comprehensive open-source Python package designed to integrate artificial intelligence techniques with geospatial data analysis, enabling users to perform advanced geographic modeling and visualization tasks with ease. It provides a unified framework that combines machine learning libraries such as PyTorch and Transformers with geospatial tools, allowing users to process satellite imagery, aerial photos, and vector datasets in a streamlined workflow. The platform supports a wide...
    Downloads: 5 This Week
  • 9
    Casibase

    Open-source enterprise-level AI knowledge base and MCP

    ...It also supports integration with existing systems through database synchronization, allowing organizations to migrate data into the platform without major infrastructure changes.
    Downloads: 0 This Week
  • 10
    Biomni

    Biomni: a general-purpose biomedical AI agent

    ...It supports integration with multiple AI models, allowing flexibility in selecting the most appropriate model for specific tasks.
    Downloads: 0 This Week
  • 11
    OpenPlanter

    Language-model investigation agent with a terminal UI

    OpenPlanter is an open-source Python project focused on building an intelligent automated planting or gardening system powered by software control and data processing. The repository is designed to help developers and hobbyists create programmable plant management workflows that can monitor, schedule, and optimize growing conditions. It emphasizes automation and extensibility, allowing integration with sensors, environmental data, and control logic for smart cultivation setups. The system is structured to support experimentation and customization, making it suitable for both research and DIY agriculture projects. ...
    Downloads: 0 This Week
  • 12
    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 5 This Week
  • 13
    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a bridge between ComfyUI’s node-based generative workflow environment and LTX-Video, Lightricks’ video generation model, enabling creators to orchestrate complex video generation tasks within a visual graph paradigm. Instead of writing code, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually. This integration lets non-programmers and rapid-iteration teams harness the performance of LTX-Video while keeping the clarity and flexibility of a dataflow graph model. ...
    Downloads: 13 This Week
  • 14
    Airweave

    Airweave lets agents search any app

    Airweave is an open-source platform that enables agents to semantically search across various applications, databases, and APIs. By transforming disparate data sources into a unified, searchable knowledge base, Airweave facilitates intelligent information retrieval through a REST API or the Model Context Protocol (MCP). It is particularly useful for building AI agents that need access to structured and unstructured data across multiple platforms.
    Downloads: 2 This Week
  • 15
    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered Python library for automatic data analysis and profiling, designed to detect patterns, anomalies, and schema inconsistencies in structured and unstructured datasets and to make data monitoring and sensitive-data detection easy. With a single command, the library automatically formats and loads files into a DataFrame. When profiling the data, the library identifies the schema, statistics, entities (PII/NPI), and...
    Downloads: 0 This Week
  • 16
    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 2 This Week
  • 17
    C3

    The goal of CLAIMED is to enable low-code/no-code rapid prototyping

    C3 is an open-source framework from the CLAIMED project, designed to simplify the development and deployment of data science and machine learning workflows through reusable components and low-code development techniques. The framework focuses on enabling rapid prototyping while keeping a path to production through automated CI/CD integration. CLAIMED provides a component-based architecture in which data processing steps, models, and workflows can be packaged as reusable operators.
    Downloads: 0 This Week
  • 18
    OpenClaude

    Claude Code opened to any LLM

    OpenClaude is an open-source project inspired by Claude Code-style agent systems, designed to provide similar capabilities with any LLM in a customizable, self-hosted environment. The project focuses on letting users run their own AI agents with full control over data, workflows, and integrations, reducing reliance on proprietary platforms. It likely includes support for executing tasks, managing context, and interacting with external tools, allowing agents to perform real-world...
    Downloads: 12 This Week
  • 19
    LLM.swift

    LLM.swift is a simple and readable library

    LLM.swift is a Swift package that enables developers to run Large Language Models (LLMs) directly on Apple devices, including iOS, macOS, and watchOS. By leveraging Apple's hardware and software optimizations, LLM.swift facilitates on-device natural language processing tasks, ensuring user privacy and reducing latency associated with cloud-based solutions.
    Downloads: 18 This Week
  • 20
    Superlinked

    Superlinked is a Python framework for AI Engineers

    Superlinked is a Python framework designed for AI engineers to build high-performance search and recommendation applications that combine structured and unstructured data.
    Downloads: 0 This Week
  • 21
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a...
    Downloads: 2 This Week
  • 22
    Agents Towards Production

    Code-first tutorials covering every layer of GenAI agents

    Agents Towards Production is an opinionated, code-first playbook for taking AI agents from prototype to production-ready systems. Instead of focusing only on toy examples, it dives into every layer of an agent stack: orchestration, memory, RAG, tool and API integration, security, observability, deployment, evaluation, and UI. The repository is built around runnable tutorials, each in its own folder, often sponsored by or built in collaboration with infrastructure providers like LangChain, Redis, Bright Data, Contextual AI, Tavily, Runpod, Portia, and others. These tutorials show how to implement things like secure tool calling with OAuth, dual-memory architectures, production RAG agents, multi-agent communication protocols, GPU deployment, containerization with Docker, FastAPI endpoints, and Streamlit chat UIs. ...
    Downloads: 5 This Week
  • 23
    Rill

    Fast SQL-based BI tool for real-time dashboards and analytics

    Rill is an operational BI tool that turns raw datasets into fast, interactive dashboards using SQL and a code-first approach. It helps data teams move from data lake to insight quickly, without the complexity of traditional BI systems. With an embedded in-memory database powered by DuckDB or ClickHouse, queries run in milliseconds, enabling real-time exploration and analysis. Rill supports local and remote data sources such as CSV, Parquet, S3, and GCS, making it flexible across...
    Downloads: 3 This Week
  • 24
    Legion MCP

    A server that helps people access and query data in databases

    The Legion MCP Server is designed to help users access and query data in databases using the Legion Query Runner, integrated with the Model Context Protocol (MCP) Python SDK. It facilitates efficient data retrieval and analysis through standardized interfaces.
    Downloads: 0 This Week
  • 25
    Wren Engine

    The Semantic Engine for Model Context Protocol (MCP)

    Wren Engine is a semantic engine designed to empower Model Context Protocol (MCP) clients and AI agents by providing accurate, contextual, and governed access to business data. It serves as a bridge between large language models (LLMs) and enterprise systems, facilitating seamless integration and interaction.
    Downloads: 0 This Week