Showing 203 open source projects for "pentaho data integration"

View related business solutions
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    Agentic Data Scientist

    Agentic Data Scientist

    An end-to-end Data Scientist

    Agentic Data Scientist is an experimental AI-driven research framework that orchestrates data science workflows through autonomous agents that can reason, plan, and execute complex analytics tasks. Unlike traditional scripted pipelines, this project lets AI agents break down high-level research goals into sub-tasks such as data acquisition, cleaning, modeling, evaluation, and reporting, with minimal human direction. Each agent is designed to independently call functions, interact with data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    DeerFlow

    DeerFlow

    Deep Research framework, combining language models with tools

    ...It supports asynchronous task coordination, modular tool integration, and orchestrates the data flow between agents — making it suitable for large-scale or multi-stage research pipelines. Users can deploy it locally or on server infrastructure, integrate custom tools, and benefit from its flexible configuration.
    Downloads: 188 This Week
    Last Update:
    See Project
  • 3
    OpenDataMCP

    OpenDataMCP

    Connect any Open Data to any LLM with Model Context Protocol

    An initiative aimed at connecting open datasets to Large Language Models (LLMs) using the Model Context Protocol, facilitating seamless access and integration of public data into AI applications. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    MCP Atlassian

    MCP Atlassian

    MCP server that integrates Confluence and Jira

    The MCP Atlassian server integrates Atlassian products like Confluence and Jira with the Model Context Protocol. It supports both Cloud and Server/Data Center deployments, enabling AI models to interact with these platforms securely. ​
    Downloads: 2 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    graphify

    graphify

    AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)

    graphify is a data visualization and transformation tool designed to convert structured or semi-structured data into graph-based representations, enabling better understanding of relationships and dependencies. It focuses on building visual models such as nodes and edges that represent entities and their connections, making complex datasets easier to interpret. The system likely supports dynamic updates, allowing graphs to evolve as data changes or new inputs are introduced. It is...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 6
    RAGFlow

    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 7
    DeepBI

    DeepBI

    LLM based data scientist, AI native data application

    DeepBI is an AI-native data analysis platform. DeepBI leverages the power of large language models to explore, query, visualize, and share data from any data source. Users can use DeepBI to gain data insight and make data-driven decisions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    ExtractThinker

    ExtractThinker

    ExtractThinker is a Document Intelligence library for LLMs

    ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Airtable MCP

    Airtable MCP

    Airtable integration for AI-powered applications

    Airtable MCP is an integration tool that enables AI-powered applications to access and manipulate Airtable databases directly from the IDE using Anthropic's Model Context Protocol (MCP). It allows querying, creating, updating, and deleting records using natural language, facilitating seamless data management. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    MySQL MCP Server

    MySQL MCP Server

    A Model Context Protocol (MCP) server that enables secure interaction

    The MySQL MCP Server enables secure interaction with MySQL databases, allowing AI assistants to list tables, read data, and execute SQL queries through a controlled interface. It is designed for integration with AI applications like Claude Desktop and should not be run as a standalone Python program. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MCP Timeplus

    MCP Timeplus

    Execute SQL queries and manage databases seamlessly with Timeplus

    An MCP server designed for integration with Timeplus, enabling real-time data streaming and analytics through natural language interactions. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Memvid

    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 13
    Clay Foundation Model

    Clay Foundation Model

    The Clay Foundation Model - An open source AI model and interface

    The Clay Foundation Model is an open-source AI model and interface designed to provide comprehensive data and insights about Earth. It aims to serve as a foundational tool for environmental monitoring, research, and decision-making by integrating various data sources and offering an accessible platform for analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Liger Kernel

    Liger Kernel

    Efficient Triton Kernels for LLM Training

    Liger Kernel is a unified kernel developed by LinkedIn to streamline data science and machine learning workflows across different languages and tools. It provides a consistent interface for running code in various languages (such as Python, R, SQL) within a single Jupyter-like environment, enhancing productivity and collaboration for data scientists working in mixed-language projects.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    X-AnyLabeling

    X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything

    X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for...
    Downloads: 35 This Week
    Last Update:
    See Project
  • 16
    BambooAI

    BambooAI

    A Python library powered by Language Models (LLMs)

    BambooAI is a Python library powered by large language models (LLMs) for conversational data discovery and analysis, allowing users to interact with data through natural language.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    ADX MCP Server

    ADX MCP Server

    A Model Context Protocol (MCP) server that enables AI assistants

    The Azure Data Explorer MCP Server is a Model Context Protocol (MCP) server that enables AI assistants to query and analyze Azure Data Explorer databases through standardized interfaces. It allows the execution of Kusto Query Language (KQL) queries and exploration of data within Azure Data Explorer clusters. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    GeoAI

    GeoAI

    GeoAI: Artificial Intelligence for Geospatial Data

    GeoAI is a comprehensive open-source Python package designed to integrate artificial intelligence techniques with geospatial data analysis, enabling users to perform advanced geographic modeling and visualization tasks with ease. It provides a unified framework that combines machine learning libraries such as PyTorch and Transformers with geospatial tools, allowing users to process satellite imagery, aerial photos, and vector datasets in a streamlined workflow. The platform supports a wide...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 19
    Biomni

    Biomni

    Biomni: a general-purpose biomedical AI agent

    ...It supports integration with multiple AI models, allowing flexibility in selecting the most appropriate model for specific tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    OpenPlanter

    OpenPlanter

    Language-model investigation agent with a terminal UI

    OpenPlanter is an open-source Python project focused on building an intelligent automated planting or gardening system powered by software control and data processing. The repository is designed to help developers and hobbyists create programmable plant management workflows that can monitor, schedule, and optimize growing conditions. It emphasizes automation and extensibility, allowing integration with sensors, environmental data, and control logic for smart cultivation setups. The system is structured to support experimentation and customization, making it suitable for both research and DIY agriculture projects. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Bespoke Curator

    Bespoke Curator

    Synthetic data curation for post-training and data extraction

    Curator is an open-source Python library designed to build synthetic data pipelines for training and evaluating machine learning models, particularly large language models. The system helps developers generate, transform, and curate high-quality datasets by combining automated generation with structured validation and filtering. It supports workflows where models are used to produce synthetic examples that can later be refined into reliable training datasets for reasoning, question...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    DATAGEN

    DATAGEN

    AI-driven multi-agent research assistant automating hypothesis

    DATAGEN is an AI-driven multi-agent research and data analysis platform designed to automate complex analytical workflows. The system coordinates multiple specialized AI agents that collaborate to perform tasks such as hypothesis generation, data collection, analysis, visualization, and report creation. Instead of requiring users to manually orchestrate each stage of a research process, the platform allows these agents to coordinate automatically and handle the workflow end-to-end. The...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    DataProfiler

    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered tool for automatic data analysis and profiling, designed to detect patterns, anomalies, and schema inconsistencies in structured and unstructured datasets. The DataProfiler is a Python library designed to make data analysis, monitoring, and sensitive data detection easy. Loading Data with a single command, the library automatically formats & loads files into a DataFrame. Profiling the Data, the library identifies the schema, statistics, entities (PII / NPI), and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    C3

    C3

    The goal of CLAIMED is to enable low-code/no-code rapid prototyping

    C3 is an open-source framework designed to simplify the development and deployment of data science and machine learning workflows through reusable components and low-code development techniques. The framework focuses on enabling rapid prototyping while maintaining a path to production through automated CI/CD integration. CLAIMED provides a component-based architecture where data processing steps, models, and workflows can be packaged into reusable operators.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Apache Hamilton

    Apache Hamilton

    Helps data scientists define testable self-documenting dataflows

    Apache Hamilton is an open-source Python framework designed to simplify the creation and management of dataflows used in analytics, machine learning pipelines, and data engineering workflows. The framework enables developers to define data transformations as simple Python functions, where each function represents a node in a dataflow graph and its parameters define dependencies on other nodes. Hamilton automatically analyzes these functions and constructs a directed acyclic graph...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB