Search Results for "pentaho data integration" - Page 4

Showing 432 open source projects for "pentaho data integration"

  • 1
    Apache Airflow Provider

    Great Expectations Airflow operator

    Because the apply_default decorator was removed, this version of the provider requires Airflow 2.1.0+. If your Airflow version is earlier than 2.1.0 and you want to install this provider version, first upgrade Airflow to at least version 2.1.0. Otherwise, your Airflow package version will be upgraded automatically, and you will have to manually run airflow db upgrade to complete the migration. This operator currently works with the Great Expectations V3 Batch Request API only. If you would like to use the...
    Downloads: 0 This Week
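The version requirement in the entry above can be checked before installing. The following is a minimal sketch using only the standard library; `MIN_AIRFLOW`, `parse_version`, and `needs_upgrade` are hypothetical names chosen for illustration and are not part of Airflow or the provider package.

```python
from typing import Tuple

# Minimum Airflow version stated in the listing (2.1.0+).
MIN_AIRFLOW = (2, 1, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.1.0' or '2.3.4rc1' into a tuple of three integers.

    Only the leading digits of each dotted component are kept, so
    pre-release suffixes such as 'rc1' are ignored (an assumption of
    this sketch; a dedicated version parser handles them properly).
    """
    parts = []
    for piece in text.strip().split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed: str) -> bool:
    """Return True when the installed Airflow is older than 2.1.0."""
    return parse_version(installed) < MIN_AIRFLOW


if __name__ == "__main__":
    for version in ("2.0.2", "2.1.0", "2.10.1"):
        print(version, "needs upgrade:", needs_upgrade(version))
```

Comparing integer tuples rather than raw strings matters here: as text, "2.10.1" sorts before "2.2.0", while the tuple (2, 10, 1) compares correctly.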
  • 2
    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...
    Downloads: 35 This Week
  • 3
    NVIDIA Earth2Studio

    Open-source deep-learning framework

    NVIDIA Earth2Studio is an open-source Python package and framework designed to accelerate the development and deployment of AI-driven weather and climate science workflows. It provides a unified API that lets researchers, data scientists, and engineers build complex forecasting and analysis pipelines by combining modular prognostic and diagnostic AI models with a diverse range of real-world data sources such as global forecast systems, reanalysis datasets, and satellite feeds. The toolkit...
    Downloads: 4 This Week
  • 4
    SENAITE LIMS

    SENAITE Meta Package

    SENAITE is a beautiful trigonal, oil-green to greenish-black crystal, with almost the hardness of a diamond. Although the crystal is described with a complex formula, it still has clear and straight shapes. It therefore nicely reflects the complexity of the LIMS while providing a modern, intuitive, and friendly UI/UX. Among other functionalities, SENAITE comes with highly customizable workflows to drive users through the analytical process, easy-to-use UI for data registration,...
    Downloads: 4 This Week
  • 5
    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. From the start it has been designed for real-world applications: building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with...
    Downloads: 8 This Week
  • 6
    FL4Health

    Library to facilitate federated learning research

    FL4Health is a Vector Institute toolkit for building modular, clinically-focused FL pipelines. Tailored for healthcare, it supports privacy-preserving FL, heterogeneous data settings, integrated reporting, and clear API design.
    Downloads: 0 This Week
  • 7
    E2M

    E2M converts various file types (doc, docx, epub, html, htm, url

    E2M is a SourceForge mirror of the e2m open-source project, which focuses on providing tools or services designed to convert or process content between different formats or systems. Projects with similar naming conventions typically emphasize automation workflows where input data from one environment is transformed into another representation or output structure. The mirrored repository allows users to access the project’s codebase independently from its original hosting platform while...
    Downloads: 1 This Week
  • 8
    Modin

    Scale your Pandas workflows by changing a single line of code

    ...Modin uses Ray, Dask or Unidist to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. It is not necessary to know in advance the available hardware resources in order to use Modin. Additionally, it is not necessary to specify how to distribute or place data. Modin acts as a drop-in replacement for pandas, which means that you can continue using your previous pandas notebooks, unchanged, while experiencing a considerable speedup thanks to Modin, even on a single machine. ...
    Downloads: 1 This Week
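The "single line of code" in the Modin entry above is the import statement. Here is a minimal sketch of that swap with a fallback to plain pandas; `load_pandas_api` and `column_total` are hypothetical helpers written for this illustration, and the sketch assumes that at least one of the two libraries may be absent.

```python
import importlib


def load_pandas_api():
    """Return (module, name) for the first importable pandas-style API.

    Tries 'modin.pandas' first, then 'pandas'. Returns (None, None)
    when neither package is installed.
    """
    for name in ("modin.pandas", "pandas"):
        try:
            return importlib.import_module(name), name
        except ImportError:
            continue
    return None, None


def column_total(pd_module):
    """Build a small DataFrame with the given module and sum one column.

    The body is identical whichever module is passed in, which is the
    point of a drop-in replacement.
    """
    frame = pd_module.DataFrame({"a": [1, 2, 3]})
    return int(frame["a"].sum())


if __name__ == "__main__":
    pd, backend = load_pandas_api()
    if pd is None:
        print("neither modin nor pandas is installed")
    else:
        print(backend, column_total(pd))
```

In an existing script the change is usually just replacing `import pandas as pd` with `import modin.pandas as pd`; the helper above only makes that choice at run time.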
  • 9
    Bespoke Curator

    Synthetic data curation for post-training and data extraction

    Curator is an open-source Python library designed to build synthetic data pipelines for training and evaluating machine learning models, particularly large language models. The system helps developers generate, transform, and curate high-quality datasets by combining automated generation with structured validation and filtering. It supports workflows where models are used to produce synthetic examples that can later be refined into reliable training datasets for reasoning, question...
    Downloads: 0 This Week
  • 10
    Datapizza AI

    Build reliable Gen AI solutions without overhead

    ...It provides a flexible architecture where individual agents can be assigned specialized roles, such as web search, reasoning, or domain-specific expertise, and can communicate with each other to complete tasks collaboratively. The framework supports integration with external APIs and tools, allowing agents to perform actions like retrieving data, executing functions, or interacting with external services. It is particularly well-suited for building retrieval-augmented generation pipelines, automation systems, and experimental AI applications that require coordination between multiple components.
    Downloads: 0 This Week
  • 11
    Airweave

    Airweave lets agents search any app

    Airweave is an open-source platform that enables agents to semantically search across various applications, databases, and APIs. By transforming disparate data sources into a unified, searchable knowledge base, Airweave facilitates intelligent information retrieval through REST APIs or the MCP protocol. It's particularly useful for building AI agents that require access to structured and unstructured data across multiple platforms.
    Downloads: 0 This Week
  • 12
    Integuru v0

    The first AI agent that builds permissionless integrations

    ...The project is designed as a research platform for exploring AI-driven automation and integration generation.
    Downloads: 0 This Week
  • 13
    SleepFM-Clinical

    Improve human sleep through scientifically

    SleepFM-Clinical is a specialized version of SleepFM designed for clinical and research environments, offering an adaptive audio modulation system aimed at improving human sleep through scientifically guided soundscapes. Rather than simply playing static white noise or ambient tracks, it uses a closed-loop, frequency-modulated framework that responds to user-specific sleep patterns and physiological signals to tailor sound in ways that can enhance sleep onset and depth. The clinical release...
    Downloads: 0 This Week
  • 14
    Open Wearables

    Self-hosted platform to unify wearable health data

    Open Wearables is an open-source initiative that aims to provide a community-driven ecosystem for wearable device software and interoperability by connecting sensor data, activity tracking, and health insights across multiple platforms and devices. Instead of relying on closed vendor ecosystems, the project provides standardized data models and APIs that let developers and hobbyists collect, sync, and analyze biometric and environmental data from wearables, DIY sensors, and open hardware...
    Downloads: 0 This Week
  • 15
    Claude Code Plugins

    Intelligent automation and multi-agent orchestration for Claude Code

    ...It emphasizes simplicity and composability, allowing developers to define agent behaviors through reusable components rather than monolithic logic. The framework supports integration with various tools and APIs, enabling agents to perform actions such as data retrieval, automation, and decision-making processes. It is particularly useful for experimenting with autonomous or semi-autonomous systems that rely on prompt-driven logic and tool usage. The design encourages transparency and control over how agents operate, making it suitable for both prototyping and production scenarios.
    Downloads: 5 This Week
  • 16
    LOTUS

    AI-Powered Data Processing: Use LOTUS to process all of your datasets

    LOTUS is an open-source framework and query engine designed to enable efficient processing of structured and unstructured datasets using large language models. The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis...
    Downloads: 0 This Week
  • 17
    WSA-Script

    Integrate Magisk root and Google Apps into WSA

    WSA-Script is a community-driven automation and scripting toolkit that helps Windows users extend the Windows Subsystem for Android (WSA) by integrating optional features like Magisk (for root access) and Google Apps into the otherwise vanilla WSA environment, leveraging GitHub Actions and scripted installers to do much of the heavy lifting. The project provides a way to download and unpack custom builds of the WSA package that bundle these enhancements and guide users through installation...
    Downloads: 42 This Week
  • 18
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a...
    Downloads: 0 This Week
  • 19
    Shynet

    Modern, privacy-friendly, and detailed web analytics

    Modern, privacy-friendly, and detailed web analytics that works without cookies or JS. There are a lot of web analytics tools. Unfortunately, most of them come with the following caveats: they require handing all of your visitors' info to a third-party company; they use cookies to track visitors across sessions, so you need to have those annoying cookie notices; they collect so much personal data that even the NSA is jealous; they are closed source and/or expensive, often with limited data...
    Downloads: 1 This Week
  • 20
    PyVista

    3D plotting and mesh analysis through a streamlined interface

    3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK). PyVista is a helper module for VTK that takes a different approach to interfacing with VTK, working through NumPy and direct array access. This package provides a Pythonic, well-documented interface exposing VTK’s powerful visualization backend to facilitate rapid prototyping, analysis, and visual integration of spatially referenced datasets. This module can be used for...
    Downloads: 0 This Week
  • 21
    OpenBB Terminal

    Investment research for everyone, anywhere

    OpenBB Terminal is written entirely in Python, one of the most widely used programming languages thanks to its simple syntax and shallow learning curve. As a result, users of any background can easily add features to an investment research platform. The MIT open-source license allows any user to fork the project, either to add features for the broader community or to create their own customized terminal version. The terminal allows users to import their own proprietary...
    Downloads: 7 This Week
  • 22
    Robin

    AI-powered tool for dark web OSINT search and investigation

    Robin is an AI-powered open source tool designed to assist investigators and researchers in conducting dark web OSINT (Open Source Intelligence) investigations. It combines automated dark web search capabilities with large language models (LLMs) to analyze and summarize information discovered across hidden services and Tor-based search engines. The tool helps refine investigative queries, collect results from multiple dark web sources, and filter relevant intelligence using AI-driven...
    Downloads: 17 This Week
  • 23
    Harpoon

    Command line OSINT and threat intelligence automation tool

    Harpoon is a command line tool designed to assist with open source intelligence (OSINT) and threat intelligence investigations. It helps security professionals and researchers collect and analyze publicly available information from a wide range of online sources. Harpoon is written in Python and organized around a modular plugin system, where each plugin is responsible for querying a specific platform, API, or intelligence service. This design allows users to automate many reconnaissance and...
    Downloads: 6 This Week
  • 24
    CRAB

    CRAB: Cross-environment Agent Benchmark for Multimodal Language Model

    CRAB (Cross-environment Agent Benchmark) is a framework for building modular, reusable AI agents that can perform complex tasks in various domains. It focuses on creating AI-driven workflows that can be composed of multiple autonomous agents working together.
    Downloads: 0 This Week
  • 25
    Blueprint MCP

    Diagram generation for understanding codebases and system architecture

    Blueprint MCP is a modular control plane designed for managing and orchestrating multiple game-server clusters in real time, giving operators fine-grained control over scaling, configuration, and deployment workflows across distributed infrastructure. It provides a central management REST API and dashboard where teams can view cluster health, adjust instance fleets, set auto-scaling policies, and monitor usage metrics in a unified interface. Blueprint-MCP also supports templated server...
    Downloads: 0 This Week