Showing 341 open source projects for "data processing"

View related business solutions
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    SuperDuperDB

    SuperDuperDB

    Integrate, train and manage any AI models and APIs with your database

    ...Integrate and combine models from Sklearn, PyTorch, HuggingFace with AI APIs such as OpenAI to build even the most complex AI applications and workflows. Train models on your data in your datastore simply by querying without additional ingestion and pre-processing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    NLP-Knowledge-Graph

    NLP-Knowledge-Graph

    Research and application of technologies such as nl processing

    NLP-Knowledge-Graph is an open educational repository that collects resources, research materials, and tutorials focused on the intersection of natural language processing and knowledge graph technologies. The project aims to help researchers and developers understand how structured knowledge representations can enhance language processing systems. It includes curated materials covering key topics such as knowledge graph construction, entity recognition, relation extraction, graph embeddings, and semantic reasoning. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    OpenPlanter

    OpenPlanter

    Language-model investigation agent with a terminal UI

    OpenPlanter is an open-source Python project focused on building an intelligent automated planting or gardening system powered by software control and data processing. The repository is designed to help developers and hobbyists create programmable plant management workflows that can monitor, schedule, and optimize growing conditions. It emphasizes automation and extensibility, allowing integration with sensors, environmental data, and control logic for smart cultivation setups. The system is structured to support experimentation and customization, making it suitable for both research and DIY agriculture projects. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Nimbalyst

    Nimbalyst

    Run multiple Codex and Claude Code AI sessions

    ...Crystal often leverages modern programming practices and clean architecture principles to ensure maintainability and scalability as projects grow. It can be used as a foundation for building internal tools, automation systems, or data processing pipelines, depending on how developers configure its components. The system is particularly useful for teams that want control over their infrastructure without relying on overly complex or opinionated platforms.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    Docling

    Docling

    Get your documents ready for gen AI

    Docling is an open-source document processing toolkit built to prepare diverse content types for modern generative AI and data workflows. The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    OmniRoute

    OmniRoute

    OmniRoute is an AI gateway for multi-provider LLM

    ...It is particularly useful in distributed systems where requests need to be intelligently routed between APIs, microservices, or processing pipelines. OmniRoute aims to reduce boilerplate by centralizing routing logic and providing reusable patterns for managing complex flows. Its architecture supports scalability and maintainability, making it suitable for both small applications and larger systems with multiple integrations.
    Downloads: 30 This Week
    Last Update:
    See Project
  • 7
    ComfyUI-LTXVideo

    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a bridge between ComfyUI’s node-based generative workflow environment and the LTX-Video multimedia processing framework, enabling creators to orchestrate complex video tasks within a visual graph paradigm. Instead of writing code to apply effects, transitions, edits, and data flows, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 9
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    Handy is a free, open-source, offline speech-to-text application built for privacy, accessibility, and extensibility. Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active...
    Downloads: 67 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    Vectorize MCP Server

    Vectorize MCP Server

    Official Vectorize MCP Server

    The Vectorize MCP Server is a Model Context Protocol server that integrates with Vectorize, offering advanced vector retrieval and text extraction capabilities. ​
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    OpenClaw Medical Skills

    OpenClaw Medical Skills

    The largest open-source medical AI skills library for OpenClaw

    ...This modular design allows developers and researchers to build AI systems that can access specialized medical reasoning processes, retrieve relevant biomedical information, and generate structured outputs suitable for analysis or downstream processing.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 12
    clip-retrieval

    clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system

    clip-retrieval is an open-source toolkit designed to build large-scale semantic search systems for images and text by leveraging CLIP embeddings to enable multimodal retrieval. It allows developers to compute embeddings for both images and text efficiently and then index them for fast similarity search across massive datasets. The system is optimized for performance and scalability, capable of processing tens or even hundreds of millions of embeddings using GPU acceleration. It includes...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    SALMONN family

    SALMONN family

    A suite of advanced multi-modal LLMs

    SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Weaviate

    Weaviate

    Weaviate is a cloud-native, modular, real-time vector search engine

    Weaviate in a nutshell: Weaviate is a vector search engine and vector database. Weaviate uses machine learning to vectorize and store data, and to find answers to natural language queries. With Weaviate you can also bring your custom ML models to production scale. Weaviate in detail: Weaviate is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer-Extraction, Classification, Customizable...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 15
    HeavyDB

    HeavyDB

    HeavyDB (formerly MapD/OmniSciDB)

    ...Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the OmniSci platform (formerly MapD) and is commonly used for large-scale analytics and geospatial data processing. The database compiles queries into optimized machine code that executes efficiently on GPU hardware, significantly accelerating analytical workloads. It supports hybrid deployment environments where queries can run on both CPU and GPU architectures depending on the available resources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Kaggle Solutions

    Kaggle Solutions

    Collection of Kaggle Solutions and Ideas

    ...The repository also highlights important machine learning concepts such as feature engineering, cross-validation strategies, ensemble modeling, and post-processing methods commonly used in winning solutions. Because the content is organized by competition categories such as computer vision, natural language processing, tabular data, and time-series forecasting, users can explore techniques relevant to specific problem types.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    FlexLLMGen

    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Preswald

    Preswald

    Python tool for browser-based interactive data apps in one file

    Preswald is an open source Python-based framework and static-site generator designed for building interactive data applications that run entirely in the browser. It packages application logic, data processing, and user interface components into a single self-contained output, enabling easy sharing and deployment without requiring local dependencies. Preswald leverages a WebAssembly runtime along with technologies like Pyodide and DuckDB to execute Python code directly in the browser environment. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    AI Powered Knowledge Graph Generator

    AI Powered Knowledge Graph Generator

    AI Powered Knowledge Graph Generator

    AI-Powered Knowledge Graph is an open-source project focused on building knowledge graph systems that integrate artificial intelligence and machine learning to represent complex relationships between data entities. Knowledge graphs organize information as networks of nodes and relationships, allowing applications to analyze connections between concepts, datasets, or real-world entities. By incorporating AI techniques such as natural language processing and semantic reasoning, the project enables systems to automatically extract relationships and insights from large volumes of data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    AI App Lab

    AI App Lab

    Implementing large models into scenario-based applications

    AI App Lab is an open-source platform developed by Volcengine that provides tools, SDKs, and example applications for building real-world AI applications powered by large language models. The project focuses on helping developers bridge the gap between AI models and practical business use cases by offering a structured environment for creating production-ready AI systems. It includes a high-level SDK called Arkitect, which provides workflows and tools for integrating models, plugins, and...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    Amazing-Python-Scripts

    Amazing-Python-Scripts

    Curated collection of Amazing Python scripts

    Amazing-Python-Scripts is a collaborative repository that collects a wide variety of Python scripts designed to demonstrate practical programming techniques and automation tasks. The project includes scripts ranging from beginner-level utilities to more advanced applications involving machine learning, data processing, and system automation. Its goal is to provide developers with useful coding examples that can solve everyday problems, automate repetitive tasks, or serve as learning exercises. The repository encourages community contributions, allowing developers to add their own scripts and improve existing ones through pull requests. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    Netflix Maestro

    Netflix Maestro

    Netflix’s Workflow Orchestrator

    Maestro is a large-scale workflow orchestration platform originally developed by Netflix to coordinate complex data processing and machine learning workflows across distributed systems. The system acts as a general-purpose workflow orchestrator that manages the execution, scheduling, monitoring, and recovery of large pipelines used for analytics and AI operations. It was designed to support the demanding internal infrastructure of Netflix, where thousands of workflows must process massive volumes of data reliably and efficiently every day. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Xiyan MCP Server

    Xiyan MCP Server

    A Model Context Protocol (MCP) server

    The XiYan MCP Server is a Model Context Protocol (MCP) server that enables natural language queries to databases, powered by XiYan-SQL, a state-of-the-art text-to-SQL model. It allows users to interact with databases using conversational language, simplifying data retrieval processes. ​
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    Memobase

    Memobase

    Fast backend for long-term AI user memory via structured profiles

    Memobase is an open source backend system that enables long-term user memory functionality for AI applications by capturing and structuring information about users across interactions. Its design centers on creating user profiles and recording event timelines, allowing AI systems to remember, understand, and evolve in their behaviour toward individual users over time. Instead of relying purely on traditional embedding-based retrieval or RAG systems, Memobase uses profile and timeline...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    autoMate

    autoMate

    AI tool for automating desktop tasks via natural language input

    ...Unlike conventional RPA tools that require predefined workflows, autoMate dynamically adapts to tasks by making autonomous decisions based on the current interface state. autoMate emphasizes local execution, meaning all processing happens on the user’s machine to maintain privacy and data security.
    Downloads: 1 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB