Showing 60 open source projects for "ingest"

  • 1
    Calibre-Web Automated

    Calibre-Web but Automated and with tons of New Features

    ...CWA keeps the familiar strengths of Calibre-Web, like a responsive interface, OPDS feeds for e-readers, and strong user/permission management, while layering in extensive automation features that handle ingest, conversion, metadata enforcement, and library organization with less manual effort. It also expands authentication options (including modern OAuth/OIDC flows) and adds quality-of-life improvements such as enhanced search/filtering, multi-language UI support, and better device-centric workflows for sending to e-readers.
    Downloads: 4 This Week
  • 2
    TypeAgent Python

    Structured RAG: ingest, index, query

    TypeAgent Python is an experimental Python implementation of Microsoft’s TypeAgent architecture designed to explore how large language models can interact with structured software systems. The project focuses on implementing structured Retrieval-Augmented Generation workflows that allow agents to ingest information, index it in structured form, and answer queries using language models. Instead of relying solely on free-form prompts, the architecture emphasizes converting natural language interactions into structured representations that can be processed by deterministic software components. This design allows the system to combine the flexibility of language models with the reliability of traditional programming logic. ...
    Downloads: 0 This Week
  • 3
    Prometheus SNMP Exporter

    SNMP Exporter for Prometheus

    This exporter is the recommended way to expose SNMP data in a format that Prometheus can ingest. To get started, use the if_mib module with switches, access points, or routers, together with the public_v2 auth module, which should be a read-only access community on the target device. Note that community strings in SNMP are not considered secrets, as they are sent unencrypted in SNMP v1 and v2c. For secure access, SNMP v3 is required.
    Downloads: 16 This Week
  • 4
    TimescaleDB

    An open-source time-series SQL database optimized for fast ingest

    TimescaleDB is the open-source relational database for time-series and analytics. Build powerful data-intensive applications. Become instantly productive with full SQL. Rely on the same PostgreSQL you know, love, and trust. Hyperfunctions make time series easier. Achieve 10-100x faster queries than with vanilla PostgreSQL, InfluxDB, MongoDB. Write millions of data points per second per node. Horizontally scale to petabytes. Don’t worry about cardinality. Simplify your stack, ask more complex...
    Downloads: 63 This Week
  • 5
    IPFS Cluster

    Pinset orchestration for IPFS

    IPFS Cluster provides data orchestration across a swarm of IPFS daemons by allocating, replicating, and tracking a global pinset distributed among multiple peers. IPFS has given users the power of content-addressed storage. The permanent web, however, requires a data redundancy and availability solution that does not compromise the distributed nature of the IPFS Network. IPFS Cluster is a distributed application that works as a sidecar to IPFS peers, maintaining a global cluster pinset...
    Downloads: 2 This Week
  • 6
    Eliza

    Autonomous agents for everyone

    Build and deploy autonomous AI agents with consistent personalities across Discord, Twitter, and Telegram. Full support for voice, text, and media interactions. Built-in RAG memory system, document processing, media analysis, and autonomous trading capabilities. Supports multiple AI models including Llama, GPT-4, and Claude. Create custom actions, add new platform integrations, and extend functionality through a modular plugin system. Full TypeScript support.
    Downloads: 2 This Week
  • 7
    Arize Phoenix

    Uncover insights, surface problems, monitor, and fine tune your LLM

    Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an Open Source ML Observability library designed for the Notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP, and tabular datasets. It allows Data Scientists to quickly visualize their model data, monitor performance, track down issues and insights, and easily export data to improve their models. Deep Learning Models (CV, LLM, and Generative) are an amazing technology that will power many future ML use cases. A large set of these technologies is being deployed into businesses (the real world) in what we consider a production setting.
    Downloads: 3 This Week
  • 8
    Supermemory

    Memory engine and app that is extremely fast, scalable

    Supermemory is an ambitious and extensible AI-powered personal knowledge management system that aims to help users capture, organize, retrieve, and reason over information in a manner that mimics human memory structures. The platform allows individuals to ingest text, documents, and other content forms, then uses advanced retrieval and embedding techniques to index and relate information intelligently so that users can recall relevant knowledge in context rather than just by keyword match. It often incorporates clustering, semantic search, and summarization modules to reduce cognitive load and surface key ideas, which makes it useful for research, study, writing, and long-term project tracking. ...
    Downloads: 1 This Week
  • 9
    Easy DataSet

    A powerful tool for creating datasets for LLM fine-tuning

    Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure...
    Downloads: 5 This Week
  • 10
    HunyuanWorld-Mirror

    Fast and Universal 3D reconstruction model for versatile tasks

    HunyuanWorld-Mirror focuses on fast, universal 3D reconstruction that can ingest varied inputs and produce multiple kinds of 3D outputs. The model accepts combinations of images, camera intrinsics and poses, or even depth cues, then reconstructs consistent 3D geometry suitable for downstream rendering or editing. The pipeline emphasizes both speed and flexibility so creators can go from casual captures to assets without elaborate capture rigs.
    Downloads: 5 This Week
  • 11
    AI PDF Chatbot LangChain

    AI PDF chatbot agent built with LangChain & LangGraph

    AI PDF Chatbot LangChain is a full-stack template for building conversational agents that can ingest and answer questions about PDF documents. The project demonstrates how to combine LangChain and LangGraph with a vector database to enable retrieval-augmented question answering over user-provided files. It includes both frontend and backend components, making it suitable as a production starting point rather than just a minimal demo.
    Downloads: 2 This Week
  • 12
    OpenArchiver

    An open-source platform for legally compliant email archiving

    OpenArchiver is a comprehensive, self-hosted email archiving and compliance platform built to help organizations ingest, index, store, and search email communication data across diverse sources like Gmail, Microsoft 365, IMAP, PST, and more. It’s designed for scenarios where reliable, tamper-proof archiving and full-text search across both emails and attachments are essential for legal discovery, compliance, or long-term records retention. The platform combines a modern web UI with powerful backend services, including fast indexing, deduplication, encryption at rest, and asynchronous ingestion workflows, making it suitable for both small teams and enterprise deployments. ...
    Downloads: 1 This Week
  • 13
    mcp-server-chatsum

    Query and Summarize your chat messages

    mcp-server-chatsum is an MCP server that indexes your chat history and provides tools to query and produce focused summaries on demand. It offers a simple flow: point the server at a local chat database, run the companion chatbot to ingest messages, and then use the MCP tool to retrieve scoped threads and generate concise syntheses. The tool design lets agents filter by participants, time ranges, or keywords before summarizing, which keeps outputs relevant and reduces hallucinated context. Documentation provides a “before you start” checklist to initialize the dataset and highlights the single tool (query_chat_messages) that returns results and summaries. ...
    Downloads: 0 This Week
  • 14
    Quickwit

    Sub-second search & analytics engine on cloud storage

    Quickwit is the fastest search engine on cloud storage. Quickwit has an Elasticsearch-compatible Ingest API to make it easier to migrate your log shippers (Vector, Fluent Bit, Syslog, ...) to Quickwit. However, only the ES aggregation DSL is supported; query DSL support is planned for Q2 2023. The core difference and advantage of Quickwit is its architecture, built from the ground up to search on cloud storage. We optimized IO paths, revamped the index data structures, and made search stateless and sub-second on cloud storage. ...
    Downloads: 0 This Week
  • 15
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 4 This Week
  • 16
    OpenObserve

    Elasticsearch/Splunk/Datadog alternative for (logs, metrics, traces)

    OpenObserve is a cloud-native observability platform built specifically for logs, metrics, traces, and analytics, designed to work at petabyte scale. It is simple and easy to operate, and you can get it up and running in under 2 minutes, as opposed to Elasticsearch, which requires understanding and tuning a couple of dozen knobs. It is a drop-in replacement for Elasticsearch if you are just ingesting data using APIs and searching using Kibana (Kibana is not supported nor required with...
    Downloads: 2 This Week
  • 17
    DenchClaw

    Fully Managed OpenClaw Framework for all knowledge work ever

    ...The system combines database management, browser automation, and AI reasoning into a unified interface where users can interact with their data and tools using natural language commands. It can ingest data from sources such as Google Drive, Notion, Gmail, and CRM platforms, consolidating everything into a centralized workspace for analysis and action. One of its most distinctive capabilities is its ability to use the user’s existing browser session, enabling it to log into services, scrape data, and perform actions like outreach or research as if it were the user.
    Downloads: 6 This Week
  • 18
    Kernel Memory

    Research project. A Memory solution for users, teams, and applications

    ...It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. Kernel Memory can ingest documents in multiple formats, process them into embeddings, and store them in searchable indexes. Applications can then query these indexed data sources to retrieve relevant information and include it as context for AI responses.
    Downloads: 0 This Week
  • 19
    RAG Anything

    RAG-Anything: All-in-One RAG Framework

    RAG-Anything is an open-source unified framework that extends the Retrieval-Augmented Generation (RAG) paradigm to fully multimodal document and knowledge retrieval, enabling systems to ingest, parse, represent, and query rich content that includes text, images, tables, formulas, and other structured or visual elements. Traditional RAG systems are typically limited to text and cannot effectively work across heterogeneous document layouts, but RAG-Anything addresses this by modeling multimodal content in ways that preserve cross-modal relationships and semantic context, often treating content elements as interconnected knowledge entities rather than separate data silos. ...
    Downloads: 0 This Week
  • 20
    libSQL

    libSQL is a fork of SQLite that is both Open Source

    ...SQLite has solidified its place in modern technology stacks, embedded in nearly any computing device you can think of. Its open source nature and public domain availability make it a popular choice for modification to meet specific use cases. libSQL will always be able to ingest and write the SQLite file format. We would love to add extensions like encryption, and CRC that require the file to be changed. But we commit to always doing so in a way that generates standard SQLite files if those features are not used.
    Downloads: 0 This Week
  • 21
    Lago

    Open Source Metering and Usage Based Billing API

    Lago offers a scalable, modular architecture, self-hosted or in the cloud, for metering and usage-based billing at every stage of your company. Ingest up to 15,000 billing events per second. Lago’s event-based architecture provides a solid foundation for building a fair pricing model that scales with your business. Lago supports all pricing models. Create pay-as-you-go and hybrid plans in no time with our intuitive user interface or API. Create engaging marketing campaigns and increase conversion with coupons that customers can redeem to get a discount. ...
    Downloads: 0 This Week
  • 22
    Janus

    Unified Multimodal Understanding and Generation Models

    ...Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles long-standing conflicts in multimodal models: namely that the visual encoder has to serve both analysis (understanding) and synthesis (generation) roles. By splitting those pathways but keeping one unified core transformer, Janus maintains flexibility and achieves strong performance across tasks previously requiring distinct architectures. ...
    Downloads: 1 This Week
  • 23
    Better Stack JavaScript client

    Better Stack JavaScript client

    Logtail is now part of Better Stack. Better Uptime and Logtail have been rebranded as Uptime and Logs on the Better Stack platform. Our mission is to help developers build a better internet. We believe that the new, fully integrated Better Stack platform simplifies the experience for developers even further. Logs, formerly Logtail, lets you visualize your entire stack, turn your logs into structured data, and query everything like a single database...
    Downloads: 1 This Week
  • 24
    trench

    Open-Source Analytics Infrastructure

    Trench is an open-source analytics infrastructure designed for tracking events and performing real-time analysis of application data at scale. The system is built on top of high-performance data technologies including Apache Kafka and ClickHouse, which allows it to ingest and process very large volumes of events while maintaining fast query performance. It was originally developed to solve scaling challenges in product analytics systems where traditional relational databases become inefficient as event tables grow. The platform enables developers to collect events such as page views, user actions, and behavioral metrics while storing them in a column-oriented analytics database optimized for time-series workloads. ...
    Downloads: 0 This Week
  • 25
    Timesketch

    Collaborative forensic timeline analysis

    Timesketch is a collaborative forensic timeline analysis platform used to investigate security incidents by turning diverse evidence into a single, searchable chronology. Analysts ingest logs and artifacts from many sources—endpoints, servers, cloud services—and Timesketch normalizes them into events on a unified timeline. Powerful search, aggregations, and saved views help you pivot quickly, highlight anomalies, and preserve investigative steps for later review. The system supports tagging, sketch notes, and story building so teams can annotate findings and share context without losing the raw data trail. ...
    Downloads: 0 This Week