ingest free download - SourceForge

Showing 62 open source projects for "ingest"

1

Calibre-Web Automated

Calibre-Web but Automated and with tons of New Features

...CWA keeps the familiar strengths of Calibre-Web, like a responsive interface, OPDS feeds for e-readers, and strong user/permission management, while layering in extensive automation features that handle ingest, conversion, metadata enforcement, and library organization with less manual effort. It also expands authentication options (including modern OAuth/OIDC flows) and adds quality-of-life improvements such as enhanced search/filtering, multi-language UI support, and better device-centric workflows for sending to e-readers.

Downloads: 4 This Week

Last Update: 2026-02-05
See Project
2

TypeAgent Python

Structured RAG: ingest, index, query

TypeAgent Python is an experimental Python implementation of Microsoft’s TypeAgent architecture designed to explore how large language models can interact with structured software systems. The project focuses on implementing structured Retrieval-Augmented Generation workflows that allow agents to ingest information, index it in structured form, and answer queries using language models. Instead of relying solely on free-form prompts, the architecture emphasizes converting natural language interactions into structured representations that can be processed by deterministic software components. This design allows the system to combine the flexibility of language models with the reliability of traditional programming logic. ...

Downloads: 0 This Week

Last Update: 2026-03-15
See Project
3

Prometheus SNMP Exporter

SNMP Exporter for Prometheus

This exporter is the recommended way to expose SNMP data in a format that Prometheus can ingest. To get started, use the if_mib module with switches, access points, or routers, together with the public_v2 auth module, which should be a read-only access community on the target device. Note that community strings in SNMP are not considered secrets, as they are sent unencrypted in SNMP v1 and v2c. For secure access, SNMP v3 is required.

Downloads: 5 This Week

Last Update: 2026-01-06
See Project
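
As a rough illustration of the SNMP Exporter entry above, the sketch below builds the URL Prometheus would scrape for one device using the if_mib module and the public_v2 auth module. It assumes the exporter serves a /snmp endpoint that takes target, module, and auth query parameters; the exporter address localhost:9116 and the device address 192.0.2.10 are placeholders, and snmp_scrape_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def snmp_scrape_url(exporter: str, target: str,
                    module: str = "if_mib", auth: str = "public_v2") -> str:
    """Build the scrape URL for one SNMP device behind the exporter.

    Assumption: the exporter exposes /snmp and reads the target, module,
    and auth query parameters. Host names here are placeholders.
    """
    query = urlencode({"target": target, "module": module, "auth": auth})
    return f"http://{exporter}/snmp?{query}"


if __name__ == "__main__":
    # Placeholder exporter address and documentation-range device IP.
    print(snmp_scrape_url("localhost:9116", "192.0.2.10"))
```

Because v1/v2c community strings travel unencrypted, a sketch like this is only suitable for read-only communities; SNMP v3 would need a different auth module.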
4

Eliza

Autonomous agents for everyone

Build and deploy autonomous AI agents with consistent personalities across Discord, Twitter, and Telegram. Full support for voice, text, and media interactions. Built-in RAG memory system, document processing, media analysis, and autonomous trading capabilities. Supports multiple AI models including Llama, GPT-4, and Claude. Create custom actions, add new platform integrations, and extend functionality through a modular plugin system. Full TypeScript support.

Downloads: 12 This Week

Last Update: 20 hours ago
See Project
5

TimescaleDB

An open-source time-series SQL database optimized for fast ingest

TimescaleDB is the open-source relational database for time-series and analytics. Build powerful data-intensive applications. Become instantly productive with full SQL. Rely on the same PostgreSQL you know, love, and trust. Hyperfunctions make time series easier. Achieve 10-100x faster queries than with vanilla PostgreSQL, InfluxDB, MongoDB. Write millions of data points per second per node. Horizontally scale to petabytes. Don’t worry about cardinality. Simplify your stack, ask more complex...

Downloads: 40 This Week

Last Update: 4 days ago
See Project
6

go2rtc

Ultimate camera streaming application

...Written in Go, it provides real-time streaming capabilities with extremely low latency by supporting protocols such as RTSP, WebRTC, RTMP, HTTP, and HomeKit, while also enabling seamless transcoding using FFmpeg when needed. The application can ingest streams from IP cameras, USB devices, or cloud-based sources and redistribute them to multiple clients or platforms, including browsers and smart home systems like Home Assistant. Its architecture emphasizes flexibility, allowing users to mix multiple input sources, negotiate codecs dynamically, and even enable two-way audio communication with supported devices. go2rtc also includes features for publishing streams to external platforms like YouTube or Telegram, making it useful beyond surveillance scenarios.

Downloads: 11 This Week

Last Update: 2026-04-24
See Project
7

IPFS Cluster

Pinset orchestration for IPFS

IPFS Cluster provides data orchestration across a swarm of IPFS daemons by allocating, replicating and tracking a global pinset distributed among multiple peers. IPFS has given the users the power of content-addressed storage. The permanent web requires, however, a data redundancy and availability solution that does not compromise on the distributed nature of the IPFS Network. IPFS Cluster is a distributed application that works as a sidecar to IPFS peers, maintaining a global cluster pinset...

Downloads: 0 This Week

Last Update: 2025-12-17
See Project
8

Quickwit

Sub-second search & analytics engine on cloud storage

Quickwit is the fastest search engine on cloud storage. Quickwit has an Elasticsearch-compatible Ingest-API to make it easier to migrate your log shippers (Vector, Fluent Bit, Syslog, ...) to Quickwit. However, we only support the ES aggregation DSL; query DSL support is planned for Q2 2023. The core difference and advantage of Quickwit is its architecture, built from the ground up to search on cloud storage. We optimized IO paths, revamped the index data structures, and made search stateless and sub-second on cloud storage. ...

Downloads: 1 This Week

Last Update: 2026-04-21
See Project
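
To make "Elasticsearch-compatible Ingest-API" in the Quickwit entry above more concrete, here is a minimal sketch of the newline-delimited JSON body used by the Elasticsearch bulk format that log shippers typically send. The helper name bulk_body, the index name, and the sample documents are hypothetical. Which bulk actions and endpoint path Quickwit accepts is not stated in this listing, so the action is left as a parameter.

```python
import json


def bulk_body(index: str, docs: list, action: str = "index") -> str:
    """Serialize docs into an Elasticsearch-style bulk (NDJSON) body.

    Each document becomes two lines: an action line naming the target
    index, then the document itself. The body ends with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({action: {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical index name and log records.
    print(bulk_body("app-logs", [{"level": "info", "msg": "started"},
                                 {"level": "error", "msg": "disk full"}]), end="")
```

The resulting string would be sent as the request body of a bulk request; the sketch stops short of the HTTP call because the endpoint is deployment-specific.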
9

Kernel Memory

Research project. A Memory solution for users, teams, and applications

...It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. Kernel Memory can ingest documents in multiple formats, process them into embeddings, and store them in searchable indexes. Applications can then query these indexed data sources to retrieve relevant information and include it as context for AI responses.

Downloads: 2 This Week

Last Update: 2026-03-06
See Project
10

Arize Phoenix

Uncover insights, surface problems, monitor, and fine tune your LLM

Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an Open Source ML Observability library designed for the Notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP and tabular datasets. It allows Data Scientists to quickly visualize their model data, monitor performance, track down issues & insights, and easily export to improve. Deep Learning Models (CV, LLM, and Generative) are an amazing technology that will power many future ML use cases. A large set of these technologies is being deployed into businesses (the real world) in what we consider a production setting.

Downloads: 2 This Week

Last Update: 9 hours ago
See Project
11

Easy DataSet

A powerful tool for creating datasets for LLM fine-tuning

Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure...

Downloads: 4 This Week

Last Update: 2026-04-10
See Project
12

Open Semantic Search

Open source semantic search and text analytics for large document sets

...It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.

Downloads: 5 This Week

Last Update: 5 days ago
See Project
13

OpenArchiver

An open-source platform for legally compliant email archiving

OpenArchiver is a comprehensive, self-hosted email archiving and compliance platform built to help organizations ingest, index, store, and search email communication data across diverse sources like Gmail, Microsoft 365, IMAP, PST, and more. It’s designed for scenarios where reliable, tamper-proof archiving and full-text search across both emails and attachments are essential for legal discovery, compliance, or long-term records retention. The platform combines a modern web UI with powerful backend services, including fast indexing, deduplication, encryption at rest, and asynchronous ingestion workflows, making it suitable for both small teams and enterprise deployments. ...

Downloads: 1 This Week

Last Update: 2026-03-20
See Project
14

Supermemory

Memory engine and app that is extremely fast, scalable

Supermemory is an ambitious and extensible AI-powered personal knowledge management system that aims to help users capture, organize, retrieve, and reason over information in a manner that mimics human memory structures. The platform allows individuals to ingest text, documents, and other content forms, then uses advanced retrieval and embedding techniques to index and relate information intelligently so that users can recall relevant knowledge in context rather than just by keyword match. It often incorporates clustering, semantic search, and summarization modules to reduce cognitive load and surface key ideas, which makes it useful for research, study, writing, and long-term project tracking. ...

Downloads: 0 This Week

Last Update: 3 days ago
See Project
15

mcp-server-chatsum

Query and Summarize your chat messages

mcp-server-chatsum is an MCP server that indexes your chat history and provides tools to query and produce focused summaries on demand. It offers a simple flow: point the server at a local chat database, run the companion chatbot to ingest messages, and then use the MCP tool to retrieve scoped threads and generate concise syntheses. The tool design lets agents filter by participants, time ranges, or keywords before summarizing, which keeps outputs relevant and reduces hallucinated context. Documentation provides a “before you start” checklist to initialize the dataset and highlights the single tool (query_chat_messages) that returns results and summaries. ...

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
16

OpenObserve

Elasticsearch/Splunk/Datadog alternative for (logs, metrics, traces)

OpenObserve is a cloud-native observability platform built specifically for logs, metrics, traces, and analytics, designed to work at petabyte scale. It is very simple and easy to operate, as opposed to Elasticsearch, which requires understanding and tuning a couple of dozen knobs; you can get OpenObserve up and running in under 2 minutes. It is a drop-in replacement for Elasticsearch if you are just ingesting data using APIs and searching using Kibana (Kibana is not supported nor required with...

Downloads: 1 This Week

Last Update: 5 days ago
See Project
17

AI PDF Chatbot LangChain

AI PDF chatbot agent built with LangChain & LangGraph

AI PDF Chatbot LangChain is a full-stack template for building conversational agents that can ingest and answer questions about PDF documents. The project demonstrates how to combine LangChain and LangGraph with a vector database to enable retrieval-augmented question answering over user-provided files. It includes both frontend and backend components, making it suitable as a production starting point rather than just a minimal demo.

Downloads: 0 This Week

Last Update: 2026-03-27
See Project
18

RAG Anything

RAG-Anything: All-in-One RAG Framework

RAG-Anything is an open-source unified framework that extends the Retrieval-Augmented Generation (RAG) paradigm to fully multimodal document and knowledge retrieval, enabling systems to ingest, parse, represent, and query rich content that includes text, images, tables, formulas, and other structured or visual elements. Traditional RAG systems are typically limited to text and cannot effectively work across heterogeneous document layouts, but RAG-Anything addresses this by modeling multimodal content in ways that preserve cross-modal relationships and semantic context, often treating content elements as interconnected knowledge entities rather than separate data silos. ...

Downloads: 0 This Week

Last Update: 2026-03-24
See Project
19

libSQL

libSQL is a fork of SQLite that is both Open Source

...SQLite has solidified its place in modern technology stacks, embedded in nearly any computing device you can think of. Its open source nature and public domain availability make it a popular choice for modification to meet specific use cases. libSQL will always be able to ingest and write the SQLite file format. We would love to add extensions, like encryption and CRC, that require the file to be changed. But we commit to always doing so in a way that generates standard SQLite files if those features are not used.

Downloads: 0 This Week

Last Update: 2025-08-13
See Project
20

Lago

Open Source Metering and Usage Based Billing API

Lago offers a scalable, modular architecture, self-hosted or in the cloud, for metering and usage-based billing at every stage of your company. Ingest up to 15,000 billing events per second. Lago’s event-based architecture provides a solid foundation for building a fair pricing model that scales with your business. Lago supports all pricing models. Create pay-as-you-go and hybrid plans in no time with our intuitive user interface or API. Create engaging marketing campaigns and increase conversion with coupons that customers can redeem to get a discount. ...

Downloads: 0 This Week

Last Update: 2026-04-07
See Project
21

trench

Open-Source Analytics Infrastructure

Trench is an open-source analytics infrastructure designed for tracking events and performing real-time analysis of application data at scale. The system is built on top of high-performance data technologies including Apache Kafka and ClickHouse, which allows it to ingest and process very large volumes of events while maintaining fast query performance. It was originally developed to solve scaling challenges in product analytics systems where traditional relational databases become inefficient as event tables grow. The platform enables developers to collect events such as page views, user actions, and behavioral metrics while storing them in a column-oriented analytics database optimized for time-series workloads. ...

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
22

HunyuanWorld-Mirror

Fast and Universal 3D reconstruction model for versatile tasks

HunyuanWorld-Mirror focuses on fast, universal 3D reconstruction that can ingest varied inputs and produce multiple kinds of 3D outputs. The model accepts combinations of images, camera intrinsics and poses, or even depth cues, then reconstructs consistent 3D geometry suitable for downstream rendering or editing. The pipeline emphasizes both speed and flexibility so creators can go from casual captures to assets without elaborate capture rigs.

Downloads: 0 This Week

Last Update: 2026-04-15
See Project
23

Timesketch

Collaborative forensic timeline analysis

Timesketch is a collaborative forensic timeline analysis platform used to investigate security incidents by turning diverse evidence into a single, searchable chronology. Analysts ingest logs and artifacts from many sources—endpoints, servers, cloud services—and Timesketch normalizes them into events on a unified timeline. Powerful search, aggregations, and saved views help you pivot quickly, highlight anomalies, and preserve investigative steps for later review. The system supports tagging, sketch notes, and story building so teams can annotate findings and share context without losing the raw data trail. ...

Downloads: 0 This Week

Last Update: 2026-03-26
See Project
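
The idea in the Timesketch entry above, normalizing artifacts from many sources into events on one timeline, can be sketched in a few lines. This is a toy illustration only, not Timesketch's importer or API: the Event class, the two record layouts (epoch seconds with "ts"/"msg", ISO 8601 strings with "time"/"text"), and the source names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # always timezone-aware UTC after normalization
    source: str
    message: str


def from_epoch(record: dict, source: str) -> Event:
    """Normalize a record that stores epoch seconds under 'ts' (assumed layout)."""
    ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return Event(ts, source, record["msg"])


def from_iso(record: dict, source: str) -> Event:
    """Normalize a record with an offset-aware ISO 8601 'time' field (assumed layout)."""
    ts = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
    return Event(ts, source, record["text"])


def build_timeline(*batches) -> list:
    """Merge batches of normalized events into one chronological list."""
    return sorted((e for batch in batches for e in batch),
                  key=lambda e: e.timestamp)


if __name__ == "__main__":
    endpoint = [from_epoch({"ts": 1704067210, "msg": "user login"}, "endpoint")]
    cloud = [from_iso({"time": "2024-01-01T00:00:05+00:00",
                       "text": "API key created"}, "cloud")]
    for event in build_timeline(endpoint, cloud):
        print(event.timestamp.isoformat(), event.source, event.message)
```

Converting every timestamp to UTC before sorting is the step that lets events from differently formatted sources be compared on one axis.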
24

Synthetic Data Kit

Tool for generating high quality Synthetic datasets

Synthetic Data Kit is a CLI-centric toolkit for generating high-quality synthetic datasets to fine-tune Llama models, with an emphasis on producing reasoning traces and QA pairs that line up with modern instruction-tuning formats. It ships an opinionated, modular workflow that covers ingesting heterogeneous sources (documents, transcripts), prompting models to create labeled examples, and exporting to fine-tuning schemas with minimal glue code. The kit’s design goal is to shorten the “data...

Downloads: 0 This Week

Last Update: 2025-10-25
See Project
25

Metarank

A low code Machine Learning service that personalizes articles

...It’s often considered "too risky" to spend 6+ months on an in-house moonshot project to reinvent the wheel without an experienced team and no existing open-source tools. Metarank makes it easy not only for Amazon to do personalization but for everyone else. Ingest historical item listings, clicks and item metadata so Metarank can find hidden dependencies in the data using our simple JSON format. No Machine Learning experience is required: run our CLI tool with a set of features in a YAML configuration. Run Metarank API service, feed it with real-time events and receive a personalized ranking for your items that will boost conversion, click-through rate or any other business-critical metric you define.

Downloads: 0 This Week

Last Update: 2025-06-24
See Project