Data infrastructure for multimodal AI workloads
Multilingual sentence & image embeddings with BERT
MemU is an open-source memory framework for AI companions
A text-to-speech, speech-to-text and speech-to-speech library
Minimal CLI coding agent by Mistral
Haystack is an open-source NLP framework to interact with your data
Python library and CLI tool to interface with Google Translate
Multimodal embedding and reranking models built on Qwen3-VL
Ready-to-use OCR with 80+ supported languages
Cloud-native open source data warehouse for analytics and AI queries
SQL-Driven RAG Engine
Handles all kinds of unstructured data, for tasks such as reverse image search
CLIP: predict the most relevant text snippet given an image
The data structure for multimodal data
Generate audiobooks from e-books, with voice cloning and support for 1107+ languages
Open Source Document Management System for Digital Archives
Local-first semantic code search engine
Reading book source
Context database designed specifically for AI Agents
Python binding to the Apache Tika™ REST services
Local RAG engine for private multimodal knowledge search on devices
An advanced paper search agent powered by large language models
Open-source choice to scale, assess and maintain natural language data
The open-source data curation platform for LLMs
The ultimate RAG for your monorepo