Semantic Search Tools for BSD

  • 1
    MyScaleDB

    A ClickHouse fork that supports high-performance vector search

    MyScaleDB is an open-source SQL vector database designed for building large-scale AI and machine learning applications that require both analytical queries and semantic vector search. The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. MyScaleDB enables developers to perform vector similarity searches using standard SQL syntax, eliminating the need to learn specialized vector database query languages. The database is optimized for high performance and scalability, allowing it to handle extremely large datasets and high query loads typical of production AI applications.
  • 2
    Paul Graham GPT

    RAG on Paul Graham's essays

    Paul Graham GPT is an AI-powered search and chat app built on a corpus of Paul Graham's essays, letting users query and discuss his writing conversationally. The repo stores the full text of the essays in chunks, uses embeddings (e.g. OpenAI embeddings) for semantic search over that corpus, and hosts a chat interface that combines retrieval results with LLM-based answering, enabling retrieval-augmented generation (RAG) over a fixed dataset. The embedding store is a Postgres database with pgvector hosted on Supabase, which keeps the backend relatively simple and accessible, and the frontend is built with Next.js and TypeScript. By combining search and chat, it serves both readers who want to revisit or explore Paul Graham's ideas thematically and learners or researchers who want to query specific essays or concepts quickly.
  • 3
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a table-based abstraction. Developers define data transformations and AI operations using computed columns on tables, allowing pipelines to evolve incrementally as new data or models are added. The framework supports multimodal content including images, video, text, and audio, enabling applications such as retrieval-augmented generation systems, semantic search, and multimedia analytics.
  • 4
    This project provides cross-forge semantic search for the Qualipso Forge. It integrates the A4 AdvDoc prototype (a semantic search GUI and engine) with the A3 homogeneous and heterogeneous cross-forge semantic search capabilities. See Qualipso.org for details.
  • 5
    RAG from Scratch

    Demystify RAG by building it from scratch

    RAG From Scratch is an educational open-source project designed to teach developers how retrieval-augmented generation systems work by building them step by step. Instead of relying on complex frameworks or cloud services, the repository demonstrates the entire RAG pipeline using transparent and minimal implementations. The project walks through key concepts such as generating embeddings, building vector databases, retrieving relevant documents, and integrating the retrieved context into language model prompts. Each example is written with detailed explanations so that developers can understand the internal mechanics of semantic search and context-aware language generation. The repository emphasizes learning through direct implementation, allowing users to see how each component of the RAG architecture functions independently.
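
    The pipeline stages named above (embedding, vector storage, retrieval, prompt assembly) can be sketched in a few lines of standard-library Python. This is an illustrative toy and not code from the repository: `embed`, `VectorStore`, and `build_prompt` are hypothetical names, and the bag-of-words "embedding" is a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory 'vector database': a list of (vector, document) pairs."""

    def __init__(self):
        self.rows = []

    def add(self, doc):
        self.rows.append((embed(doc), doc))

    def search(self, query, k=2):
        """Return the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(question, context_docs):
    """Place the retrieved documents ahead of the question in the prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
for doc in [
    "Embeddings map text to vectors.",
    "A vector database stores embeddings for similarity search.",
    "Rust is a systems programming language.",
]:
    store.add(doc)

question = "What does a vector database store?"
prompt = build_prompt(question, store.search(question, k=2))
print(prompt)  # this string would be sent to a language model
```

    In a real system the `embed` function would call an embedding model and the final prompt would be passed to an LLM; both steps are left out here to keep the sketch self-contained.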
  • 6
    SimpleMem

    SimpleMem: Efficient Lifelong Memory for LLM Agents

    SimpleMem is a lightweight memory-augmented model framework that helps developers build AI applications that retain long-term context and recall relevant information without overloading model context windows. It provides easy-to-use APIs for storing structured memory entries, querying those memories using semantic search, and retrieving context to augment prompt inputs for downstream processing. Unlike monolithic systems where memory management is ad-hoc, SimpleMem formalizes a memory lifecycle—write, index, retrieve, refine—so applications can handle user history, document collections, or dynamic contextual state systematically. It supports customizable embedding models, efficient vector indexes, and relevance weighting, making it practical for building assistants, personal agents, or domain-specific retrieval systems that need persistent knowledge.
  • 7
    bge-base-en-v1.5

    Efficient English embedding model for semantic search and retrieval

    bge-base-en-v1.5 is an English sentence embedding model from BAAI optimized for dense retrieval tasks, part of the BGE (BAAI General Embedding) family. It is a fine-tuned BERT-based model designed to produce high-quality, semantically meaningful embeddings for tasks like semantic similarity, information retrieval, classification, and clustering. This version (v1.5) improves retrieval performance and stabilizes similarity score distribution without requiring instruction-based prompts. With 768 embedding dimensions and a maximum sequence length of 512 tokens, it achieves strong performance across multiple MTEB benchmarks, nearly matching larger models while maintaining efficiency. It supports use via SentenceTransformers, Hugging Face Transformers, FlagEmbedding, and ONNX for various deployment scenarios. Typical usage includes normalizing output embeddings and calculating cosine similarity via dot product for ranking.
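
    The ranking step described above, normalizing embeddings and then taking a dot product, can be illustrated without loading the model. The sketch below is an assumption-laden toy: the 3-dimensional vectors are made up and stand in for the model's 768-dimensional output, and `l2_normalize` and `rank_passages` are hypothetical helper names. Once both vectors have unit length, their dot product equals their cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_passages(query_vec, passage_vecs):
    """Return (passage index, cosine score) pairs, best match first."""
    q = l2_normalize(query_vec)
    scores = [(i, dot(q, l2_normalize(p))) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Made-up vectors standing in for real embeddings.
query = [1.0, 2.0, 0.0]
passages = [
    [2.0, 4.0, 0.0],  # same direction as the query
    [0.0, 0.0, 3.0],  # orthogonal to the query
    [1.0, 0.0, 0.0],  # partially aligned
]
ranking = rank_passages(query, passages)
print(ranking)
```

    The passage pointing in the same direction as the query ranks first regardless of its length, which is the reason for normalizing before the dot product.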
  • 8
    bge-large-en-v1.5

    BGE-Large v1.5: High-accuracy English embedding model for retrieval

    BAAI/bge-large-en-v1.5 is a powerful English sentence embedding model designed by the Beijing Academy of Artificial Intelligence to enhance retrieval-augmented language model systems. It uses a BERT-based architecture fine-tuned to produce high-quality dense vector representations optimized for sentence similarity, search, and retrieval. This model is part of the BGE (BAAI General Embedding) family and delivers improved similarity distribution and state-of-the-art results on the MTEB benchmark. It is recommended for use in document retrieval tasks, semantic search, and passage reranking, particularly when paired with a reranker like BGE-Reranker. The model supports inference through multiple frameworks, including FlagEmbedding, Sentence-Transformers, LangChain, and Hugging Face Transformers. It accepts English text as input and returns normalized 1024-dimensional embeddings suitable for cosine similarity comparisons.
  • 9
    bge-small-en-v1.5

    Compact English sentence embedding model for semantic search tasks

    BAAI/bge-small-en-v1.5 is a lightweight English sentence embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI) as part of the BGE (BAAI General Embedding) series. Designed for dense retrieval, semantic search, and similarity tasks, it produces 384-dimensional embeddings that can be used to compare and rank sentences or passages. This version (v1.5) improves similarity distribution, enhancing performance without the need for special query instructions. The model is optimized for speed and efficiency, making it suitable for resource-constrained environments. It is compatible with popular libraries such as FlagEmbedding, Sentence-Transformers, and Hugging Face Transformers. The model achieves competitive results on the MTEB benchmark, especially in retrieval and classification tasks. With only 33.4M parameters, it provides a strong balance of accuracy and performance for English-only use cases.
  • 10
    eagle-i
    eagle-i is an ontology-driven, RDF-based distributed platform for creating, storing and searching semantically rich data. eagle-i is built around semantic web technologies and adheres to linked open data principles.
  • 11
    hora

    Efficient approximate nearest neighbor search algorithm collections

    hora is an open-source high-performance vector similarity search library designed for large-scale machine learning and information retrieval systems. The project focuses on approximate nearest neighbor search, a fundamental technique used in modern AI applications such as recommendation systems, image search, and semantic search engines. Hora implements multiple efficient indexing algorithms that allow systems to rapidly search through high-dimensional vectors produced by machine learning models. These vectors are commonly generated by neural networks to represent images, text, audio, or other data types in a mathematical embedding space. The library is written in Rust and emphasizes performance, safety, and efficient memory management, making it suitable for production-grade applications requiring low latency and high throughput.
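
    To make the idea of approximate nearest neighbor search concrete, here is a minimal sketch of one generic technique, random-hyperplane locality-sensitive hashing. It is written in Python for brevity and is not Hora's API or a claim about which algorithms Hora implements; `RandomHyperplaneIndex` and its parameters are hypothetical. Vectors that fall on the same side of every random hyperplane share a bucket, and a query only scans its own bucket instead of the whole collection.

```python
import random
from collections import defaultdict


class RandomHyperplaneIndex:
    """Toy ANN index: vectors with the same hyperplane sign pattern share a bucket."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        # Each plane is a random normal vector through the origin.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)
        ]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        """One boolean per plane: which side of the plane the vector lies on."""
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in self.planes
        )

    def add(self, key, vec):
        self.buckets[self._signature(vec)].append((key, vec))

    def search(self, query, k=1):
        """Return up to k keys from the query's bucket, nearest first."""
        candidates = self.buckets.get(self._signature(query), [])

        def sq_dist(item):
            return sum((a - b) ** 2 for a, b in zip(item[1], query))

        return [key for key, _ in sorted(candidates, key=sq_dist)[:k]]


# Index 200 random 4-dimensional vectors, then look one of them up.
data_rng = random.Random(1)
points = {f"v{i}": [data_rng.uniform(-1, 1) for _ in range(4)] for i in range(200)}
index = RandomHyperplaneIndex(dim=4, num_planes=4, seed=42)
for key, vec in points.items():
    index.add(key, vec)

nearest = index.search(points["v7"], k=1)
print(nearest)
```

    The search is approximate because a true neighbor that lands in a different bucket is never examined; more planes mean smaller buckets and faster queries at the cost of more missed neighbors.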
  • 12
    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. This allows the generated graphs to be denser, more coherent, and easier to use for downstream tasks such as retrieval-augmented generation, semantic search, and reasoning systems.
  • 13
    mgrep

    A calm, CLI-native way to semantically grep everything, like code

    This project is a modern, semantic search tool that brings the simplicity of traditional command-line grep to the world of natural language and multimodal content, enabling users to search across codebases, documents, PDFs, and even images using meaning-aware queries. Built with a focus on calm CLI experiences, it lets you index and query your local files with semantic understanding, delivering results that are relevant to your intent rather than simple pattern matches, which is especially powerful in large or diverse projects. It also includes features such as background indexing to keep your search index up to date without interrupting your workflow and web search integration to expand the scope of queries beyond local files. Designed for both programmers and agents, it integrates naturally into development and research workflows while offering thoughtful defaults that keep output clean and informative.