Showing 343 open source projects for "index data"

  • 1
    Anomaly Detection Learning Resources

    Anomaly detection related books, papers, videos, and toolboxes

    Anomaly Detection Learning Resources is a curated open-source repository that collects educational materials, tools, and academic references related to anomaly detection and outlier analysis in data science. The project serves as a centralized index for researchers and practitioners who want to explore algorithms, datasets, and publications associated with detecting unusual patterns in data. The repository organizes resources into structured categories such as books, tutorials, academic papers, datasets, benchmark frameworks, and open-source toolkits. ...
    Downloads: 0 This Week
  • 2
    nb-clean

    Clean Jupyter notebooks of outputs, metadata, and empty cells

    nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs, and (optionally) empty cells, preparing them for committing to version control. It provides both a Git filter and pre-commit hook to automatically clean notebooks before they're staged, and can also be used with other version control systems, as a command line tool, and as a Python library. It can determine if a notebook is clean or not, which can be used as a check in your continuous integration pipelines....
    Downloads: 0 This Week
  • 3
    Minimal Mistakes Jekyll theme

    Jekyll theme for building a personal site, blog, project documentation

    A flexible two-column Jekyll theme. Perfect for building personal sites, blogs, and portfolios. Everything from menus and sidebars to comments and more can be configured or set with YAML Front Matter. Built with HTML5 + CSS3. All layouts are fully responsive with helpers to augment your content. Free to use however you want under the MIT License. Clone it, fork it, customize it, etc. Settings that affect your entire site can be changed in Jekyll’s configuration file: _config.yml, found in...
    Downloads: 2 This Week
  • 4
    Vearch

    A distributed system for embedding-based vector retrieval

    Vearch is the vector search infrastructure for deep learning and AI applications. Vearch is a distributed vector storage and retrieval system that can easily be extended to billion-scale datasets. Vearch implements a high-performance, lockless real-time vector indexing subsystem that utilizes various optimization techniques to support millisecond vector update and retrieval. End-to-end one-click deployment. Through its plugin module, a complete default visual search system can be deployed...
    Downloads: 4 This Week
  • 5
    Searchkick

    Intelligent search made easy

    Searchkick brings powerful, production-ready search to Rails by mapping Active Record models into Elasticsearch with sensible defaults and easy customization. It supports language analyzers, stemming, synonyms, misspelling tolerance, and highlighting so search results feel natural to end users. Indexing is model-centric: you declare what fields to index, add computed fields, and trigger reindexing via callbacks or background jobs, with options for zero-downtime rolling reindexes. On the...
    Downloads: 0 This Week
  • 6
    Discord.SortedSet

    Elixir SortedSet backed by a Rust-based NIF

    SortedSet NIF is a performant and reliable sorted set data structure for Elixir, implemented in Rust using the Rustler crate to take advantage of native performance while maintaining seamless integration with the BEAM ecosystem. It provides ordering and uniqueness guarantees, with all terms stored according to Elixir’s built-in sorting rules. Internally, it uses a vector of vectors layout rather than a single vector to minimize costly reallocations, allowing efficient bucket pointer copying...
    Downloads: 0 This Week
  • 7
    OpenArchiver

    An open-source platform for legally compliant email archiving

    OpenArchiver is a comprehensive, self-hosted email archiving and compliance platform built to help organizations ingest, index, store, and search email communication data across diverse sources like Gmail, Microsoft 365, IMAP, PST, and more. It’s designed for scenarios where reliable, tamper-proof archiving and full-text search across both emails and attachments are essential for legal discovery, compliance, or long-term records retention. The platform combines a modern web UI with powerful backend services, including fast indexing, deduplication, encryption at rest, and asynchronous ingestion workflows, making it suitable for both small teams and enterprise deployments. ...
    Downloads: 9 This Week
  • 8
    Pix

    Image management application

    Pix is an image management application with image viewing, browsing, organizing and editing capabilities. It is part of the X-Apps project, which aims at producing cross-distribution and cross-desktop software. Pix supports numerous image types including: BMP, JPEG, GIF, PNG, TIFF, TGA, ICO and XPM; with optional support for RAW and HDR (high dynamic range) images. It is also able to view EXIF data attached to JPEG images. Pix has its own set of image editing tools that enable you to...
    Downloads: 12 This Week
  • 9
    OpenRecall

    OpenRecall is a fully open-source, privacy-first alternative

    ...This data is then indexed into a searchable database, allowing users to retrieve past information quickly using natural language queries. Unlike proprietary alternatives, OpenRecall operates entirely locally, ensuring that all captured data remains on the user’s device and is never transmitted to external servers. The platform supports multiple operating systems, including Windows, macOS, and Linux, making it widely accessible across different environments.
    Downloads: 0 This Week
  • 10
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 6 This Week
  • 11
    Quickwit

    Sub-second search & analytics engine on cloud storage

    ...However, we only support ES aggregation DSL; query DSL support is planned for Q2 2023. The core difference and advantage of Quickwit is its architecture, built from the ground up to search on cloud storage. We optimized IO paths, revamped the index data structures and made search stateless and sub-second on cloud storage. Quickwit is open-source under the GNU Affero General Public License Version 3 - AGPLv3. Fundamentally, this means you are free to use Quickwit for your project if you don't modify Quickwit. However, if you do and you are distributing your modified version to the public, you have to make the modifications public.
    Downloads: 2 This Week
  • 12
    marqo

    Tensor search for humans

    A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...
    Downloads: 1 This Week
  • 13
    Scala 2

    Scala 2 compiler and standard library

    ...You can use any version of Scala, or even alternate backends such as Dotty, Scala.js, Scala Native, and Typelevel Scala. You can use any published library. You can save and share Scala programs/builds with anybody. The Scala Library Index (or Scaladex) is a representation of a map of all published Scala libraries. With Scaladex, a developer can now query more than 175,000 releases of Scala libraries. Scaladex is officially supported by Scala Center. In Scala, functions are values, and can be defined as anonymous functions with a concise syntax. In Scala, case classes are used to represent structural data types.
    Downloads: 6 This Week
  • 14
    Kernel Memory

    Research project. A Memory solution for users, teams, and applications

    ...Applications can then query these indexed data sources to retrieve relevant information and include it as context for AI responses.
    Downloads: 0 This Week
  • 15
    TONL

    TONL (Token-Optimized Notation Language)

    TONL is a cutting-edge data platform built around a production-ready serialization format designed to be both compact and powerful, combining human readability with performance features that make it suitable for large-scale applications and AI workflows. It provides a serialization format that significantly reduces token usage compared with traditional JSON, which can result in lower costs and more efficient prompt size utilization in LLM-driven systems. TONL isn’t just a format — it...
    Downloads: 0 This Week
  • 16
    Engram

    A New Axis of Sparsity for Large Language Models

    Engram is a high-performance embedding and similarity search library focused on making retrieval-augmented workflows efficient, scalable, and easy to adopt by developers building search, recommendation, or semantic matching systems. It provides utilities to generate embeddings from text or other structured data, index them using efficient approximate nearest neighbor algorithms, and perform real-time similarity queries even on large corpora. Engineered with speed and memory efficiency in mind, Engram supports batched indexing, incremental updates, and custom distance metrics so developers can tailor search behaviors to their domain’s needs. ...
    Downloads: 0 This Week
  • 17
    Reactive Search

    Search UI components for React and Vue

    ...Bring them to ReactiveSearch. Styled components with rich theming and CSS class-injection support. ReactiveSearch components can be ported to create native mobile UIs. Connect to an ES index hosted anywhere. Supports v2, v5 and v6. Components come with good query defaults that can be customized with Elasticsearch query DSL. Go from scratch to creating a data-driven search app with our beginner-friendly quick start guide. We offer production support for ReactiveSearch. Work with us to bring your dream project to life.
    Downloads: 0 This Week
  • 18
    Bootstrap Your Own Latent (BYOL)

    Usable Implementation of "Bootstrap Your Own Latent" self-supervised

    ...This repository offers a module that can easily wrap any image-based neural network (residual network, discriminator, policy network) so that it immediately starts benefiting from unlabelled image data. There is now new evidence that batch normalization is key to making this technique work well. A new paper has successfully replaced batch norm with group norm + weight standardization, refuting the claim that batch statistics are needed for BYOL to work. Simply plug in your neural network, specifying (1) the image dimensions as well as (2) the name (or index) of the hidden layer whose output serves as the latent representation for self-supervised training.
    Downloads: 3 This Week
  • 19
    type-fest

    A collection of essential TypeScript types

    type-fest is a TypeScript utility types library that offers a curated, battle-tested suite of type definitions and type transformations that aren’t included in the TypeScript standard library. It provides types like Except, Merge, LiteralUnion, Writable, Promisable, PartialDeep, JsonObject, and many others that solve everyday typing needs in complex TypeScript codebases. Developers pull in just the types they need, which makes code more expressive and safer without reinventing tricky type...
    Downloads: 7 This Week
  • 20
    clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system

    clip-retrieval is an open-source toolkit designed to build large-scale semantic search systems for images and text by leveraging CLIP embeddings to enable multimodal retrieval. It allows developers to compute embeddings for both images and text efficiently and then index them for fast similarity search across massive datasets. The system is optimized for performance and scalability, capable of processing tens or even hundreds of millions of embeddings using GPU acceleration. It includes...
    Downloads: 2 This Week
  • 21
    DeepSearcher

    Open Source Deep Research Alternative to Reason and Search

    DeepSearcher is an open-source “deep research” style system that combines retrieval with evaluation and reasoning to answer complex questions using private or enterprise data. It is designed around the idea that high-quality answers require more than top-k retrieval, so it orchestrates multi-step search, evidence collection, and synthesis into a comprehensive response. The project integrates with vector databases (including Milvus and related options) so organizations can index internal documents and query them with semantic retrieval. ...
    Downloads: 0 This Week
  • 22
    elasticsearc-php

    PHP low-level client for Elasticsearch

    Elasticsearch DSL is a library that provides an object-oriented query builder for the Elasticsearch bundle and the elasticsearch-php client. You can easily build any Elasticsearch query and transform it to an array. This agnostic package is a lightweight wrapper on top of the Elasticsearch PHP client. Its main goal is to allow for easier structuring of queries and indices in your application. It does not aim to hide or replace the functionality of the Elasticsearch PHP client. Feature complete, object...
    Downloads: 2 This Week
  • 23
    CS-Books

    Collection of computer science textbooks, learning materials

    CS-Books is a massive curated collection of computer science textbooks, learning materials, and resource links that covers a wide range of topics from programming languages like C/C++ and Python to core subjects such as data structures, algorithms, operating systems, databases, networks, and design patterns. The repository aggregates over a thousand classic reference books and educational resources into a single index, making it a valuable starting point for self-learners, students preparing for technical interviews, and professionals deepening their knowledge across different CS domains. ...
    Downloads: 0 This Week
  • 24
    The Hypersim Dataset

    Photorealistic Synthetic Dataset for Holistic Indoor Scene

    ...The dataset spans diverse furniture layouts, room types, and camera trajectories, enabling robust training for geometry, segmentation, and SLAM-adjacent tasks. Rendering pipelines and utilities allow researchers to reproduce sequences, generate novel views, or extract task-specific supervision. Because the data are perfectly labeled and controllable, Hypersim is well suited for pretraining and for studying domain transfer to real imagery. The repository acts as both a dataset index and a set of scripts for downloading, managing, and evaluating on standardized splits.
    Downloads: 0 This Week
  • 25
    Create Index from PDF

    PDF Indexing Script: Searches PDF for words, records page numbers

    This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly. It is designed to be case-insensitive, ensuring that variations in capitalization do not affect the search results. As it processes the PDF, the...
    Downloads: 0 This Week
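
The page-number adjustment described in the last entry (Create Index from PDF) can be sketched roughly as follows. This is not the project's code: it is a minimal illustration that assumes the page text has already been extracted into a list of strings (one per physical page) by some PDF library, and that matching is whole-word. The names to_roman, page_label, and build_index are hypothetical.

```python
import re

# Assumption taken from the description: the first 24 physical pages
# are front matter numbered i-xxiv; Arabic numbering restarts at 1 after them.
FRONT_MATTER_PAGES = 24

def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def page_label(physical_page: int, front_matter: int = FRONT_MATTER_PAGES) -> str:
    """Map a 1-based physical page number to the label printed on the page."""
    if physical_page <= front_matter:
        return to_roman(physical_page)
    return str(physical_page - front_matter)

def build_index(page_texts, words, front_matter: int = FRONT_MATTER_PAGES):
    """Return {word: [page labels]} using case-insensitive whole-word matching."""
    index = {word: [] for word in words}
    for physical_page, text in enumerate(page_texts, start=1):
        for word in words:
            pattern = r"\b" + re.escape(word) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                index[word].append(page_label(physical_page, front_matter))
    return index

# Illustrative input: page 1 is front matter, pages 25 and 26 are body pages.
pages = ["Notes on HEAP usage"] + [""] * 23 + ["a heap is a tree", "Tree traversal"]
print(build_index(pages, ["heap", "tree"]))
# {'heap': ['i', '1'], 'tree': ['1', '2']}
```

Whole-word matching is a design choice made here so that "tree" does not match "trees"; a substring search would be a reasonable alternative if plural forms should count.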