Showing 192 open source projects for "benchmark"

  • 1
    CodeGen

    Open-source model for program synthesis

    ...The project also includes training infrastructure and model checkpoints that allow researchers to experiment with different model sizes and training configurations. Its architecture and training approach enable the models to perform competitively with proprietary coding models on benchmark tasks.
    Downloads: 1 This Week
  • 2
    Gemma in PyTorch

    The official PyTorch implementation of Google's Gemma models

    ...The repository demonstrates text generation pipelines, tokenizer setup, quantization paths, and adapters for low-rank or parameter-efficient fine-tuning. Example notebooks walk through instruction tuning and evaluation so teams can benchmark and iterate rapidly. The code is organized to be legible and hackable, exposing attention blocks, positional encodings, and head configurations. With standard PyTorch abstractions, it integrates easily into existing training loops, loggers, and evaluation harnesses.
    Downloads: 0 This Week
  • 3
    MemU

    MemU is an open-source memory framework for AI companions

    MemU is an agentic memory layer for LLM applications, designed specifically for AI companions. It turns memories into an intelligent file system that automatically organizes, connects, and evolves them, providing simple, fast, and reliable memory infrastructure for AI applications. It offers tools and dedicated support for scaling AI applications. Enterprise options include full proprietary features, commercial usage rights, and white-labeling. SSO/RBAC...
    Downloads: 0 This Week
  • 4
    GLM-V

    GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning

    GLM-V is an open-source vision-language model (VLM) series from ZhipuAI that extends the GLM foundation models into multimodal reasoning and perception. The repository provides both GLM-4.5V and GLM-4.1V models, designed to advance beyond basic perception toward higher-level reasoning, long-context understanding, and agent-based applications. GLM-4.5V builds on the flagship GLM-4.5-Air foundation (106B parameters, 12B active), achieving state-of-the-art results on 42 benchmarks across image,...
    Downloads: 2 This Week
  • 5
    TTRL

    Test-Time Reinforcement Learning

    TTRL is an open-source framework for test-time reinforcement learning in large language models, with a particular focus on reasoning tasks where ground-truth labels are not available during inference. The project addresses the problem of how to generate useful reward signals from unlabeled test-time data, and its central insight is that common test-time scaling practices such as majority voting can be repurposed into reward estimates for online reinforcement learning. This makes the...
    Downloads: 0 This Week
  • 6
    MiroThinker

    MiroThinker is an open source deep research agent

    MiroThinker is an open-source deep research AI agent designed to perform complex reasoning, information gathering, and predictive analysis tasks. The system focuses on enabling long-horizon research workflows by allowing the agent to interact repeatedly with external tools, search systems, and data sources while refining its reasoning through iterative steps. Rather than simply generating responses from a single prompt, the agent performs structured multi-step reasoning processes that...
    Downloads: 0 This Week
  • 7
    BrowserGym

    A Gym environment for web task automation

    BrowserGym is an open framework for web task automation research that exposes browser interaction as a Gym-style environment for training and evaluating agents. It is intended for researchers building web agents rather than for end users looking for a consumer automation product. The project provides a common environment where agents can interact with websites, execute tasks, and be evaluated against standardized benchmarks. One of its main strengths is that it bundles several important...
    Downloads: 0 This Week
  • 8
    VLMEvalKit

    Open-source evaluation toolkit of large multi-modality models (LMMs)

    ...The toolkit provides a unified framework that allows researchers and developers to evaluate multimodal models across a wide range of datasets and standardized benchmarks with minimal setup. Instead of requiring complex data preparation pipelines or multiple repositories for each benchmark, the system enables evaluation through simple commands that automatically handle dataset loading, model inference, and metric computation. VLMEvalKit supports generation-based evaluation methods, allowing models to produce textual responses to visual inputs while measuring performance through techniques such as exact matching or language-model-assisted answer extraction.
    Downloads: 0 This Week
  • 9
    DFlash

    Block Diffusion for Ultra-Fast Speculative Decoding

    ...This approach has been shown to deliver lossless acceleration on models like Qwen3-8B by combining block diffusion techniques with efficient batching, making it ideal for applications where latency and cost matter. The project includes support for multiple draft models, example integration code, and scripts to benchmark performance, and it is structured to work with popular model serving stacks like SGLang and the Hugging Face Transformers ecosystem.
    Downloads: 0 This Week
  • 10
    Anthropic's Original Performance

    Anthropic's original performance take-home, now open for you to try

    Anthropic's Original Performance repository contains the publicly released version of a performance challenge originally used by Anthropic as part of their technical interview process, offering developers the opportunity to optimize and benchmark low-level code against simulated models. The project sets up a baseline performance problem where participants work to reduce simulated “clock cycles” required to run a given workload, effectively challenging them to engineer faster code under constraints. This take-home includes starter code, tests, and tools to debug performance, aiming to measure how effectively one can apply algorithmic improvements and optimizations. ...
    Downloads: 0 This Week
  • 11
    RecBole

    A unified, comprehensive and efficient recommendation library

    RecBole provides general, extensible data structures that unify the formatting and usage of various recommendation datasets. It implements more than 100 commonly used recommendation algorithms and provides formatted copies of 28 recommendation datasets. It supports a range of widely adopted evaluation protocols and settings for testing and comparing recommendation algorithms. RecBole is developed based on Python and PyTorch for...
    Downloads: 0 This Week
  • 12
    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing...
    Downloads: 1 This Week
  • 13
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while...
    Downloads: 1 This Week
  • 14
    OpenOCR

    An Open-Source Toolkit for General-OCR Research and Applications

    OpenOCR is an open-source General OCR toolkit developed by the OCR team at Fudan University for research and real-world document processing applications. It provides a unified platform for text detection, text recognition, formula recognition, table recognition, and document parsing. Built on advanced OCR technologies such as SVTRv2 and UniRec-0.1B, OpenOCR delivers high accuracy while maintaining efficient inference performance. The toolkit supports both Chinese and English content, making...
    Downloads: 0 This Week
  • 15
    TimeMixer

    Decomposable Multiscale Mixing for Time Series Forecasting

    TimeMixer is a deep learning framework designed for advanced time series forecasting and analysis using a multiscale neural architecture. The model decomposes time series data into multiple temporal scales in order to capture both short-term seasonal patterns and long-term trends. Instead of relying on traditional recurrent or transformer-based architectures, TimeMixer is implemented as a fully MLP-based (multilayer perceptron) model that performs temporal mixing across different...
    Downloads: 0 This Week
  • 16
    MiroFlow

    Agent framework that enables tool-use agent tasks

    MiroFlow is a high-performance open-source framework designed for building intelligent AI agents capable of solving complex reasoning and research tasks. The system introduces a hierarchical architecture that organizes components into control, agent, and foundation layers, allowing developers to manage agent orchestration and tool interactions in a structured manner. One of the core innovations of MiroFlow is its use of agent graphs, which enable flexible orchestration of multiple sub-agents...
    Downloads: 0 This Week
  • 17
    AIDE ML

    AI-Driven Exploration in the Space of Code

    AIDE ML is an open-source research framework designed to explore automated machine learning development through agent-based search and code optimization. The project implements the AIDE algorithm, which uses a tree-search strategy guided by large language models to iteratively generate, evaluate, and refine code. Instead of relying on manual experimentation, the agent autonomously drafts machine learning pipelines, debugs errors, and benchmarks performance against user-defined evaluation...
    Downloads: 0 This Week
  • 18
    Text-to-LoRA (T2L)

    Hypernetworks that adapt LLMs for specific benchmark tasks

    Text-to-LoRA is a research project that introduces a method for dynamically adapting large language models using hypernetworks that generate LoRA parameters directly from textual descriptions. Instead of training a new LoRA adapter for every task or dataset, the system can produce task-specific adaptations based solely on a text description of the desired capability. This approach enables models to rapidly internalize new contextual knowledge without performing traditional fine-tuning steps....
    Downloads: 0 This Week
  • 19
    DriveLM

    Driving with Graph Visual Question Answering

    DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models...
    Downloads: 0 This Week
  • 20
    LISA

    LISA: Reasoning Segmentation via Large Language Model

    LISA is an open-source multimodal AI system designed to enable language models to perform pixel-level reasoning and segmentation tasks on images. The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual...
    Downloads: 0 This Week
  • 21
    Anomaly Detection Learning Resources

    Anomaly detection related books, papers, videos, and toolboxes

    ...The project serves as a centralized index for researchers and practitioners who want to explore algorithms, datasets, and publications associated with detecting unusual patterns in data. The repository organizes resources into structured categories such as books, tutorials, academic papers, datasets, benchmark frameworks, and open-source toolkits. It includes materials covering a wide range of anomaly detection domains, including time series data, graph data, tabular datasets, and real-time monitoring systems. By compiling resources from multiple programming ecosystems such as Python, R, and other machine learning platforms, the repository allows users to discover both research papers and practical implementations.
    Downloads: 0 This Week
  • 22
    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    ...The repository provides training logic, adaptation strategies (e.g. prompt tuning, adapter modules), and evaluation across base and target domains to measure how well the model retains its general knowledge while specializing as needed. It includes utilities to fine-tune vision-language embeddings, compute prompt or adapter updates, and benchmark across transfer and retention metrics. MetaCLIP is especially suited for real-world settings where a model must continuously incorporate new visual categories or domains over time.
    Downloads: 0 This Week
  • 23
    OpenCompass

    OpenCompass is an LLM evaluation platform

    ...With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models. OpenCompass is a one-stop platform for large model evaluation that aims to provide a fair, open, and reproducible benchmark. It comes with support for 20+ HuggingFace and API models and an evaluation scheme of 50+ datasets with about 300,000 questions, evaluating model capabilities across five dimensions. A single command handles task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours. ...
    Downloads: 0 This Week
  • 24
    Django Cachalot

    No effort, no worry, maximum performance

    ...You will need a database called "cachalot" on MySQL and PostgreSQL. Additionally, on PostgreSQL, you will need to create a role called "cachalot." You can also run the benchmark, and it will raise errors with specific instructions for how to fix them. Use cachalot for data that is cold or modified fewer than 50 times per minute. Most people should stick with cachalot alone, since you most likely won't need to scale to the point of needing cache-machine as well.
    Downloads: 0 This Week
  • 25
    PyG

    Graph Neural Network Library for PyTorch

    ...In addition, it includes easy-to-use mini-batch loaders for operating on many small graphs or a single giant graph, multi-GPU support, DataPipe support, distributed graph learning via Quiver, a large number of common benchmark datasets (based on simple interfaces for creating your own), the GraphGym experiment manager, and helpful transforms, both for learning on arbitrary graphs and on 3D meshes or point clouds. All it takes is 10-20 lines of code to get started with training a GNN model (see the next section for a quick tour).
    Downloads: 0 This Week