Showing 125 open source projects for "benchmark"

  • 1
    VLMEvalKit

    Open-source evaluation toolkit of large multi-modality models (LMMs)

    ...The toolkit provides a unified framework that allows researchers and developers to evaluate multimodal models across a wide range of datasets and standardized benchmarks with minimal setup. Instead of requiring complex data preparation pipelines or multiple repositories for each benchmark, the system enables evaluation through simple commands that automatically handle dataset loading, model inference, and metric computation. VLMEvalKit supports generation-based evaluation methods, allowing models to produce textual responses to visual inputs while measuring performance through techniques such as exact matching or language-model-assisted answer extraction.
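    To make the exact-matching idea concrete, here is a minimal Python sketch of exact-match scoring. The normalization rules and the `normalize` / `exact_match_accuracy` helpers are illustrative assumptions, not VLMEvalKit's API. In the language-model-assisted variant, a judge model would first extract the answer from the free-form response, and the comparison would follow.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


if __name__ == "__main__":
    preds = ["The cat.", "Two dogs", "red"]
    refs = ["cat", "three dogs", "Red"]
    print(exact_match_accuracy(preds, refs))  # two of three answers match
```

    Normalization is what makes exact matching usable for generative models: without it, "The cat." and "cat" would be scored as different answers.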
  • 2
    DFlash

    Block Diffusion for Ultra-Fast Speculative Decoding

    ...This approach has been shown to deliver lossless acceleration on models like Qwen3-8B by combining block diffusion techniques with efficient batching, making it ideal for applications where latency and cost matter. The project includes support for multiple draft models, example integration code, and scripts to benchmark performance, and it is structured to work with popular model serving stacks like SGLang and the Hugging Face Transformers ecosystem.
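    Speculative decoding stays lossless because the target model checks every drafted token. The sketch below shows a greedy draft-then-verify loop with toy next-token functions standing in for the draft and target models. `speculative_step`, `generate`, and the block size are hypothetical names. DFlash's block-diffusion drafter, which proposes a whole block in parallel, is approximated here by a sequential drafter.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token sequence to the next greedy token


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, block_size: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens to append to prefix."""
    # 1. Draft a block of tokens (sequentially here; a block-diffusion
    #    drafter would propose the whole block at once).
    draft: List[int] = []
    for _ in range(block_size):
        draft.append(draft_next(prefix + draft))

    # 2. Verify: accept drafted tokens while they match the target's greedy choice.
    #    (A real system scores all positions in one batched target forward pass.)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)

    # 3. Every draft was accepted, so the target contributes one bonus token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prefix: List[int], draft_next: NextToken, target_next: NextToken,
             block_size: int, max_new: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:  # each step appends at least one token
        out += speculative_step(out, draft_next, target_next, block_size)
    return out[: len(prefix) + max_new]
```

    Because a mismatch always falls back to the target's own token, the output is identical to plain greedy decoding with the target model. The speedup comes from accepting several drafted tokens per verification pass.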
  • 3
    Anthropic's Original Performance

    Anthropic's original performance take-home, now open for you to try

    Anthropic's Original Performance repository contains the publicly released version of a performance challenge originally used by Anthropic as part of their technical interview process, offering developers the opportunity to optimize and benchmark low-level code against simulated models. The project sets up a baseline performance problem in which participants work to reduce the number of simulated “clock cycles” needed to run a given workload, challenging them to engineer faster code under constraints. The take-home includes starter code, tests, and tools for debugging performance, and aims to measure how effectively one can apply algorithmic improvements and optimizations. ...
  • 4
    RecBole

    A unified, comprehensive and efficient recommendation library

    A unified, comprehensive and efficient recommendation library. We design general and extensible data structures to unify the formatting and usage of various recommendation datasets. We implement more than 100 commonly used recommendation algorithms and provide formatted copies of 28 recommendation datasets. We support a series of widely adopted evaluation protocols or settings for testing and comparing recommendation algorithms. RecBole is developed based on Python and PyTorch for...
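    Top-K ranking metrics are typical ingredients of the evaluation protocols mentioned above. The sketch computes Recall@K and binary-relevance NDCG@K for a single user in plain Python. It illustrates the metrics only. It is not RecBole's evaluator, which is configured through the library itself.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the user's held-out items that appear in the top-k ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: hits near the top of the list count more."""
    # Position 0 gets weight 1/log2(2) = 1, position 1 gets 1/log2(3), and so on.
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    # Ideal ordering: all relevant items (up to k) ranked first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

    Averaging these per-user scores over all test users gives the dataset-level numbers that recommendation papers usually report.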
  • 5
    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain-text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, kg-gen uses language models to identify entities and their relationships, producing...
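    Extracted knowledge graphs are commonly stored as (subject, relation, object) triples. The sketch assumes the language-model extraction step has already produced such triples (hard-coded in the usage example) and shows a minimal store that deduplicates them with naive entity normalization and answers neighbor queries. `KnowledgeGraph` is a hypothetical class, not kg-gen's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class KnowledgeGraph:
    """Minimal triple store (hypothetical; not kg-gen's API)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    @staticmethod
    def _canon(name: str) -> str:
        # Naive entity resolution: trim and lowercase so "Paris " == "paris".
        return name.strip().lower()

    def add(self, triples: Iterable[Triple]) -> None:
        for subj, rel, obj in triples:
            t = (self._canon(subj), rel.strip().lower(), self._canon(obj))
            if t not in self.triples:  # skip duplicate facts
                self.triples.add(t)
                self.out_edges[t[0]].append((t[1], t[2]))

    def neighbors(self, entity: str) -> List[Tuple[str, str]]:
        """Outgoing (relation, object) pairs for an entity, in insertion order."""
        return list(self.out_edges.get(self._canon(entity), []))

    def entities(self) -> Set[str]:
        return {s for s, _, _ in self.triples} | {o for _, _, o in self.triples}


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add([("Marie Curie", "won", "Nobel Prize"),
            ("marie curie ", "won", "nobel prize"),   # duplicate after normalization
            ("Marie Curie", "born in", "Warsaw")])
    print(kg.neighbors("Marie Curie"))
```

    Real entity resolution is much harder than lowercasing (aliases, abbreviations, coreference), which is one reason LLM-based extractors also use the model to merge equivalent entities.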
  • 6
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while...
  • 7
    OpenOCR

    An Open-Source Toolkit for General-OCR Research and Applications

    OpenOCR is an open-source General OCR toolkit developed by the OCR team at Fudan University for research and real-world document processing applications. It provides a unified platform for text detection, text recognition, formula recognition, table recognition, and document parsing. Built on advanced OCR technologies such as SVTRv2 and UniRec-0.1B, OpenOCR delivers high accuracy while maintaining efficient inference performance. The toolkit supports both Chinese and English content, making...
  • 8
    TimeMixer

    Decomposable Multiscale Mixing for Time Series Forecasting

    TimeMixer is a deep learning framework designed for advanced time series forecasting and analysis using a multiscale neural architecture. The model focuses on decomposing time series data into multiple temporal scales in order to capture both short-term seasonal patterns and long-term trends. Instead of relying on traditional recurrent or transformer-based architectures, TimeMixer is implemented as a fully multilayer perceptron–based model that performs temporal mixing across different...
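    A rough sketch of the multiscale decomposition described above: the series is average-pooled into progressively coarser scales, and each scale is split into a trend (centered moving average) and a seasonal residual. The pooling factor, window size, and moving-average choice are assumptions for illustration. TimeMixer's learned MLP mixing across scales is not modeled.

```python
from typing import List, Tuple


def downsample(series: List[float], factor: int) -> List[float]:
    """Average-pool non-overlapping windows of length `factor` (drops any remainder)."""
    n = len(series) // factor
    return [sum(series[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def decompose(series: List[float], window: int) -> Tuple[List[float], List[float]]:
    """Split a series into trend (centered moving average) and seasonal (residual)."""
    half = window // 2
    trend: List[float] = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))  # window shrinks at the edges
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


def multiscale_decompose(series: List[float], num_scales: int = 3, factor: int = 2,
                         window: int = 3) -> List[Tuple[List[float], List[float]]]:
    """Return (trend, seasonal) for the original series and each coarser scale."""
    scales = [series]
    for _ in range(num_scales - 1):
        scales.append(downsample(scales[-1], factor))
    return [decompose(s, window) for s in scales]
```

    Fine scales keep short-term seasonal detail, while coarse scales make long-term trends easier to see; mixing information across those scales is the part TimeMixer learns.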
  • 9
    MiroFlow

    Agent framework that enables tool-use agent tasks

    MiroFlow is a high-performance open-source framework designed for building intelligent AI agents capable of solving complex reasoning and research tasks. The system introduces a hierarchical architecture that organizes components into control, agent, and foundation layers, allowing developers to manage agent orchestration and tool interactions in a structured manner. One of the core innovations of MiroFlow is its use of agent graphs, which enable flexible orchestration of multiple sub-agents...
  • 10
    AIDE ML

    AI-Driven Exploration in the Space of Code

    AIDE ML is an open-source research framework designed to explore automated machine learning development through agent-based search and code optimization. The project implements the AIDE algorithm, which uses a tree-search strategy guided by large language models to iteratively generate, evaluate, and refine code. Instead of relying on manual experimentation, the agent autonomously drafts machine learning pipelines, debugs errors, and benchmarks performance against user-defined evaluation...
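    The draft, debug, and improve loop can be sketched as a simple search over a tree of candidate scripts. `propose` stands in for the LLM call and `evaluate` for running the script against the user-defined metric. Both callbacks, and the greedy selection policy (debug the latest failure, otherwise improve the best-scoring node), are assumptions rather than AIDE's exact algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    code: str
    score: Optional[float]           # None means the script failed (buggy node)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(propose: Callable[[Optional[Node]], str],
                evaluate: Callable[[str], Optional[float]],
                steps: int, num_drafts: int = 2) -> Node:
    """Greedy LLM-guided search over code (hypothetical policy; higher score is better)."""
    nodes: List[Node] = []
    for step in range(steps):
        working = [n for n in nodes if n.score is not None]
        buggy = [n for n in nodes if n.score is None and not n.children]
        if step < num_drafts:
            parent = None                                  # draft from scratch
        elif buggy:
            parent = buggy[-1]                             # debug the latest failure
        elif working:
            parent = max(working, key=lambda n: n.score)   # improve the best node
        else:
            parent = None
        code = propose(parent)                             # stands in for an LLM call
        child = Node(code, evaluate(code), parent)
        if parent is not None:
            parent.children.append(child)
        nodes.append(child)
    working = [n for n in nodes if n.score is not None]
    if not working:
        raise RuntimeError("no working solution found")
    return max(working, key=lambda n: n.score)
```

    Keeping the whole tree, rather than only the latest attempt, lets the search return to an earlier good solution when a refinement makes things worse.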
  • 11
    Text-to-LoRA (T2L)

    Hypernetworks that adapt LLMs for specific benchmark tasks

    Text-to-LoRA is a research project that introduces a method for dynamically adapting large language models using hypernetworks that generate LoRA parameters directly from textual descriptions. Instead of training a new LoRA adapter for every task or dataset, the system can produce task-specific adaptations based solely on a text description of the desired capability. This approach enables models to rapidly internalize new contextual knowledge without performing traditional fine-tuning steps....
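    LoRA adapts a frozen weight matrix W by adding a scaled low-rank product, W' = W + (alpha / r) * B @ A. In Text-to-LoRA's setting, a hypernetwork produces A and B from an embedding of the task description. The toy sketch below uses a single linear layer as the hypernetwork and plain Python lists as matrices. The sizes, weights, and helper names are hypothetical, and the text encoder is omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def reshape(flat: List[float], rows: int, cols: int) -> Matrix:
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def hypernetwork(task_emb: List[float], hyper_weights: Matrix,
                 d_out: int, d_in: int, r: int) -> Tuple[Matrix, Matrix]:
    """Toy linear hypernetwork: flat = hyper_weights @ task_emb, split into A and B."""
    flat = [sum(h * e for h, e in zip(row, task_emb)) for row in hyper_weights]
    if len(flat) != d_out * r + r * d_in:
        raise ValueError("hyper_weights must have d_out*r + r*d_in rows")
    B = reshape(flat[: d_out * r], d_out, r)   # d_out x r
    A = reshape(flat[d_out * r:], r, d_in)     # r x d_in
    return A, B


def lora_weight(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Adapted weight W' = W + (alpha / r) * B @ A; W itself is left unchanged."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

    Because only the small A and B matrices depend on the task, a new description yields a new adapter in a single hypernetwork forward pass instead of a separate fine-tuning run.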
  • 12
    DriveLM

    Driving with Graph Visual Question Answering

    DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models...
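    One way to picture graph visual question answering is as a dependency graph of QA stages, where each question is answered with its parents' answers as context. The sketch orders a hypothetical four-stage chain with the standard-library `graphlib.TopologicalSorter`. The questions and the `answer` callback (a stand-in for a vision-language model) are invented for illustration and do not reflect DriveLM's data format.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Each QA node maps to the nodes whose answers it depends on (hypothetical example).
QA_GRAPH: Dict[str, Set[str]] = {
    "perception: what objects are ahead?": set(),
    "prediction: will the pedestrian cross?": {"perception: what objects are ahead?"},
    "planning: should the ego car yield?": {"prediction: will the pedestrian cross?"},
    "motion: what speed to target?": {"planning: should the ego car yield?"},
}


def answer_graph(graph: Dict[str, Set[str]],
                 answer: Callable[[str, List[str]], str]) -> Dict[str, str]:
    """Answer every question after its parents, passing parent answers as context."""
    answers: Dict[str, str] = {}
    # static_order() yields each node only after all of its predecessors.
    for question in TopologicalSorter(graph).static_order():
        context = [answers[p] for p in sorted(graph.get(question, set()))]
        answers[question] = answer(question, context)
    return answers
```

    A real graph would branch (several perception questions feeding one prediction question, for example); the topological order handles that case unchanged.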
  • 13
    LISA

    LISA: Reasoning Segmentation via Large Language Model

    LISA is an open-source multimodal AI system designed to enable language models to perform pixel-level reasoning and segmentation tasks on images. The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual...
  • 14
    Anomaly Detection Learning Resources

    Anomaly detection related books, papers, videos, and toolboxes

    ...The project serves as a centralized index for researchers and practitioners who want to explore algorithms, datasets, and publications associated with detecting unusual patterns in data. The repository organizes resources into structured categories such as books, tutorials, academic papers, datasets, benchmark frameworks, and open-source toolkits. It includes materials covering a wide range of anomaly detection domains, including time series data, graph data, tabular datasets, and real-time monitoring systems. By compiling resources from multiple programming ecosystems such as Python, R, and other machine learning platforms, the repository allows users to discover both research papers and practical implementations.
  • 15
    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    ...The repository provides training logic, adaptation strategies (e.g. prompt tuning, adapter modules), and evaluation across base and target domains to measure how well the model retains its general knowledge while specializing as needed. It includes utilities to fine-tune vision-language embeddings, compute prompt or adapter updates, and benchmark across transfer and retention metrics. MetaCLIP is especially suited for real-world settings where a model must continuously incorporate new visual categories or domains over time.
  • 16
    OpenCompass

    OpenCompass is an LLM evaluation platform

    ...With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models. OpenCompass is a one-stop platform for large-model evaluation that aims to provide a fair, open, and reproducible benchmark. It offers built-in support for 20+ Hugging Face and API models and an evaluation scheme of 50+ datasets with about 300,000 questions that assesses model capabilities across five dimensions. A single command handles task division and distributed evaluation, completing a full evaluation of billion-scale models in a few hours. ...
  • 17
    PyG

    Graph Neural Network Library for PyTorch

    ...In addition, it provides easy-to-use mini-batch loaders for operating on many small graphs or a single giant graph, multi-GPU support, DataPipe support, distributed graph learning via Quiver, a large number of common benchmark datasets (with simple interfaces for creating your own), the GraphGym experiment manager, and helpful transforms for learning on arbitrary graphs as well as on 3D meshes or point clouds. Getting started with training a GNN model takes only 10-20 lines of code.
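    PyG's mini-batching of many small graphs works by stacking them into one large, disconnected graph: edge indices are shifted by the number of nodes that precede each graph, and a `batch` vector records which graph each node belongs to. The plain-Python sketch below mirrors that idea for illustration only. In practice, PyG's `torch_geometric.loader.DataLoader` performs this collation with tensors.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def collate_graphs(graphs: List[Tuple[int, List[Edge]]]) -> Tuple[List[Edge], int, List[int]]:
    """Merge (num_nodes, edges) graphs into one disjoint graph.

    Returns the merged edge list, the total node count, and a `batch` list
    mapping each merged node index back to the graph it came from.
    """
    edges: List[Edge] = []
    batch: List[int] = []
    offset = 0
    for graph_id, (num_nodes, graph_edges) in enumerate(graphs):
        # Shift node ids so each graph occupies its own index range.
        edges.extend((u + offset, v + offset) for u, v in graph_edges)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return edges, offset, batch


def mean_pool(node_values: List[float], batch: List[int], num_graphs: int) -> List[float]:
    """Graph-level readout: average node values per graph using the batch vector."""
    sums = [0.0] * num_graphs
    counts = [0] * num_graphs
    for value, g in zip(node_values, batch):
        sums[g] += value
        counts[g] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

    Since no edges cross between graphs, message passing on the merged graph is equivalent to running it on each graph separately, which is why the block-diagonal layout needs no special handling in the layers.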
  • 18
    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
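    The parameter figures above mean that only about 10% of the weights (45.9B / 456B ≈ 0.101) take part in processing any given token, because MoE routing sends each token to a few experts. The sketch shows generic top-k softmax gating over scalar toy experts. The number of experts, k, and the router logits are made up and are not MiniMax-01's configuration.

```python
import math
from typing import Callable, List, Tuple


def top_k_gating(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights with a softmax."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, experts: List[Callable[[float], float]],
                router_logits: List[float], k: int) -> float:
    """Token output = weighted sum of the selected experts' outputs; the rest are skipped."""
    return sum(w * experts[i](x) for i, w in top_k_gating(router_logits, k))


# Share of parameters active per token, using the figures quoted above.
ACTIVE_FRACTION = 45.9e9 / 456e9  # roughly 0.10
```

    Compute per token scales with the active parameters, while model capacity scales with the total, which is the trade-off MoE architectures exploit.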
  • 19
    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including...
  • 20
    IQuest-Coder-V1 Model Family

    New family of code large language models (LLMs)

    IQuest-Coder-V1 is a cutting-edge family of open-source large language models engineered specifically for code generation, deep code understanding, and autonomous software engineering tasks. The models range in size from tens of billions of parameters down to smaller footprints and are trained with a novel multi-stage code-flow paradigm that captures how real software evolves over time, not just static code snapshots, giving them a deeper semantic understanding of programming logic. They support native long contexts...
  • 21
    rLLM

    Democratizing Reinforcement Learning for LLMs

    rLLM is an open-source framework for post-training language agents with reinforcement learning (RL): it uses reward signals to fine-tune or adapt large language models (LLMs) into customizable agents for real-world tasks. With rLLM, developers can define custom “agents” and “environments” and then train those agents through RL workflows, potentially surpassing what vanilla fine-tuning or supervised learning alone might provide. The project is designed to...
  • 22
    Poetiq

    Reproduction of Poetiq's record-breaking submission to ARC-AGI-1

    poetiq-arc-agi-solver is the open-source codebase from Poetiq that replicates their record-breaking submission to the challenging ARC-AGI benchmark suite (both ARC-AGI-1 and ARC-AGI-2). The project demonstrates a system that orchestrates large language models (LLMs), such as those from major providers, with carefully engineered prompting, reasoning workflows, and dynamic strategies to tackle the abstract, logic-heavy problems in ARC-AGI. Instead of relying on a single prompt or fixed strategy, the solver dynamically adapts its reasoning path, choosing what to ask or analyze next based on intermediate results; in effect, it composes reasoning, perception, and program synthesis (or symbolic manipulation) in a loop. ...
  • 23
    Agentex

    Open source codebase for Scale Agentex

    Agentex is an open framework from Scale for building, running, and evaluating agentic workflows, with an emphasis on reproducibility and measurable outcomes rather than ad-hoc demos. It treats an “agent” as a composition of a policy (the LLM), tools, memory, and an execution runtime, so you can test the whole loop, not just the prompting. The repo focuses on structured experiments: standardized tasks, canonical tool interfaces, and logs that make it possible to compare models, prompts, and tool...
  • 24
    Tracking Any Point (TAP)

    DeepMind model for tracking arbitrary points across videos & robotics

    TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art...
  • 25
    Habitat-Lab

    A modular high-level library to train embodied AI agents

    ...It provides algorithms for single- and multi-agent training (via imitation or reinforcement learning, or no learning at all, as in SensePlanAct pipelines), as well as tools to benchmark their performance on the defined tasks using standard metrics.
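    Success weighted by Path Length (SPL) is one widely used metric for embodied navigation: SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i marks success, l_i is the shortest-path length, and p_i is the length of the path the agent actually took. The sketch computes it over a list of episodes. Which metrics a given Habitat-Lab task reports depends on its configuration, so treat this as illustrative rather than as Habitat-Lab's implementation.

```python
from typing import Sequence


def spl(successes: Sequence[bool], shortest: Sequence[float], taken: Sequence[float]) -> float:
    """Success weighted by (normalized inverse) Path Length, averaged over episodes."""
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("all inputs need one entry per episode")
    if not successes:
        return 0.0
    total = 0.0
    for success, shortest_len, taken_len in zip(successes, shortest, taken):
        if success:
            # 1.0 for an optimal path, smaller the longer the detour.
            total += shortest_len / max(taken_len, shortest_len)
    return total / len(successes)
```

    A failed episode contributes zero no matter how efficient its path was, so SPL rewards agents that both reach the goal and avoid detours.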