Showing 17 open source projects for "240p-test-suite"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong instruction-following capabilities and richer motion outputs compared to existing open-source models. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    HY-MT

    HY-MT

    Hunyuan Translation Model Version 1.5

    HY-MT (Hunyuan Translation) is a high-quality multilingual machine translation model suite developed to support mutual translation across dozens of languages with strong performance even at smaller model scales. It ships with both an 1.8 B parameter model and a larger 7 B model, the latter optimized not only for direct translation but also for formatted and contextualized output, allowing better handling of terminology and mixed-language content.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Qwen3-VL-Embedding

    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    MiniMax-M1

    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to support a native context length of 1 million tokens while using far fewer FLOPs than comparable reasoning models for very long generations. The team emphasizes efficient scaling of test-time compute: at 100K-token generation lengths, M1 reportedly uses only about 25 percent of the FLOPs of some competing models, making extended “think step” traces more feasible. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    Poetiq

    Poetiq

    Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1

    poetiq-arc-agi-solver is the open-source codebase from Poetiq that replicates their record-breaking submission to the challenging benchmark suite ARC-AGI (both ARC-AGI-1 and ARC-AGI-2). The project demonstrates a system that orchestrates large language models (LLMs) — like those from major providers — with carefully engineered prompting, reasoning workflows, and dynamic strategies, to tackle the abstract, logic-heavy problems in ARC-AGI. Instead of relying on a single prompt or fixed strategy, their solver dynamically adapts the reasoning path, selecting what to ask or analyze next depending on intermediate results — effectively compositing reasoning, perception, and program synthesis (or symbolic manipulation) in a loop. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Step1X-Edit

    Step1X-Edit

    A SOTA open-source image editing model

    ...The model targets general-purpose editing: from object addition/removal, style changes, recoloring, retouching, background replacement, to complex transformations like changing lighting, mood, or art style. The authors trained it on a large curated dataset and benchmarked it on a newly introduced evaluation suite, showing that Step1X-Edit significantly outperforms previous open-source baselines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    CogVLM

    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    FireRedASR

    FireRedASR

    Open-source industrial-grade ASR models

    FireRedASR is an industrial-grade family of open-source automatic speech recognition models designed to provide high-precision speech-to-text performance across languages including Mandarin, English, and various Chinese dialects, achieving new state-of-the-art benchmarks on public test sets. The project includes multiple model variants to meet different application needs, such as high-accuracy end-to-end interaction using an encoder-adapter-LLM framework and efficient real-time recognition using attention-based encoder-decoder architectures, giving developers flexibility in balancing performance and resource constraints. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Transformer Debugger

    Transformer Debugger

    Tool for exploring and debugging transformer model behaviors

    Transformer Debugger (TDB) is a research tool developed by OpenAI’s Superalignment team to investigate and interpret the behaviors of small language models. It combines automated interpretability methods with sparse autoencoders, enabling researchers to analyze how specific neurons, attention heads, and latent features contribute to a model’s outputs. TDB allows users to intervene directly in the forward pass of a model and observe how such interventions change predictions, making it...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    HunyuanWorld-Mirror

    HunyuanWorld-Mirror

    Fast and Universal 3D reconstruction model for versatile tasks

    ...The project sits within a broader family of Hunyuan models that explore world generation and 3D-consistent understanding, and this mirror variant makes the reconstruction stack easier to test. It’s attractive for rapid prototyping of scenes, environment scans, or reference assets when you need repeatable 3D results from ordinary media.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MetaCLIP

    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase that extends the CLIP framework into a meta-learning / continual learning regime, aiming to adapt CLIP-style models to new tasks or domains efficiently. The goal is to preserve CLIP’s strong zero-shot transfer capability while enabling fast adaptation to domain shifts or novel class sets with minimal data and without catastrophic forgetting. The repository provides training logic, adaptation strategies (e.g. prompt tuning, adapter modules), and evaluation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Janus

    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles long-standing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Warlock-Studio

    Warlock-Studio

    AI Suite for upscaling, interpolating & restoring images/videos

    v6.0. Warlock-Studio is a Windows application that uses Real-ESRGAN, BSRGAN, IRCNN, GFPGAN, RealESRNet, RealESRAnime and RIFE Artificial Intelligence models to upscale, restore faces, interpolate frames and reduce noise in images and videos. the application supports GPU acceleration (including multi-GPU setups) and offers batch processing for large workloads. It includes drag-and-drop handling for single or multiple files, optional pre-resize functions, and an automatic tiling system...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 14
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    PRM800K

    PRM800K

    800,000 step-level correctness labels on LLM solutions to MATH problem

    PRM800K is a process supervision dataset accompanying the paper Let’s Verify Step by Step, providing 800,000 step-level correctness labels on model-generated solutions to problems from the MATH dataset. The repository releases the raw labels and the labeler instructions used in two project phases, enabling researchers to study how human raters graded intermediate reasoning. Data are stored as newline-delimited JSONL files tracked with Git LFS, where each line is a full solution sample that...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Video Pre-Training

    Video Pre-Training

    Learning to Act by Watching Unlabeled Online Videos

    The Video PreTraining (VPT) repository provides code and model artifacts for a project where agents learn to act by watching human gameplay videos—specifically, gameplay of Minecraft—using behavioral cloning. The idea is to learn general priors of control from large-scale, unlabeled video data, and then optionally fine-tune those priors for more goal-directed behavior via environment interaction. The repository contains demonstration models of different widths, fine-tuned variants (e.g. for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    FixRes

    FixRes

    Reproduces results of "Fixing the train-test resolution discrepancy"

    ...FixRes demonstrates that a mismatch between training and testing resolutions often leads to suboptimal accuracy, and fine-tuning the classifier and batch normalization layers at higher test resolutions significantly enhances performance. The repository includes pretrained models, feature embeddings, and evaluation scripts corresponding to the experiments reported in the NeurIPS 2019 paper “Fixing the train-test resolution discrepancy.”
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB