Showing 72 open source projects for "tools"

  • 1
    DeepSeek-V3.2-Exp

    An experimental version of DeepSeek model

    ...In public evaluations across a variety of reasoning, code, and question-answering benchmarks (e.g. MMLU, LiveCodeBench, AIME, Codeforces, etc.), V3.2-Exp shows performance very close to or in some cases matching that of V3.1-Terminus. The repository includes tools and kernels to support the new sparse architecture—for instance, CUDA kernels, logit indexers, and open-source modules like FlashMLA and DeepGEMM are invoked for performance.
    Downloads: 6 This Week
  • 2
    xFormers

    Hackable and optimized Transformers building blocks

    ...It also integrates with PyTorch seamlessly so you can drop in its blocks to existing models, replace default attention layers, or build new architectures from scratch. xformers includes training, deployment, and memory profiling tools.
    Downloads: 0 This Week
  • 3
    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including...
    Downloads: 8 This Week
  • 4
    Stable Diffusion WebUI Forge

    Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion

    ...The UI surfaces advanced options in a way that remains recognizable to WebUI users, so migration costs are low while gaining experimental features. In practice, Forge serves as a proving ground for ideas that may later influence upstream tools, giving power users early access to cutting-edge techniques.
    Downloads: 5 This Week
  • 5
    BioEmu

    Inference code for scalable emulation of protein equilibrium ensembles

    ...The difference can be very large if your protein has large disordered regions, which are very likely to produce clashes. BioEmu outputs structures in backbone frame representation. To reconstruct the side-chains, several tools are available. As an example, we interface with HPacker to conduct side-chain reconstruction and also provide basic tooling for running a short molecular dynamics (MD) equilibration.
    Downloads: 2 This Week
  • 6
    MOSS-TTS Family

    MOSS‑TTS Family: open‑source speech and sound generation models

    ...The broader family also includes dialogue generation, prompt-based voice creation, streaming voice-agent support, and a unified audio tokenizer. It is especially useful for developers building dubbing, podcasts, audiobooks, voice assistants, character voices, and creative audio tools.
    Downloads: 3 This Week
  • 7
    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional...
    Downloads: 2 This Week
  • 8
    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    ...Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and can output or act via tools seamlessly, bridging perception and execution. Its architecture supports a very large context window (on the order of 128K tokens during training), which lets it handle complex multimodal inputs like long documents, multi-page reports, or video transcripts, while maintaining coherence across extended content. In benchmarks and internal evaluations, GLM-4.6V achieves state-of-the-art (SoTA) performance among models of comparable parameter scale on multimodal reasoning.
    Downloads: 0 This Week
  • 9
    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    Perception Models is a state-of-the-art framework developed by Facebook Research for advanced image and video perception tasks. It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. Meanwhile,...
    Downloads: 2 This Week
  • 10
    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong...
    Downloads: 1 This Week
  • 11
    Kimi-Audio

    Audio foundation model excelling in audio understanding

    Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one...
    Downloads: 0 This Week
  • 12
    TADA

    Open Source Speech Language Model

    TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...
    Downloads: 0 This Week
  • 13
    SlowFast

    Video understanding codebase from FAIR for reproducing video models

    SlowFast is a video understanding framework that captures both spatial semantics and temporal dynamics efficiently by processing video frames at two different temporal resolutions. The slow pathway encodes semantic context by sampling frames sparsely, while the fast pathway captures motion and fine temporal cues by operating on densely sampled frames with fewer channels. Together, these two pathways complement each other, allowing the network to model both appearance and motion without...
    Downloads: 0 This Week
  • 14
    Transformer Debugger

    Tool for exploring and debugging transformer model behaviors

    Transformer Debugger (TDB) is a research tool developed by OpenAI’s Superalignment team to investigate and interpret the behaviors of small language models. It combines automated interpretability methods with sparse autoencoders, enabling researchers to analyze how specific neurons, attention heads, and latent features contribute to a model’s outputs. TDB allows users to intervene directly in the forward pass of a model and observe how such interventions change predictions, making it...
    Downloads: 0 This Week
  • 15
    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong...
    Downloads: 0 This Week
  • 16
    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase that reproduces and opens up the data curation process behind CLIP-style models, rather than changing the model architecture or training objective. It uses metadata (a list of concept entries) to select and balance image-text pairs drawn from a raw web pool. The goal is to make the training data distribution transparent and reproducible. The repository provides the curation and training code, the metadata, and evaluation...
    Downloads: 0 This Week
  • 17
    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is a generative 3D modeling framework from DeepSeek that follows earlier generative 3D efforts (e.g. OpenAI's Shap-E and Point-E) with more capability, control, or expression. The name suggests a “dream crafting” metaphor: users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or...
    Downloads: 0 This Week
  • 18
    VGGSfM

    VGGSfM: Visual Geometry Grounded Deep Structure From Motion

    ...Version 2.0 adds support for dynamic scene handling, dense point cloud export, video-based reconstruction (1000+ frames), and integration with Gaussian Splatting pipelines. It leverages tools like PyCOLMAP, poselib, LightGlue, and PyTorch3D for feature matching, pose estimation, and visualization. With minimal configuration, users can process single scenes or full video sequences, apply motion masks to exclude moving objects, and train neural radiance or splatting models directly from reconstructed outputs.
    Downloads: 0 This Week
  • 19
    Large Concept Model

    Language modeling in a sentence representation space

    ...The repository provides training loops, data tooling, and evaluation routines to learn and probe these concept embeddings, typically from large image–text or weakly supervised corpora. It includes utilities to build concept vocabularies, map supervision signals to those vocabularies, and measure zero-shot or few-shot generalization. Probing tools help diagnose what the model knows—e.g., attribute recognition, relation understanding, or compositionality—so you can iterate on data and objectives. The design is modular, making it straightforward to swap backbones, change objectives, or integrate retrieval components.
    Downloads: 0 This Week
  • 20
    Step1X-Edit

    A SOTA open-source image editing model

    Step1X-Edit is a state-of-the-art open-source image editing model/framework that uses a multimodal large language model (LLM) together with a diffusion-based image decoder to let users edit images simply via natural-language instructions plus a reference image. You supply an existing image and a textual command — e.g. “add a ruby pendant on the girl’s neck” or “make the background a sunset over mountains” — and the model interprets the instruction, computes a latent embedding combining the...
    Downloads: 0 This Week
  • 21
    Tongyi DeepResearch

    Tongyi Deep Research, the Leading Open-source Deep Research Agent

    ...It uses a mix of synthetic data generation, fine-tuning and reinforcement learning; supports benchmarks like web search, document understanding, question answering, “agentic” tasks; provides inference tools, evaluation scripts, and “web agent” style interfaces. The aim is to enable more autonomous, agentic models that can perform sustained knowledge gathering, reasoning, and synthesis across multiple modalities (web, files, etc.).
    Downloads: 0 This Week
  • 22
    GLM-4.1V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.1V — often referred to as a smaller / lighter version of the GLM-V family — offers a more resource-efficient option for users who want multimodal capabilities without requiring large compute resources. Though smaller in scale, GLM-4.1V maintains competitive performance, particularly impressive on many benchmarks for models of its size: in fact, on a number of multimodal reasoning and vision-language tasks it outperforms some much larger models from other families. It represents a...
    Downloads: 0 This Week
  • 23
    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    ...It’s designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. ...
    Downloads: 0 This Week
  • 24
    ChatGPT Clone

    ChatGPT interface with better UI

    ...It showcases a clean separation between the web client and the message orchestration layer so you can experiment with prompts, roles, and memory strategies. The project is useful for prototyping assistants, documentation bots, and internal developer tools without committing to a specific vendor or UI framework. Configuration is kept simple so newcomers can get a working chat in minutes and then dial in features like authentication or multi-model routing. While it illustrates how to hook into third-party LLM endpoints, it is typically positioned as an educational, self-hosted starter that you should operate responsibly and within the provider's terms of use.
    Downloads: 10 This Week
  • 25
    Stable Diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

    ...It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models. Stability AI's approach to image synthesis has contributed to creating detailed, scalable images while maintaining efficiency.
    Downloads: 217 This Week