Search Results for "Open Source Discovery Open Source & DevTools" - Page 7

Showing 226 open source projects for "Open Source Discovery Open Source & DevTools"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    Tiktoken

    Tiktoken

    tiktoken is a fast BPE tokeniser for use with OpenAI's models

    tiktoken is a high-performance, tokenizer library (based on byte-pair encoding, BPE) designed for use with OpenAI’s models. It handles encoding and decoding text to token IDs efficiently, with minimal overhead. Because tokenization is a fundamental step in preparing text for models, tiktoken is optimized for speed, memory, and correctness in model contexts (e.g. matching OpenAI’s internal tokenization). The repo supports multiple encodings (e.g. “cl100k_base”) and lets users switch encoding...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Hunyuan3D-1

    Hunyuan3D-1

    A Unified Framework for Text-to-3D and Image-to-3D Generation

    ...It provides a framework combining shape generation and texture synthesis, enabling users to create 3D assets from images or text conditions. While less advanced than version 2.1, it laid the foundations for the later PBR, higher resolution, and open-source enhancements. (Note: less detailed public documentation was found for Hunyuan3D-1 compared to 2.1.). Community and ecosystem support (e.g. usage via Blender addon for geometry/texture). Integration into user-friendly tools/platforms.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Tracking Any Point (TAP)

    Tracking Any Point (TAP)

    DeepMind model for tracking arbitrary points across videos & robotics

    TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Mesh R-CNN

    Mesh R-CNN

    code for Mesh R-CNN, ICCV 2019

    Mesh R-CNN is a 3D reconstruction and object understanding framework developed by Facebook Research that extends Mask R-CNN into the 3D domain. Built on top of Detectron2 and PyTorch3D, Mesh R-CNN enables end-to-end 3D mesh prediction directly from single RGB images. The model learns to detect, segment, and reconstruct detailed 3D mesh representations of objects in natural images, bridging the gap between 2D perception and 3D understanding. Unlike voxel-based or point-based approaches, Mesh...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Map-Anything

    Map-Anything

    MapAnything: Universal Feed-Forward Metric 3D Reconstruction

    Map-Anything is a universal, feed-forward transformer for metric 3D reconstruction that predicts a scene’s geometry and camera parameters directly from visual inputs. Instead of stitching together many task-specific models, it uses a single architecture that supports a wide range of 3D tasks—multi-image structure-from-motion, multi-view stereo, monocular metric depth, registration, depth completion, and more. The model flexibly accepts different input combinations (images, intrinsics, poses,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Large Concept Model

    Large Concept Model

    Language modeling in a sentence representation space

    Large Concept Model is a research codebase centered on concept-centric representation learning at scale, aiming to capture shared structure across many categories and modalities. It organizes training around concepts (rather than just raw labels), encouraging models to understand attributes, relations, and compositional structure that transfer across tasks. The repository provides training loops, data tooling, and evaluation routines to learn and probe these concept embeddings, typically...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    MiniMax-01

    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 10
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MedicalGPT

    MedicalGPT

    MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training

    MedicalGPT training medical GPT model with ChatGPT training pipeline, implementation of Pretraining, Supervised Finetuning, Reward Modeling and Reinforcement Learning. MedicalGPT trains large medical models, including secondary pre-training, supervised fine-tuning, reward modeling, and reinforcement learning training.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Poetiq

    Poetiq

    Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1

    poetiq-arc-agi-solver is the open-source codebase from Poetiq that replicates their record-breaking submission to the challenging benchmark suite ARC-AGI (both ARC-AGI-1 and ARC-AGI-2). The project demonstrates a system that orchestrates large language models (LLMs) — like those from major providers — with carefully engineered prompting, reasoning workflows, and dynamic strategies, to tackle the abstract, logic-heavy problems in ARC-AGI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one system, enabling developers to build rich, multimodal audio applications without stitching together disparate components. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    DeepSeek VL

    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 16
    Core ML Stable Diffusion

    Core ML Stable Diffusion

    Stable Diffusion with Core ML on Apple Silicon

    Run Stable Diffusion on Apple Silicon with Core ML. python_coreml_stable_diffusion, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face diffusers in Python. StableDiffusion, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by python_coreml_stable_diffusion. Hugging Face ran the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    ChatGPT Retrieval Plugin

    ChatGPT Retrieval Plugin

    The ChatGPT Retrieval Plugin lets you easily find personal documents

    The chatgpt-retrieval-plugin repository implements a semantic retrieval backend that lets ChatGPT (or GPT-powered tools) access private or organizational documents in natural language by combining vector search, embedding models, and plugin infrastructure. It can serve as a custom GPT plugin or function-calling backend so that a chat session can “look up” relevant documents based on user queries, inject those results into context, and respond more knowledgeably about a private knowledge...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    DeepSeek Math

    DeepSeek Math

    Pushing the Limits of Mathematical Reasoning in Open Language Models

    DeepSeek-Math is DeepSeek’s specialized model (or dataset + evaluation) focusing on mathematical reasoning, symbolic manipulation, proof steps, and advanced quantitative problem solving. The repository is likely to include fine-tuning routines or task datasets (e.g. MATH, GSM8K, ARB), demonstration notebooks, prompt templates, and evaluation results on math benchmarks. The goal is to push DeepSeek’s performance in domains that require rigorous symbolic steps, calculus, linear algebra, number...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Qwen-VL

    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Qwen-Audio

    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat which supports conversational interaction (multi-round), audio + text input, creative tasks and reasoning over audio. It uses multi-task training over many different audio tasks (30+), and achieves strong multi-benchmarks performance...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Stable Diffusion

    Stable Diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

    Stable Diffusion Version 2. The Stable Diffusion project, developed by Stability AI, is a cutting-edge image synthesis model that utilizes latent diffusion techniques for high-resolution image generation. It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models....
    Downloads: 211 This Week
    Last Update:
    See Project
  • 22
    Improved Diffusion

    Improved Diffusion

    Release for Improved Denoising Diffusion Probabilistic Models

    improved-diffusion is an open source implementation of diffusion probabilistic models created by OpenAI. These models, also known as score-based generative models, are a class of generative models that have shown strong performance in producing high-quality synthetic data such as images. The repository provides code for training and sampling diffusion models with improved techniques that enhance stability, efficiency, and output fidelity.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Grok-1

    Grok-1

    Open-source, high-performance Mixture-of-Experts large language model

    Grok-1 is a 314-billion-parameter Mixture-of-Experts (MoE) large language model developed by xAI. Designed to optimize computational efficiency, it activates only 25% of its weights for each input token. In March 2024, xAI released Grok-1's model weights and architecture under the Apache 2.0 license, making them openly accessible to developers. The accompanying GitHub repository provides JAX example code for loading and running the model. Due to its substantial size, utilizing Grok-1...
    Leader badge
    Downloads: 18 This Week
    Last Update:
    See Project
  • 24
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 25
    Warlock-Studio

    Warlock-Studio

    AI Suite for upscaling, interpolating & restoring images/videos

    v6.0. Warlock-Studio is a Windows application that uses Real-ESRGAN, BSRGAN, IRCNN, GFPGAN, RealESRNet, RealESRAnime and RIFE Artificial Intelligence models to upscale, restore faces, interpolate frames and reduce noise in images and videos. the application supports GPU acceleration (including multi-GPU setups) and offers batch processing for large workloads. It includes drag-and-drop handling for single or multiple files, optional pre-resize functions, and an automatic tiling system...
    Downloads: 24 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB