Showing 30 open source projects for "building"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 1
    Open Infra Index

    Open Infra Index

    Production-tested AI infrastructure tools

    open-infra-index is a central “infrastructure index” repository maintained by DeepSeek AI that acts as a catalog and hub for a collection of production-tested AI infrastructure tools and internal building blocks they have open-sourced. Instead of a single monolithic codebase, it functions more like an index or launching point: linking and documenting a set of library repos (e.g. FlashMLA, DeepEP, DeepGEMM, 3FS, etc.) that together form DeepSeek’s infrastructure stack. The repo's README describes the project as sharing “humble building blocks” of their online service—code that is documented, deployed, and battle-tested in production. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    xFormers

    xFormers

    Hackable and optimized Transformers building blocks

    xformers is a modular, performance-oriented library of transformer building blocks, designed to allow researchers and engineers to compose, experiment, and optimize transformer architectures more flexibly than monolithic frameworks. It abstracts components like attention layers, feedforward modules, normalization, and positional encoding, so you can mix and match or swap optimized kernels easily. One of its key goals is efficient attention: it supports dense, sparse, low-rank, and approximate attention mechanisms (e.g. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    GLM-5

    GLM-5

    From Vibe Coding to Agentic Engineering

    GLM-5 is a next-generation open-source large language model (LLM) developed by the Z .ai team under the zai-org organization that pushes the boundaries of reasoning, coding, and long-horizon agentic intelligence. Building on earlier GLM series models, GLM-5 dramatically scales the parameter count (to roughly 744 billion) and expands pre-training data to significantly improve performance on complex tasks such as multi-step reasoning, software engineering workflows, and agent orchestration compared to its predecessors like GLM-4.5. It incorporates innovations like DeepSeek Sparse Attention (DSA) to preserve massive context windows while reducing deployment costs and supporting long context processing, which is crucial for detailed plans and agent tasks.
    Downloads: 36 This Week
    Last Update:
    See Project
  • 4
    DINOv3

    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while maintaining or improving feature quality. ...
    Downloads: 25 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    GLM-4.1V

    GLM-4.1V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    ...It represents a trade-off: somewhat reduced capacity compared to 4.5V or 4.6V, but with benefits in terms of speed, deployability, and lower hardware requirements — making it especially useful for developers experimenting locally, building lightweight agents, or deploying on limited infrastructure. Given its open-source availability under the same project repository, it provides an accessible entry point for testing multimodal reasoning and building proof-of-concept applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    LTX-Video

    LTX-Video

    Official repository for LTX-Video

    LTX-Video is a sophisticated multimedia processing framework from Lightricks designed to handle high-quality video editing, compositing, and transformation tasks with performance and scalability. It provides runtime components that efficiently decode, encode, and manipulate video streams, frame buffers, and audio tracks while exposing a rich API for building customized editing features like transitions, effects, color grading, and keyframe automation. The toolkit is built with both real-time and offline workflows in mind, enabling applications from consumer editing to professional content creation and batch processing. Internally optimized for multi-core processors and hardware acceleration where available, LTX-Video makes it feasible to work with high-resolution content and complex timelines without sacrificing responsiveness.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 7
    LTX-2

    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 8
    Step-Audio

    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Step 3.5 Flash

    Step 3.5 Flash

    Fast, Sharp & Reliable Agentic Intelligence

    ...Unlike dense models that activate all their parameters for every token, Step 3.5 Flash uses a sparse Mixture-of-Experts (MoE) architecture that selectively engages only about 11 billion of its roughly 196 billion total parameters per token, delivering high-quality reasoning and interaction at far lower compute cost and latency than traditional large models. Its design targets deep reasoning, long-context handling, coding, and real-time responsiveness, making it suitable for building autonomous agents, advanced assistants, and long-chain cognitive workflows without sacrificing performance.
    Downloads: 7 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    OpenAI Harmony

    OpenAI Harmony

    Renderer for the harmony response format to be used with gpt-oss

    ...The format is essential for ensuring gpt-oss models operate correctly, as they are trained to rely on this structure for generating and organizing their responses. For users accessing gpt-oss through third-party providers like HuggingFace, Ollama, or vLLM, Harmony formatting is handled automatically, but developers building custom inference setups must implement it directly. With its flexible design, Harmony serves as the foundation for creating more interpretable, controlled, and extensible interactions with open-weight language models.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 11
    MiniMind-O

    MiniMind-O

    A 0.1B Omni model trained from scratch

    MiniMind-O is an educational open-source project for building a small end-to-end Omni model from scratch. It extends the MiniMind family by exploring a model that can handle text, audio, and image inputs while producing text and streaming speech outputs. The project is designed to make multimodal AI training more accessible by keeping the model size small enough for ordinary personal hardware.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    MOSS-TTS Family

    MOSS-TTS Family

    MOSS‑TTS Family open‑source speech and sound generation model

    ...The broader family also includes dialogue generation, prompt-based voice creation, streaming voice-agent support, and a unified audio tokenizer. It is especially useful for developers building dubbing, podcasts, audiobooks, voice assistants, character voices, and creative audio tools.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Anthropic SDK Python

    Anthropic SDK Python

    Provides convenient access to the Anthropic REST API from any Python 3

    ...Importantly, it also supports streaming responses via Server-Sent Events (SSE) so that large outputs can be consumed incrementally rather than waiting for the full response. The client offers helper abstractions for tools (function-style “tools”) and streaming utilities for building interactive agents.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Lyra 2

    Lyra 2

    Project Lyra: Open Generative 3D World Models

    The Lyra 2 project is a research-driven framework developed by NVIDIA that focuses on building open generative 3D world models using advanced diffusion-based techniques. It enables the creation of fully explorable 3D environments from minimal inputs such as a single image or video, leveraging self-distillation methods to generate consistent spatial representations. The system evolves across versions, with newer iterations introducing long-horizon generation and improved 3D consistency across frames. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Map-Anything

    Map-Anything

    MapAnything: Universal Feed-Forward Metric 3D Reconstruction

    Map-Anything is a universal, feed-forward transformer for metric 3D reconstruction that predicts a scene’s geometry and camera parameters directly from visual inputs. Instead of stitching together many task-specific models, it uses a single architecture that supports a wide range of 3D tasks—multi-image structure-from-motion, multi-view stereo, monocular metric depth, registration, depth completion, and more. The model flexibly accepts different input combinations (images, intrinsics, poses,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    TADA

    TADA

    Open Source Speech Language Model

    TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Sapiens

    Sapiens

    High-resolution models for human tasks

    Sapiens is a research framework from Meta AI focused on embodied intelligence and human-like multimodal learning, aiming to train agents that can perceive, reason, and act in complex environments. It integrates sensory inputs such as vision, audio, and proprioception into a unified learning architecture that allows agents to understand and adapt to their surroundings dynamically. The project emphasizes long-horizon reasoning and cross-modal grounding—connecting language, perception, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Granite TSFM

    Granite TSFM

    Foundation Models for Time Series

    granite-tsfm collects public notebooks, utilities, and serving components for IBM’s Time Series Foundation Models (TSFM), giving practitioners a practical path from data prep to inference for forecasting and anomaly-detection use cases. The repository focuses on end-to-end workflows: loading data, building datasets, fine-tuning forecasters, running evaluations, and serving models. It documents the currently supported Python versions and points users to where the core TSFM models are hosted and how to wire up service components. Issues and examples in the tracker illustrate common tasks such as slicing inference windows or using pipeline helpers that return pandas DataFrames, grounding the library in day-to-day time-series operations. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Seamless Communication

    Seamless Communication

    Foundational Models for State-of-the-Art Speech and Text Translation

    Seamless Communication is a research project focused on building more integrated, low-latency multimodal communication between humans and AI agents. The motivation is to move beyond “text in, text out” and enable direct, live, multi-turn exchange involving language, gesture, gaze, vision, and modality switching without user friction. The system architecture includes a real-time multimodal signal pipeline for audio, video, and sensor data, a dialog manager that can decide when to act (speak, gesture, point) or query, and a cross-modal reasoning layer that fuses perception with semantic context. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    ChatGPT Retrieval Plugin

    ChatGPT Retrieval Plugin

    The ChatGPT Retrieval Plugin lets you easily find personal documents

    ...It also contains plugin manifest files (OpenAPI spec, plugin JSON) so that the retrieval backend can be registered in a plugin ecosystem. Because retrieval is often needed to make LLMs “know what’s in your docs” without leaking everything, this plugin aims to be a secure, flexible building block for retrieval-augmented generation (RAG) systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    GPT Discord Bot

    GPT Discord Bot

    Example Discord bot written in Python that uses the completions API

    ...The bot uses the Chat Completions API (defaulting to gpt-3.5-turbo) to carry out conversational interactions and the Moderations API to filter user messages. It is built on top of the discord.py framework and the OpenAI Python library, providing a simple, extensible template for building AI-powered Discord applications. The bot supports a /chat command that spawns a public thread, carries full conversation context across messages, and gracefully closes the thread when context or message limits are reached. Developers can customize system instructions through a config file and modify the model used for responses. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Video Pre-Training

    Video Pre-Training

    Learning to Act by Watching Unlabeled Online Videos

    ...The idea is to learn general priors of control from large-scale, unlabeled video data, and then optionally fine-tune those priors for more goal-directed behavior via environment interaction. The repository contains demonstration models of different widths, fine-tuned variants (e.g. for building houses or early-game tasks), and inference scripts that instantiate agents from pretrained weights. Key modules include the behavioral cloning logic, the agent wrapper, and data loading pipelines (with an accessible skeleton for loading Minecraft demonstration data). The repo also includes a run_agent.py script for testing an agent interactively, and an agent.py module encapsulating the control logic.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    MUSE

    MUSE

    A library for Multilingual Unsupervised or Supervised word Embeddings

    ...The training and evaluation pipeline is lightweight and fast, so experimenting with different languages or initialization strategies is easy. Beyond dictionary induction, the learned embeddings are often used as building blocks for downstream tasks like classification, retrieval, or machine translation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Retrieval-Based Conversational Model

    Retrieval-Based Conversational Model

    Dual LSTM Encoder for Dialog Response Generation

    ...The core idea is to embed both the conversation context and potential replies into vector representations, then score how well each candidate fits the current dialogue, choosing the best match accordingly. Designed to work with datasets like the Ubuntu Dialogue Corpus, this codebase includes data preparation, model training, and evaluation components for building and assessing dialog models that can handle multi-turn conversations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Nemotron 3 Nano

    Nemotron 3 Nano

    LL model providing reasoning and conversational capabilities

    NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is a mid-sized open large language model created by NVIDIA to provide strong reasoning and conversational capabilities while maintaining efficient deployment requirements. The model contains roughly 30 billion parameters and is designed to balance performance and computational efficiency, making it suitable for developers building AI applications that cannot run extremely large models. It is trained from scratch and built using a hybrid architecture that integrates Transformer attention layers with Mamba-style sequence modeling components inside a Mixture-of-Experts framework. This architecture allows the system to maintain strong reasoning capabilities while improving throughput and reducing the computational cost associated with large context processing. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next