Showing 91 open source projects for "space"

View related business solutions
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 1
    CLIP

    CLIP

    CLIP, Predict the most relevant text snippet given an image

    CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given image—even without explicit training for that classification task. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    ImageBind

    ImageBind

    ImageBind One Embedding Space to Bind Them All

    ImageBind is a multimodal embedding framework that learns a shared representation space across six modalities—images, text, audio, depth, thermal, and IMU (inertial motion) data—without requiring explicit pairwise training for every modality combination. Instead of aligning each pair independently, ImageBind uses image data as the central binding modality, aligning all other modalities to it so they can interoperate zero-shot.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    vJEPA-2

    vJEPA-2

    PyTorch code and models for VJEPA2 self-supervised learning from video

    VJEPA2 is a next-generation self-supervised learning framework for video that extends the “predict in representation space” idea from i-JEPA to the temporal domain. Instead of reconstructing pixels, it predicts the missing high-level embeddings of masked space-time regions using a context encoder and a slowly updated target encoder. This objective encourages the model to learn semantics, motion, and long-range structure without the shortcuts that pixel-level losses can invite. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers.
    Downloads: 24 This Week
    Last Update:
    See Project
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • 5
    Step-Video-T2V

    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    ...Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible motion and visuals. The model handles bilingual input (e.g. English and Chinese) thanks to dual encoders, and supports end-to-end text-to-video generation without requiring external assets. Its training and generation pipeline includes techniques like flow-matching, full 3D attention for temporal consistency, and fine-tuning approaches (e.g. video-based DPO) to improve fidelity and reduce artifacts. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    Coconut

    Coconut

    Training Large Language Model to Reason in a Continuous Latent Space

    Coconut is the official PyTorch implementation of the research paper “Training Large Language Models to Reason in a Continuous Latent Space.” The framework introduces a novel method for enhancing large language models (LLMs) with continuous latent reasoning steps, enabling them to generate and refine reasoning chains within a learned latent space rather than relying solely on discrete symbolic reasoning. It supports training across multiple reasoning paradigms—including standard Chain-of-Thought (CoT), no-thought, and hybrid configurations—using configurable training stages and latent representations. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    stable-diffusion-videos

    stable-diffusion-videos

    Create videos with Stable Diffusion

    Create videos with Stable Diffusion by exploring the latent space and morphing between text prompts. Try it yourself in Colab.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    UForm

    UForm

    Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion

    UForm is a Multi-Modal Modal Inference package, designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents, into a shared vector space! It comes with a set of homonymous pre-trained networks available on HuggingFace portal and extends the transfromers package to support Mid-fusion Models. Late-fusion models encode each modality independently, but into one shared vector space. Due to independent encoding late-fusion models are good at capturing coarse-grained features but often neglect fine-grained ones. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Video-subtitle-remover (VSR)

    Video-subtitle-remover (VSR)

    AI tool that removes hardcoded subtitles and text from videos locally

    Video Subtitle Remover is an AI-based application designed to remove hardcoded subtitles from videos and generate new files without the embedded text. Video Subtitle Remover analyzes video frames and detects subtitle regions, then replaces the removed areas using an AI algorithm that fills the space with reconstructed visual content. This process aims to maintain the original resolution and visual continuity of the video after subtitle removal. It allows users to define a specific subtitle region so that only text in that area is removed rather than modifying the entire frame. It can also automatically remove text throughout the whole video when a position is not specified. ...
    Downloads: 60 This Week
    Last Update:
    See Project
  • Error to trace to log to deploy. One click. No SSH. Icon
    Error to trace to log to deploy. One click. No SSH.

    Catch the cause before the pager goes off.

    AppSignal links every error to the trace, the trace to the log, the log to the deploy that shipped it.
    Free 30 days.
  • 10
    SkillOpt

    SkillOpt

    Text-space optimizer that trains reusable natural-language skills

    ...The system learns from agent rollouts, reflection, bounded edits, and validation gates to produce better instructions over time. Its output is a deployable best_skill.md artifact that can be reused across agent tasks. The project is focused on making agents more effective through text-space optimization rather than traditional fine-tuning. It is most useful for AI researchers and agent developers studying self-improving workflows, skill libraries, and evaluation-driven prompt refinement.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    JEPA

    JEPA

    PyTorch code and models for V-JEPA self-supervised learning from video

    ...The repository provides training recipes, data pipelines, and evaluation utilities for image JEPA variants and often includes ablations that illuminate which masking and architectural choices matter. Because the objective is non-autoregressive and operates in embedding space, JEPA tends to be compute-efficient and stable at scale. The approach has become a strong alternative to contrastive or pixel-reconstruction methods for representation learning.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    HeartMuLa

    HeartMuLa

    A Family of Open Sourced Music Foundation Models

    ...For text extraction from audio, it provides HeartTranscriptor, a Whisper-based model tuned specifically for lyrics transcription, which helps bridge generated or recorded audio back into structured text. It also introduces HeartCLAP, which aligns audio and text into a shared embedding space.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 13
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography,...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 14
    UI-TARS

    UI-TARS

    UI-TARS-desktop version that can operate on your local personal device

    UI-TARS is an open-source multimodal “GUI agent” created by ByteDance: a model designed to perceive raw screenshots (or rendered UI frames), reason about what needs to be done, and then perform real interactions with graphical user interfaces (GUIs) — like clicking, typing, navigating menus — across desktop, browser, mobile, or game environments. Rather than relying on rigid, manually scripted UI automation, UI-TARS uses a unified vision-language model (VLM) that integrates perception,...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 15
    Step-Audio 2

    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. It integrates a latent-space audio encoder, discrete acoustic tokens, and reinforcement-learning–based training (CoT + RL) to enhance its ability to capture and reproduce voice styles, intonations, and subtle vocal cues. Moreover, Step-Audio2 supports tool-calling and retrieval-augmented generation (RAG), allowing it to access external knowledge sources or audio/text databases, thus reducing hallucinations and improving coherence in complex dialogues.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    KerasTuner

    KerasTuner

    A Hyperparameter Tuning Library for Keras

    KerasTuner is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search. Easily configure your search space with a define-by-run syntax, then leverage one of the available search algorithms to find the best hyperparameter values for your models. KerasTuner comes with Bayesian Optimization, Hyperband, and Random Search algorithms built-in, and is also designed to be easy for researchers to extend in order to experiment with new search algorithms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Physical Symbolic Optimization (Φ-SO)

    Physical Symbolic Optimization (Φ-SO)

    Physical Symbolic Optimization

    Physical Symbolic Optimization (Φ-SO) - A symbolic optimization package built for physics. Symbolic regression module uses deep reinforcement learning to infer analytical physical laws that fit data points, searching in the space of functional forms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    gensim

    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    LangKit

    LangKit

    An open-source toolkit for monitoring Language Learning Models (LLMs)

    ...Productionizing language models, including LLMs, comes with a range of risks due to the infinite amount of input combinations, which can elicit an infinite amount of outputs. The unstructured nature of text poses a challenge in the ML observability space - a challenge worth solving, since the lack of visibility on the model's behavior can have serious consequences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    HRM-Text

    HRM-Text

    1B text generation model based on the HRM architecture

    ...It is designed to make foundation model pretraining more accessible by reducing compute and data requirements compared with traditional scaling-heavy approaches. The system combines hierarchical recurrent design, task-completion strengthening, and latent-space reasoning. Its training stack includes PrefixLM sequence packing, FlashAttention 3 kernels, PyTorch FSDP2, evaluation scripts, and checkpoint conversion tools. The repository supports reference pretraining runs for smaller and larger configurations, with Hopper-class GPUs expected for the attention path. It is useful for researchers and engineers exploring efficient language model pretraining, reasoning-focused architectures, and reproducible foundation model experiments.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Story Flicks

    Story Flicks

    Generate high-definition story short videos with one click using AI

    Story Flicks is another open-source project in the AI-assisted video generation / editing space, focused on creating short, story-style videos from script or prompt inputs. It aims to let users generate high-definition short movies or video stories with minimal manual effort, using AI models under the hood to assemble visuals, timing, and possibly narration or subtitles. For creators who want to produce narrative short-form content — whether for social media, storytelling, or prototyping video ideas — story-flicks offers a lightweight, code-backed alternative to complex video editing suites. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 22
    InvokeAI

    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    InvokeAI is an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator. It provides a streamlined process with various new features and options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on GPU cards with as little as 4 GB or RAM. InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies....
    Downloads: 10 This Week
    Last Update:
    See Project
  • 23
    AppAgent

    AppAgent

    Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

    AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    LeWorldModel

    LeWorldModel

    Official code base for LeWorldModel: Stable End-to-End Joint-Embedding

    LeWorldModel is a minimalist tiling window manager designed for the X11 windowing system, focusing on simplicity, performance, and efficient use of screen space. It provides automatic window tiling behavior, organizing application windows into structured layouts without requiring manual resizing or positioning. The project emphasizes a lightweight design, minimizing resource usage while maintaining responsiveness and stability. It is highly configurable through source code or configuration files, allowing users to tailor behavior, keybindings, and layouts to their preferences. le-wm is intended for users who prefer keyboard-driven workflows and a distraction-free desktop environment. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    DeepSeek VL2

    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
Auth0 Logo