29 projects for "spatial" with 2 filters applied:

  • 1
    Autonomous Agents

    Research papers on autonomous agents (LLMs), updated daily

    ...The project explores how multiple agents can cooperate and interact with complex environments through machine learning, imitation learning, and multimodal sensing. It includes frameworks that integrate visual perception, tactile sensing, and spatial reasoning to guide the actions of robotic agents during manipulation or collaborative tasks. One of the central concepts explored in the repository is the integration of different sensory modalities using advanced machine learning techniques such as Feature-wise Linear Modulation and graph-based attention mechanisms. These methods allow agents to combine visual and geometric information while maintaining awareness of the spatial relationships between agents and objects.
    Downloads: 0 This Week
  • 2
    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    ...The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 0 This Week
  • 3
    Lyra 2

    Project Lyra: Open Generative 3D World Models

    ...It enables the creation of fully explorable 3D environments from minimal inputs such as a single image or video, leveraging self-distillation methods to generate consistent spatial representations. The system evolves across versions, with newer iterations introducing long-horizon generation and improved 3D consistency across frames. It combines elements of computer vision, generative modeling, and spatial intelligence to produce dynamic and navigable virtual worlds. The architecture is designed to handle both 3D and 4D scene generation, making it suitable for applications such as simulation, gaming, and virtual environments. ...
    Downloads: 1 This Week
  • 4
    VoxelMorph

    Unsupervised Learning for Image Registration

    VoxelMorph is an open-source deep learning framework designed for medical image registration, a process that aligns multiple medical scans into a common spatial coordinate system. Traditional image registration techniques typically rely on optimization procedures that must be executed separately for each pair of images, which can be computationally expensive and slow. VoxelMorph approaches the problem using neural networks that learn to predict deformation fields that transform one image so that it aligns with another. ...
    Downloads: 10 This Week
  • 5
    SAM 3D Objects

    Models for object and human mesh reconstruction

    SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image systems struggle. ...
    Downloads: 9 This Week
  • 6
    DeepSeek-OCR 2

    Visual Causal Flow

    ...It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.
    Downloads: 1 This Week
  • 7
    HY-World 1.5

    A Systematic Framework for Interactive World Modeling

    ...It blends advanced reasoning with multimodal synthesis, enabling agents to describe scenes, generate context-appropriate responses, and contribute to narrative or gameplay flows. The underlying framework typically supports large-context state tracking across extended interactions, blending temporal and spatial multimodal signals.
    Downloads: 2 This Week
  • 8
    HeavyDB

    HeavyDB (formerly MapD/OmniSciDB)

    HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the...
    Downloads: 1 This Week
  • 9
    SlowFast

    Video understanding codebase from FAIR for reproducing video models

    SlowFast is a video understanding framework that captures both spatial semantics and temporal dynamics efficiently by processing video frames at two different temporal resolutions. The slow pathway encodes semantic context by sampling frames sparsely, while the fast pathway captures motion and fine temporal cues by operating on densely sampled frames with fewer channels. Together, these two pathways complement each other, allowing the network to model both appearance and motion without excessive computational cost. ...
    Downloads: 0 This Week
  • 10
    49 Agents IDE

    Open-source 2D IDE for managing AI agents in native CLIs

    49Agents is an open-source “agentic IDE” that reimagines how developers interact with multiple AI agents, terminals, and development environments by placing everything onto a single infinite, zoomable canvas. Instead of relying on traditional tab-based workflows, it provides a spatial interface where terminals, editors, Git views, and monitoring tools coexist as movable panes, enabling users to manage complex multi-agent systems visually and intuitively. The platform is designed to work across multiple machines simultaneously, allowing agents running on different devices or servers to connect to a unified workspace without requiring SSH, which simplifies distributed development workflows. ...
    Downloads: 0 This Week
  • 11
    BuildingAI

    Build your own AI application system for free

    ...The platform aims to bridge the gap between natural language interfaces and building design tools by allowing AI systems to interpret user instructions and convert them into structured architectural operations. By combining generative AI capabilities with building data models, the system can assist with tasks such as design generation, spatial reasoning, and building component creation. The project is intended for architects, engineers, and developers exploring how AI can automate or augment design workflows in the architecture, engineering, and construction industries. It supports interactions where users describe building features, layouts, or modifications in natural language and the AI translates those instructions into actionable design operations.
    Downloads: 0 This Week
  • 12
    Seamless Communication

    Foundational Models for State-of-the-Art Speech and Text Translation

    Seamless Communication is a research project focused on building more integrated, low-latency multimodal communication between humans and AI agents. The motivation is to move beyond “text in, text out” and enable direct, live, multi-turn exchange involving language, gesture, gaze, vision, and modality switching without user friction. The system architecture includes a real-time multimodal signal pipeline for audio, video, and sensor data, a dialog manager that can decide when to act (speak,...
    Downloads: 0 This Week
  • 13
    LLaMA-Mesh

    Unifying 3D Mesh Generation with Language Models

    ...As a result, the model can generate mesh models directly from text prompts, explain mesh structures in natural language, or output mixed text-and-mesh sequences. This unified representation enables a single model to operate across both textual and spatial domains.
    Downloads: 0 This Week
  • 14
    hCaptcha Challenger

    Gracefully face hCaptcha challenges with multimodal LLMs

    ...Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. It implements an agent-style workflow where the system interprets the challenge prompt, selects the appropriate vision model, and generates the required interaction automatically.
    Downloads: 0 This Week
  • 15
    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editability

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image...
    Downloads: 0 This Week
  • 16
    Thinknowlogy

    The world's only naturally intelligent knowledge technology

    ...This naturally occurring logic provides concrete clues for organizing natural objects: grouping objects that belong together, separating objects that don't belong together, and archiving objects that have become less important. Natural language and spatial information are sources of natural intelligence: natural language provides concrete logic for organizing knowledge objects, and spatial information provides concrete logic for organizing spatial objects (used in, e.g., self-driving cars). In this way, our brains know how to organize their knowledge and spatial information. ...
    Downloads: 0 This Week
  • 17
    Exposure Correction

    Learning multi-scale deep model correcting over- and under- exposed

    ...The repository focuses on correcting poorly exposed photographs, handling both underexposure and overexposure using a deep learning approach. The method employs a multi-scale framework that learns to enhance images by adjusting exposure levels across different spatial resolutions. This allows the model to preserve fine details while correcting global lighting inconsistencies. The repository includes pre-trained models, datasets, and training/testing code to enable reproducibility and experimentation. By leveraging this framework, researchers and developers can apply exposure correction to a wide range of natural images, improving visual quality without manual editing. ...
    Downloads: 1 This Week
  • 18
    ControlNet

    Let us control diffusion models

    ControlNet is a neural network architecture designed to add conditional control to text-to-image diffusion models. Rather than training from scratch, ControlNet “locks” the weights of a pre-trained diffusion model and introduces a parallel trainable branch that learns additional conditions—like edges, depth maps, segmentation, human pose, scribbles, or other guidance signals. This allows the system to control where and how the model should focus during generation, enabling users to steer...
    Downloads: 2 This Week
  • 19
    Recast Navigation

    Navigation mesh generation and pathfinding toolkit for game AI systems

    ...Its core component, Recast, constructs navigation meshes by rasterizing triangle meshes into voxels, filtering out areas that are not walkable, and converting the remaining regions into polygon meshes suitable for navigation. Detour complements this system by providing runtime pathfinding, spatial queries, and navigation utilities that allow agents to move efficiently across the generated mesh. Recast Navigation also includes modules such as DetourCrowd for crowd simulation and agent collision avoidance, as well as DetourTileCache for streaming and dynamically updating navigation meshes in large or changing environments.
    Downloads: 1 This Week
  • 20
    Mask2Former

    Code release for "Masked-attention Mask Transformer

    ...Its core idea is to cast segmentation as mask classification: a transformer decoder predicts a set of mask queries, each with an associated class score, eliminating the need for task-specific heads. A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial support. This leads to accurate masks with sharp boundaries and strong small-object performance while remaining efficient on high-resolution inputs. The project provides extensive configurations and pretrained models across popular benchmarks like COCO, ADE20K, and Cityscapes. Built on top of Detectron2, it includes training scripts, inference tools, and visualization utilities that make experimentation straightforward.
    Downloads: 0 This Week
  • 21
    TimeSformer

    The official PyTorch implementation of our paper

    ...TimeSformer was influential in showing that pure transformer architectures—without convolutional backbones—can perform strongly on video classification tasks. Its flexible attention design allows experimenting with different factoring (spatial-then-temporal, joint, etc.) to trade off compute, memory, and accuracy.
    Downloads: 0 This Week
  • 22
    Dynamic Routing Between Capsules

    A PyTorch implementation of the NIPS 2017 paper

    Dynamic Routing Between Capsules is a PyTorch implementation of the Capsule Network architecture originally proposed to address limitations in traditional convolutional neural networks. Capsule networks aim to improve how neural models represent spatial hierarchies and relationships between objects within images. Instead of scalar neuron activations, capsules output vectors that encode both the presence of features and their spatial properties such as orientation or pose. The repository implements the dynamic routing algorithm between capsules, which allows lower-level features to route their outputs to higher-level structures that best represent the detected patterns. ...
    Downloads: 0 This Week
  • 23
    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    ...The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. In practice, sg2im demonstrates how structured semantics can guide generative models to produce controllable, compositional imagery.
    Downloads: 0 This Week
  • 24
    FastPhotoStyle

    Style transfer, deep learning, feature transform

    FastPhotoStyle is a deep learning-based image stylization framework designed to transfer the style of one photograph onto another while preserving photorealistic quality. Unlike traditional artistic style transfer methods that produce painterly outputs, this approach focuses on maintaining realistic textures, lighting, and spatial consistency. The method is based on a two-step process that includes a stylization phase followed by a smoothing operation, ensuring that the output image remains coherent and free of visual artifacts. It is computationally efficient due to its closed-form solution, allowing fast processing compared to iterative optimization-based methods. ...
    Downloads: 0 This Week
  • 25
    SpatialML is a markup language for representing spatial expressions in natural language documents. The goal is to allow for better integration of text collections with resources such as databases that provide spatial information about a domain.
    Downloads: 0 This Week