spatial free download

29 projects for "spatial" with 2 filters applied:

Filters: Artificial Intelligence, ChromeOS

1

Autonomous Agents

Autonomous Agents (LLMs) research papers. Updated Daily

...The project explores how multiple agents can cooperate and interact with complex environments through machine learning, imitation learning, and multimodal sensing. It includes frameworks that integrate visual perception, tactile sensing, and spatial reasoning to guide the actions of robotic agents during manipulation or collaborative tasks. One of the central concepts explored in the repository is the integration of different sensory modalities using advanced machine learning techniques such as Feature-wise Linear Modulation and graph-based attention mechanisms. These methods allow agents to combine visual and geometric information while maintaining awareness of the spatial relationships between agents and objects.
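Feature-wise Linear Modulation, mentioned above, scales and shifts each feature using parameters derived from a conditioning input. The following is a minimal sketch of that idea only; the function name `film` and all values are illustrative assumptions and are not taken from the repository.

```python
from typing import List


def film(features: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """Apply feature-wise linear modulation: gamma_i * x_i + beta_i per feature."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma, and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical visual features. In a real model, gamma and beta would be
# predicted by a network from another modality (for example tactile input);
# here they are hard-coded for illustration.
visual = [1.0, 2.0, 3.0]
gamma = [0.5, 1.0, 2.0]
beta = [0.0, -1.0, 1.0]
modulated = film(visual, gamma, beta)
print(modulated)
```

The conditioning signal never mixes with the features directly; it only controls a per-feature scale and offset, which keeps the fusion step cheap.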

Downloads: 0 This Week

Last Update: 2026-06-24
See Project
2

ML Ferret

Refer and Ground Anything Anywhere at Any Granularity

...The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
3

Lyra 2

Project Lyra: Open Generative 3D World Models

...It enables the creation of fully explorable 3D environments from minimal inputs such as a single image or video, leveraging self-distillation methods to generate consistent spatial representations. The system evolves across versions, with newer iterations introducing long-horizon generation and improved 3D consistency across frames. It combines elements of computer vision, generative modeling, and spatial intelligence to produce dynamic and navigable virtual worlds. The architecture is designed to handle both 3D and 4D scene generation, making it suitable for applications such as simulation, gaming, and virtual environments. ...

Downloads: 1 This Week

Last Update: 2026-06-11
See Project
4

VoxelMorph

Unsupervised Learning for Image Registration

VoxelMorph is an open-source deep learning framework designed for medical image registration, a process that aligns multiple medical scans into a common spatial coordinate system. Traditional image registration techniques typically rely on optimization procedures that must be executed separately for each pair of images, which can be computationally expensive and slow. VoxelMorph approaches the problem using neural networks that learn to predict deformation fields that transform one image so that it aligns with another. ...
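To illustrate what applying a deformation field means, the sketch below warps a 1D signal by resampling it at displaced positions with linear interpolation. It is an assumption-laden toy: the displacement is hand-written rather than predicted by a network, real scans are 2D or 3D, and `warp_1d` is a hypothetical helper, not part of VoxelMorph.

```python
from typing import List


def warp_1d(moving: List[float], displacement: List[float]) -> List[float]:
    """Resample `moving` at positions i + displacement[i] (linear interpolation)."""
    n = len(moving)
    warped = []
    for i, d in enumerate(displacement):
        # Clamp the sampling position to the valid index range.
        pos = min(max(i + d, 0.0), n - 1.0)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * moving[lo] + frac * moving[hi])
    return warped


# Toy "moving image" and a hand-written displacement field (illustrative values).
moving = [0.0, 10.0, 20.0, 30.0]
displacement = [1.0, 1.0, 0.5, 0.0]
warped = warp_1d(moving, displacement)
print(warped)
```

A learned registration model would output `displacement` for each image pair in a single forward pass, instead of running a separate optimization per pair.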

Downloads: 10 This Week

Last Update: 2026-03-15
See Project
5

SAM 3D Objects

Models for object and human mesh reconstruction

SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image systems struggle. ...

Downloads: 9 This Week

Last Update: 2026-06-02
See Project
6

DeepSeek-OCR 2

Visual Causal Flow

...It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.

Downloads: 1 This Week

Last Update: 2026-02-03
See Project
7

HY-World 1.5

A Systematic Framework for Interactive World Modeling

...It blends advanced reasoning with multimodal synthesis, enabling agents to describe scenes, generate context-appropriate responses, and contribute to narrative or gameplay flows. The underlying framework typically supports large-context state tracking across extended interactions, blending temporal and spatial multimodal signals.

Downloads: 2 This Week

Last Update: 2026-06-10
See Project
8

HeavyDB

HeavyDB (formerly MapD/OmniSciDB)

HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the...

Downloads: 1 This Week

Last Update: 2026-03-11
See Project
9

SlowFast

Video understanding codebase from FAIR for reproducing video models

SlowFast is a video understanding framework that captures both spatial semantics and temporal dynamics efficiently by processing video frames at two different temporal resolutions. The slow pathway encodes semantic context by sampling frames sparsely, while the fast pathway captures motion and fine temporal cues by operating on densely sampled frames with fewer channels. Together, these two pathways complement each other, allowing the network to model both appearance and motion without excessive computational cost. ...
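The two temporal resolutions can be pictured as two sets of frame indices drawn from the same clip. The sketch below is illustrative only: the function `sample_pathways`, the stride of 2, and the ratio `alpha=8` are assumptions chosen for the example, not values read from the codebase.

```python
from typing import List, Tuple


def sample_pathways(num_frames: int, fast_stride: int, alpha: int) -> Tuple[List[int], List[int]]:
    """Return (slow, fast) frame indices; the slow pathway keeps every alpha-th fast frame."""
    fast = list(range(0, num_frames, fast_stride))  # densely sampled frames
    slow = fast[::alpha]                            # sparsely sampled frames
    return slow, fast


slow_idx, fast_idx = sample_pathways(num_frames=64, fast_stride=2, alpha=8)
print(len(fast_idx), slow_idx)
```

With these assumed settings the fast pathway sees eight times as many frames as the slow one, which is why it is given fewer channels to keep compute in check.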

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
10

49 Agents IDE

Open-source 2D IDE for managing AI agents in native CLIs

49Agents is an open-source “agentic IDE” that reimagines how developers interact with multiple AI agents, terminals, and development environments by placing everything onto a single infinite, zoomable canvas. Instead of relying on traditional tab-based workflows, it provides a spatial interface where terminals, editors, Git views, and monitoring tools coexist as movable panes, enabling users to manage complex multi-agent systems visually and intuitively. The platform is designed to work across multiple machines simultaneously, allowing agents running on different devices or servers to connect to a unified workspace without requiring SSH, which simplifies distributed development workflows. ...

Downloads: 0 This Week

Last Update: 2026-05-07
See Project
11

BuildingAI

Build your own AI application system for free

...The platform aims to bridge the gap between natural language interfaces and building design tools by allowing AI systems to interpret user instructions and convert them into structured architectural operations. By combining generative AI capabilities with building data models, the system can assist with tasks such as design generation, spatial reasoning, and building component creation. The project is intended for architects, engineers, and developers exploring how AI can automate or augment design workflows in the architecture, engineering, and construction industries. It supports interactions where users describe building features, layouts, or modifications in natural language and the AI translates those instructions into actionable design operations.

Downloads: 0 This Week

Last Update: 2026-05-15
See Project
12

Seamless Communication

Foundational Models for State-of-the-Art Speech and Text Translation

Seamless Communication is a research project focused on building more integrated, low-latency multimodal communication between humans and AI agents. The motivation is to move beyond “text in, text out” and enable direct, live, multi-turn exchange involving language, gesture, gaze, vision, and modality switching without user friction. The system architecture includes a real-time multimodal signal pipeline for audio, video, and sensor data, a dialog manager that can decide when to act (speak,...

Downloads: 0 This Week

Last Update: 2025-10-06
See Project
13

LLaMA-Mesh

Unifying 3D Mesh Generation with Language Models

...As a result, the model can generate mesh models directly from text prompts, explain mesh structures in natural language, or output mixed text-and-mesh sequences. This unified representation enables a single model to operate across both textual and spatial domains.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
14

hCaptcha Challenger

Gracefully face hCaptcha challenges with multimodal LLMs

...Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. It implements an agent-style workflow where the system interprets the challenge prompt, selects the appropriate vision model, and generates the required interaction automatically.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
15

Qwen-Image-Layered

Qwen-Image-Layered: Layered Decomposition for Inherent Editability

Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image...

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
16

Thinknowlogy

The world's only naturally intelligent knowledge technology

...This naturally occurring logic provides concrete clues for organizing natural objects, such as:
- Grouping objects that belong together,
- Separating objects that don't belong together,
- Archiving objects that have become less important.
Natural language and spatial information are sources of natural intelligence:
- Natural language provides concrete logic for organizing knowledge objects,
- Spatial information provides concrete logic for organizing spatial objects (utilized in, e.g., self-driving cars).
In this way, our brains know how to organize their knowledge and spatial information. ...

Downloads: 0 This Week

Last Update: 2024-11-09
See Project
17

Exposure Correction

Learning multi-scale deep model correcting over- and under-exposed

...The repository focuses on correcting poorly exposed photographs, handling both underexposure and overexposure using a deep learning approach. The method employs a multi-scale framework that learns to enhance images by adjusting exposure levels across different spatial resolutions. This allows the model to preserve fine details while correcting global lighting inconsistencies. The repository includes pre-trained models, datasets, and training/testing code to enable reproducibility and experimentation. By leveraging this framework, researchers and developers can apply exposure correction to a wide range of natural images, improving visual quality without manual editing. ...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
18

ControlNet

Let us control diffusion models

ControlNet is a neural network architecture designed to add conditional control to text-to-image diffusion models. Rather than training from scratch, ControlNet “locks” the weights of a pre-trained diffusion model and introduces a parallel trainable branch that learns additional conditions—like edges, depth maps, segmentation, human pose, scribbles, or other guidance signals. This allows the system to control where and how the model should focus during generation, enabling users to steer...

Downloads: 2 This Week

Last Update: 2025-10-21
See Project
19

Recast Navigation

Navigation mesh generation and pathfinding toolkit for game AI systems

...Its core component, Recast, constructs navigation meshes by rasterizing triangle meshes into voxels, filtering out areas that are not walkable, and converting the remaining regions into polygon meshes suitable for navigation. Detour complements this system by providing runtime pathfinding, spatial queries, and navigation utilities that allow agents to move efficiently across the generated mesh. Recast Navigation also includes modules such as DetourCrowd for crowd simulation and agent collision avoidance, as well as DetourTileCache for streaming and dynamically updating navigation meshes in large or changing environments.

Downloads: 1 This Week

Last Update: 2026-03-13
See Project
20

Mask2Former

Code release for "Masked-attention Mask Transformer

...Its core idea is to cast segmentation as mask classification: a transformer decoder predicts a set of mask queries, each with an associated class score, eliminating the need for task-specific heads. A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial support. This leads to accurate masks with sharp boundaries and strong small-object performance while remaining efficient on high-resolution inputs. The project provides extensive configurations and pretrained models across popular benchmarks like COCO, ADE20K, and Cityscapes. Built on top of Detectron2, it includes training scripts, inference tools, and visualization utilities that make experimentation straightforward.
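Masked attention, as described above, restricts each query to the positions covered by its current mask. The sketch below shows only that restriction as a masked softmax over one query's scores; the function name, the values, and the fallback to unmasked attention for an empty mask are assumptions made for this example.

```python
import math
from typing import List


def masked_attention(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over `scores`, restricted to positions where `mask` is True."""
    if not any(mask):
        # Assumed fallback: attend everywhere when the mask is empty.
        mask = [True] * len(scores)
    kept = [s for s, m in zip(scores, mask) if m]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# One query's raw scores over four spatial positions (illustrative values).
scores = [2.0, 1.0, 0.5, 3.0]
mask = [True, True, False, False]  # the query's current spatial support
weights = masked_attention(scores, mask)
print(weights)
```

Position 3 has the highest raw score, yet it receives zero weight because it lies outside the mask; all attention mass is redistributed over the supported positions.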

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
21

TimeSformer

The official PyTorch implementation of our paper

...TimeSformer was influential in showing that pure transformer architectures, without convolutional backbones, can perform strongly on video classification tasks. Its flexible attention design allows experimenting with different attention factorizations (spatial-then-temporal, joint, etc.) to trade off compute, memory, and accuracy.
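The compute trade-off between joint and spatial-then-temporal attention can be made concrete by counting query-key pairs. This is a back-of-the-envelope sketch under stated assumptions: 8 frames of 196 patches each are example values, the classification token is ignored, and the function names are hypothetical.

```python
def joint_attention_pairs(frames: int, patches: int) -> int:
    """Every token attends to every token in the clip."""
    tokens = frames * patches
    return tokens * tokens


def divided_attention_pairs(frames: int, patches: int) -> int:
    """Temporal step (same patch position across frames) plus spatial step (same frame)."""
    tokens = frames * patches
    return tokens * frames + tokens * patches


joint = joint_attention_pairs(frames=8, patches=196)
divided = divided_attention_pairs(frames=8, patches=196)
print(joint, divided)
```

Under these assumptions the factorized variant evaluates 319,872 pairs per layer, against 2,458,624 for joint attention.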

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
22

Dynamic Routing Between Capsules

A PyTorch implementation of the NIPS 2017 paper

Dynamic Routing Between Capsules is a PyTorch implementation of the Capsule Network architecture originally proposed to address limitations in traditional convolutional neural networks. Capsule networks aim to improve how neural models represent spatial hierarchies and relationships between objects within images. Instead of scalar neuron activations, capsules output vectors that encode both the presence of features and their spatial properties such as orientation or pose. The repository implements the dynamic routing algorithm between capsules, which allows lower-level features to route their outputs to higher-level structures that best represent the detected patterns. ...
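A capsule's output vector is usually passed through a "squashing" nonlinearity so that its length can be read as a presence score while its direction carries pose information. The sketch below follows the formula from the original paper rather than this repository's code, which may differ; `squash` and `length` are names chosen for this example.

```python
import math
from typing import List


def squash(s: List[float]) -> List[float]:
    """Shrink a vector's length into [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in s]


def length(v: List[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


weak = squash([0.1, 0.0])      # short input: length pushed toward 0
strong = squash([30.0, 40.0])  # long input: length pushed toward 1
print(length(weak), length(strong))
```

Dynamic routing then iteratively adjusts how strongly each lower-level capsule contributes to each higher-level capsule, based on how well their vectors agree.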

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
23

SG2Im

Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

...The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. In practice, sg2im demonstrates how structured semantics can guide generative models to produce controllable, compositional imagery.

Downloads: 0 This Week

Last Update: 2025-10-10
See Project
24

FastPhotoStyle

Style transfer, deep learning, feature transform

FastPhotoStyle is a deep learning-based image stylization framework designed to transfer the style of one photograph onto another while preserving photorealistic quality. Unlike traditional artistic style transfer methods that produce painterly outputs, this approach focuses on maintaining realistic textures, lighting, and spatial consistency. The method is based on a two-step process that includes a stylization phase followed by a smoothing operation, ensuring that the output image remains coherent and free of visual artifacts. It is computationally efficient due to its closed-form solution, allowing fast processing compared to iterative optimization-based methods. ...

Downloads: 0 This Week

Last Update: 2026-03-22
See Project
25

SpatialML

SpatialML is a markup language for representing spatial expressions in natural language documents. The goal is to allow for better integration of text collections with resources such as databases that provide spatial information about a domain.

Downloads: 0 This Week

Last Update: 2013-04-26
See Project