spatial free download

Showing 44 open source projects for "spatial"

1

TorchIO

Medical imaging toolkit for deep learning

TorchIO is an open-source Python library for efficiently reading, preprocessing, sampling, augmenting, and writing 3D medical images in deep learning applications, following the design of PyTorch. It includes multiple intensity and spatial transforms for data augmentation and preprocessing. These transforms include typical computer vision operations such as random affine transformations and also domain-specific ones such as simulation of intensity artifacts due to MRI magnetic field inhomogeneity (bias) or k-space motion artifacts. ...

Downloads: 1 This Week

Last Update: 2026-06-02
See Project
2

Autonomous Agents

Autonomous Agents (LLMs) research papers. Updated Daily

...The project explores how multiple agents can cooperate and interact with complex environments through machine learning, imitation learning, and multimodal sensing. It includes frameworks that integrate visual perception, tactile sensing, and spatial reasoning to guide the actions of robotic agents during manipulation or collaborative tasks. One of the central concepts explored in the repository is the integration of different sensory modalities using advanced machine learning techniques such as Feature-wise Linear Modulation and graph-based attention mechanisms. These methods allow agents to combine visual and geometric information while maintaining awareness of the spatial relationships between agents and objects.
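As a rough illustration of the Feature-wise Linear Modulation mentioned above, the sketch below scales and shifts each feature channel with parameters derived from a second modality. The function name `film` and all numeric values are hypothetical; in a real system the scale and shift would be predicted by a conditioning network rather than written by hand.

```python
from typing import List


def film(features: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """Feature-wise linear modulation: per-channel scale (gamma) and shift (beta)."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical values: visual features modulated by parameters that a
# conditioning network might produce from another sensor (e.g. touch).
visual_features = [1.0, 2.0, 3.0]
gamma = [0.5, 1.0, 2.0]  # per-channel scale
beta = [0.0, -1.0, 1.0]  # per-channel shift
modulated = film(visual_features, gamma, beta)
```

Because the modulation is applied per channel, one modality can amplify, suppress, or offset individual features of the other without changing the feature layout.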

Downloads: 0 This Week

Last Update: 2026-06-24
See Project
3

ML Ferret

Refer and Ground Anything Anywhere at Any Granularity

...The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
4

Qwen3-VL

Qwen3-VL, the multimodal large language model series by Alibaba Cloud

...Qwen3-VL is built for complex tasks such as GUI automation, multimodal coding (converting images or videos into HTML, CSS, JS, or Draw.io diagrams), long-context reasoning with support up to 1M tokens, and comprehensive video understanding. It also brings advanced perception capabilities, including spatial grounding, object recognition, OCR across 32 languages, and robust handling of challenging inputs like low-light or distorted text.

Downloads: 6 This Week

Last Update: 3 days ago
See Project
Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure
Native application identity and user-based security for your Azure cloud

Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.

Get a free trial
5

Lyra 2

Project Lyra: Open Generative 3D World Models

...It enables the creation of fully explorable 3D environments from minimal inputs such as a single image or video, leveraging self-distillation methods to generate consistent spatial representations. The system evolves across versions, with newer iterations introducing long-horizon generation and improved 3D consistency across frames. It combines elements of computer vision, generative modeling, and spatial intelligence to produce dynamic and navigable virtual worlds. The architecture is designed to handle both 3D and 4D scene generation, making it suitable for applications such as simulation, gaming, and virtual environments. ...

Downloads: 1 This Week

Last Update: 2026-06-11
See Project
6

VoxelMorph

Unsupervised Learning for Image Registration

VoxelMorph is an open-source deep learning framework designed for medical image registration, a process that aligns multiple medical scans into a common spatial coordinate system. Traditional image registration techniques typically rely on optimization procedures that must be executed separately for each pair of images, which can be computationally expensive and slow. VoxelMorph approaches the problem using neural networks that learn to predict deformation fields that transform one image so that it aligns with another. ...
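To make the idea of a deformation field concrete, here is a minimal 2D sketch in plain Python, not VoxelMorph's own code. It assumes the field stores, for every output pixel, the offset to the location in the moving image that should be sampled, and it uses bilinear interpolation. In VoxelMorph the field would be predicted by a network; here it is written by hand, and the name `warp` is hypothetical.

```python
from typing import List

Image = List[List[float]]


def warp(moving: Image, flow_y: Image, flow_x: Image) -> Image:
    """Resample `moving` at (y + flow_y, x + flow_x) using bilinear interpolation."""
    h, w = len(moving), len(moving[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Sampling location, clamped to the image bounds.
            sy = min(max(y + flow_y[y][x], 0.0), h - 1.0)
            sx = min(max(x + flow_x[y][x], 0.0), w - 1.0)
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = sy - y0, sx - x0
            top = (1 - wx) * moving[y0][x0] + wx * moving[y0][x1]
            bottom = (1 - wx) * moving[y1][x0] + wx * moving[y1][x1]
            out[y][x] = (1 - wy) * top + wy * bottom
    return out


moving = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]
# Hand-written constant field: every output pixel samples one pixel to its left,
# so the image content moves one pixel to the right.
flow_y = [[0.0] * 3 for _ in range(3)]
flow_x = [[-1.0] * 3 for _ in range(3)]
warped = warp(moving, flow_y, flow_x)
```

With this field the single bright pixel moves from the centre column to the right-hand column.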

Downloads: 10 This Week

Last Update: 2026-03-15
See Project
7

HDBSCAN

A high performance implementation of HDBSCAN clustering

HDBSCAN stands for Hierarchical Density-Based Spatial Clustering of Applications with Noise. It performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN) and to be more robust to parameter selection. In practice this means that HDBSCAN returns a good clustering straight away with little or no parameter tuning, and the primary parameter, minimum cluster size, is intuitive and easy to select. ...
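The sketch below shows why a single epsilon is limiting. It is a small pure-Python DBSCAN, not the HDBSCAN library and not its stability computation, run at several epsilon values on made-up points containing one tight cluster, one loose cluster, and an outlier. All names and data are illustrative assumptions.

```python
from math import dist
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Return one label per point: cluster ids 0, 1, ... or -1 for noise."""
    n = len(points)
    # Neighbourhoods include the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return [label if label is not None else -1 for label in labels]


tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
loose = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
outlier = [(20.0, 20.0)]
points = tight + loose + outlier

cluster_counts = {}
for eps in (0.2, 1.5, 10.0):
    labels = dbscan(points, eps=eps, min_samples=3)
    cluster_counts[eps] = len({label for label in labels if label != -1})
```

On this data the small epsilon finds only the tight cluster, the middle value separates both clusters, and the large value merges them. HDBSCAN avoids choosing one value by examining the whole range.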

Downloads: 1 This Week

Last Update: 2026-06-01
See Project
8

SAM 3D Objects

Models for object and human mesh reconstruction

SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image systems struggle. ...

Downloads: 9 This Week

Last Update: 2026-06-02
See Project
9

DeepSeek-OCR 2

Visual Causal Flow

...It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.

Downloads: 1 This Week

Last Update: 2026-02-03
See Project
10

HY-World 1.5

A Systematic Framework for Interactive World Modeling

...It blends advanced reasoning with multimodal synthesis, enabling agents to describe scenes, generate context-appropriate responses, and contribute to narrative or gameplay flows. The underlying framework typically supports large-context state tracking across extended interactions, blending temporal and spatial multimodal signals.

Downloads: 2 This Week

Last Update: 2026-06-10
See Project
11

HeavyDB

HeavyDB (formerly MapD/OmniSciDB)

HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the...

Downloads: 1 This Week

Last Update: 2026-03-11
See Project
12

Qwen-2.5-VL

Qwen2.5-VL is the multimodal large language model series

Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...

Downloads: 7 This Week

Last Update: 2026-01-30
See Project
13

SlowFast

Video understanding codebase from FAIR for reproducing video models

SlowFast is a video understanding framework that captures both spatial semantics and temporal dynamics efficiently by processing video frames at two different temporal resolutions. The slow pathway encodes semantic context by sampling frames sparsely, while the fast pathway captures motion and fine temporal cues by operating on densely sampled frames with fewer channels. Together, these two pathways complement each other, allowing the network to model both appearance and motion without excessive computational cost. ...
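A minimal sketch of the two sampling rates, assuming a clip of 64 frames. The parameter names `fast_stride` and `alpha` and their values are illustrative choices, not taken from the project's configuration.

```python
from typing import List, Tuple


def sample_pathways(num_frames: int, fast_stride: int, alpha: int) -> Tuple[List[int], List[int]]:
    """Return (slow_indices, fast_indices) for a clip of `num_frames` frames.

    The fast pathway takes every `fast_stride`-th frame; the slow pathway
    keeps every `alpha`-th frame of the fast pathway's selection.
    """
    fast = list(range(0, num_frames, fast_stride))
    slow = fast[::alpha]
    return slow, fast


slow_idx, fast_idx = sample_pathways(num_frames=64, fast_stride=2, alpha=8)
```

Here the fast pathway sees 32 frames and the slow pathway sees 4, so the slow pathway can afford more channels per frame for the same compute budget.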

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
14

49 Agents IDE

Open-source 2D IDE for managing AI agents in native CLIs

49Agents is an open-source “agentic IDE” that reimagines how developers interact with multiple AI agents, terminals, and development environments by placing everything onto a single infinite, zoomable canvas. Instead of relying on traditional tab-based workflows, it provides a spatial interface where terminals, editors, Git views, and monitoring tools coexist as movable panes, enabling users to manage complex multi-agent systems visually and intuitively. The platform is designed to work across multiple machines simultaneously, allowing agents running on different devices or servers to connect to a unified workspace without requiring SSH, which simplifies distributed development workflows. ...

Downloads: 0 This Week

Last Update: 2026-05-07
See Project
15

Claw3D

Claw3D is an open source 3D engine built on OpenClaw

...It is designed as a 3D “virtual office” where users can observe, manage, and interact with multiple AI agents performing tasks such as coding, reviewing pull requests, and coordinating workflows in real time. Instead of relying on traditional dashboards or logs, Claw3D introduces a spatial interface that allows users to navigate through a simulated office and watch agents collaborate, effectively turning abstract processes into tangible visual interactions. The system supports task assignment, progress tracking, and communication between agents, creating a representation of autonomous or semi-autonomous workflows. ...

Downloads: 0 This Week

Last Update: 2026-04-23
See Project
16

BuildingAI

Build your own AI application system for free

...The platform aims to bridge the gap between natural language interfaces and building design tools by allowing AI systems to interpret user instructions and convert them into structured architectural operations. By combining generative AI capabilities with building data models, the system can assist with tasks such as design generation, spatial reasoning, and building component creation. The project is intended for architects, engineers, and developers exploring how AI can automate or augment design workflows in the architecture, engineering, and construction industries. It supports interactions where users describe building features, layouts, or modifications in natural language and the AI translates those instructions into actionable design operations.

Downloads: 0 This Week

Last Update: 2026-05-15
See Project
17

Seamless Communication

Foundational Models for State-of-the-Art Speech and Text Translation

Seamless Communication is a research project focused on building more integrated, low-latency multimodal communication between humans and AI agents. The motivation is to move beyond “text in, text out” and enable direct, live, multi-turn exchange involving language, gesture, gaze, vision, and modality switching without user friction. The system architecture includes a real-time multimodal signal pipeline for audio, video, and sensor data, a dialog manager that can decide when to act (speak,...

Downloads: 0 This Week

Last Update: 2025-10-06
See Project
18

LLaMA-Mesh

Unifying 3D Mesh Generation with Language Models

...As a result, the model can generate mesh models directly from text prompts, explain mesh structures in natural language, or output mixed text-and-mesh sequences. This unified representation enables a single model to operate across both textual and spatial domains.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
19

hCaptcha Challenger

Gracefully face hCaptcha challenge with multimodal LLMs

...Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. It implements an agent-style workflow where the system interprets the challenge prompt, selects the appropriate vision model, and generates the required interaction automatically.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
20

Qwen-Image-Layered

Qwen-Image-Layered: Layered Decomposition for Inherent Editability

Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image...

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
21

Thinknowlogy

The world's only naturally intelligent knowledge technology

...This naturally occurring logic provides concrete clues for organizing natural objects, like:
- Grouping objects that belong together,
- Separating objects that don't belong together,
- Archiving objects that have become less important.
Natural language and spatial information are sources of natural intelligence:
- Natural language provides concrete logic for organizing knowledge objects,
- Spatial information provides concrete logic for organizing spatial objects (utilized in, e.g., self-driving cars).
In this way, our brains know how to organize their knowledge and spatial information. ...

Downloads: 0 This Week

Last Update: 2024-11-09
See Project
22

Exposure Correction

Learning multi-scale deep model correcting over- and under- exposed

...The repository focuses on correcting poorly exposed photographs, handling both underexposure and overexposure using a deep learning approach. The method employs a multi-scale framework that learns to enhance images by adjusting exposure levels across different spatial resolutions. This allows the model to preserve fine details while correcting global lighting inconsistencies. The repository includes pre-trained models, datasets, and training/testing code to enable reproducibility and experimentation. By leveraging this framework, researchers and developers can apply exposure correction to a wide range of natural images, improving visual quality without manual editing. ...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
23

ControlNet

Let us control diffusion models

ControlNet is a neural network architecture designed to add conditional control to text-to-image diffusion models. Rather than training from scratch, ControlNet “locks” the weights of a pre-trained diffusion model and introduces a parallel trainable branch that learns additional conditions—like edges, depth maps, segmentation, human pose, scribbles, or other guidance signals. This allows the system to control where and how the model should focus during generation, enabling users to steer...

Downloads: 2 This Week

Last Update: 2025-10-21
See Project
24

Recast Navigation

Navigation mesh generation and pathfinding toolkit for game AI systems

...Its core component, Recast, constructs navigation meshes by rasterizing triangle meshes into voxels, filtering out areas that are not walkable, and converting the remaining regions into polygon meshes suitable for navigation. Detour complements this system by providing runtime pathfinding, spatial queries, and navigation utilities that allow agents to move efficiently across the generated mesh. Recast Navigation also includes modules such as DetourCrowd for crowd simulation and agent collision avoidance, as well as DetourTileCache for streaming and dynamically updating navigation meshes in large or changing environments.
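One plausible walkability criterion is surface slope. The sketch below is plain Python, not the Recast API. It assumes a y-up coordinate system, counter-clockwise triangles whose normals point upward, and a hypothetical `max_slope_deg` threshold. A triangle counts as walkable when its unit normal is close enough to vertical.

```python
from math import cos, radians, sqrt
from typing import Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def triangle_normal(a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Unit normal of triangle (a, b, c) from the cross product (b - a) x (c - a)."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = sqrt(nx * nx + ny * ny + nz * nz)  # zero for degenerate triangles
    return (nx / length, ny / length, nz / length)


def is_walkable(tri: Triangle, max_slope_deg: float) -> bool:
    """True if the triangle's slope is below `max_slope_deg` (y is up)."""
    return triangle_normal(*tri)[1] > cos(radians(max_slope_deg))


floor: Triangle = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
gentle_ramp: Triangle = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (2.0, 1.0, 0.0))
steep_ramp: Triangle = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 2.0, 0.0))
wall: Triangle = ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0))

walkable = [is_walkable(t, max_slope_deg=45.0)
            for t in (floor, gentle_ramp, steep_ramp, wall)]
```

Only the floor and the gentle ramp pass the 45-degree threshold. A real pipeline would also account for agent height, climbable step size, and degenerate triangles, which this sketch ignores.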

Downloads: 1 This Week

Last Update: 2026-03-13
See Project
25

BEVFormer

Implementation of BEVFormer, a camera-only framework

...In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design a spatial cross-attention in which each BEV query extracts spatial features from the regions of interest across camera views. For temporal information, we propose a temporal self-attention that recurrently fuses the historical BEV information. ...

Downloads: 0 This Week

Last Update: 2022-09-23
See Project