37 projects for "maps" with 2 filters applied:

  • 1
    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. ...
    Downloads: 5 This Week
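A metric depth map becomes useful for 3D tasks once each pixel is back-projected into space. The sketch below shows that step with a plain pinhole camera model. It is not Depth Pro's own code: the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, and the depth values are toy data.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a metric depth map (rows of depths in metres) into 3D points.

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column, z: metric depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map: near surface in the top row, far surface in the bottom row.
depth = [[2.0, 2.0], [4.0, 4.0]]
points = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because the depth is metric, the resulting coordinates are in metres rather than in an arbitrary relative scale.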
  • 2
    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. It separates client-side media handling from backend AI processing, reducing data exposure while still enabling transcription and document generation. ...
    Downloads: 1 This Week
  • 3
    Depth Anything 3

    Recovering the Visual Space from Any Views

    ...Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. It includes support for high-resolution inputs and post-processing tools that refine depth predictions, helping downstream tasks like segmentation, bounding volume estimation, and mixed reality layering.
    Downloads: 8 This Week
  • 4
    JimuReport

    Open source drag-and-drop reporting and dashboard builder platform

    JimuReport is an open source data visualization and reporting platform designed to help developers and organizations build reports, dashboards, and large screen data displays through a visual interface. It provides an online report designer that uses an Excel-like editing experience, allowing users to construct reports with drag-and-drop components and cell-based layouts. It focuses on simplifying complex report development by enabling visual configuration instead of manual coding....
    Downloads: 4 This Week
  • 5
    improve

    Use your most capable model to audit your codebase

    ...The skill does not modify code by default, because its main output is a self-contained plan. It can run full, quick, deep, security-focused, branch-scoped, and feature-suggestion audits. It maps repository conventions, identifies findings, writes ordered markdown plans, and records verification steps such as lint, build, and test commands. It is useful for teams that want AI-assisted technical planning without letting the planning model directly change production code.
    Downloads: 1 This Week
  • 6
    Smart Excalidraw

    A smart, powerful, and beautiful excalidraw drawing tool

    ...It leverages large language models to interpret user input and automatically produce structured diagrams such as flowcharts, architecture diagrams, ER diagrams, and mind maps with logical layouts and clean visual organization. One of its key innovations is a smart connection algorithm that optimizes how elements are linked, reducing visual clutter and ensuring clarity in complex diagrams. The tool integrates seamlessly with the Excalidraw format, allowing users to refine, edit, and customize AI-generated diagrams manually on an interactive canvas. ...
    Downloads: 1 This Week
  • 7
    pix2pixHD

    Synthesizing and manipulating 2048x1024 images with conditional GANs

    pix2pixHD is a PyTorch-based implementation of a conditional generative adversarial network designed for high-resolution image-to-image translation, capable of producing photorealistic outputs at resolutions up to 2048×1024. It is widely used to convert structured inputs such as semantic label maps into realistic images, making it particularly valuable in applications like autonomous driving simulation, face synthesis, and scene generation. The model improves upon earlier GAN approaches by introducing multi-scale generators and discriminators that enable stable training and fine detail generation at large resolutions. It also supports interactive editing, allowing users to modify semantic regions and regenerate images with realistic adjustments.
    Downloads: 0 This Week
  • 8
    i.am.ai

    Roadmap to becoming an Artificial Intelligence Expert in 2022

    i.am.ai is a structured educational guide that maps out the knowledge areas and technologies required to become an artificial intelligence or machine learning expert. The project presents visual charts that outline multiple career paths such as data scientist, machine learning engineer, and AI specialist, helping learners understand what to study and in what order. It was originally created to train internal employees but was released publicly to support the broader community. ...
    Downloads: 0 This Week
  • 9
    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
    Downloads: 0 This Week
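Similarity search in a unified representation space can be sketched as ranking items by cosine similarity to a query vector. The example below uses hand-written three-dimensional vectors as stand-ins for real embeddings. The item names, the vectors, and the helper `top_k` are all hypothetical, and nothing here calls the Qwen3-VL-Embedding models themselves.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k):
    """Return the names of the k corpus items most similar to the query."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy vectors standing in for embeddings of an image, a video, and a screenshot.
corpus = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_video": [0.7, 0.7, 0.0],
    "tax_form_screenshot": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
candidates = top_k(query, corpus, k=2)
```

In a two-stage setup like the one described, a reranking model would then rescore only the shortlisted `candidates` against the query.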
  • 10
    Minigrid

    Simple and easily configurable grid world environments

    ...It provides a suite of simple 2D grid-based tasks (e.g., navigating mazes, unlocking doors, carrying keys) where an agent moves in discrete steps and interacts with objects. The design emphasizes speed (agents can run thousands of steps per second), low dependency overhead, and high customizability — making it easy to define new maps, new tasks, or wrappers. It supports the Gymnasium-style environment API so that RL researchers can plug it into their existing frameworks and algorithms with minimal adaptation. Because of its simplicity, it is often used for rapid prototyping, analytic experiments, curriculum learning, or pedagogical tutorials. While it is not a full 3D simulation environment, its strength lies in enabling many environment resets and steps cheaply, which is valuable for algorithmic RL research rather than high-fidelity rendering.
    Downloads: 0 This Week
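The Gymnasium-style API mentioned above centres on two calls: `reset()` returns an observation and an info dict, and `step(action)` returns observation, reward, terminated, truncated, and info. The sketch below mimics that contract with a tiny grid world in plain Python. `TinyGrid` is a hypothetical class written for illustration and is not part of Minigrid or Gymnasium.

```python
class TinyGrid:
    """Minimal grid world following the Gymnasium reset/step contract."""

    # Action id -> (dx, dy): up, right, down, left.
    MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

    def __init__(self, size=4, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self.goal = (size - 1, size - 1)
        self.pos = (0, 0)
        self.steps = 0

    def reset(self, seed=None):
        """Put the agent back in the top-left corner."""
        self.pos = (0, 0)
        self.steps = 0
        return self.pos, {}

    def step(self, action):
        """Move one cell, clamping at the walls."""
        dx, dy = self.MOVES[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        self.steps += 1
        terminated = self.pos == self.goal
        truncated = self.steps >= self.max_steps and not terminated
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


env = TinyGrid(size=3)
obs, info = env.reset()
for action in [1, 1, 2, 2]:  # right, right, down, down
    obs, reward, terminated, truncated, info = env.step(action)
```

Keeping each step this cheap is what makes thousands of steps per second plausible in environments of this kind.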
  • 11
    Hunyuan3D-1

    A Unified Framework for Text-to-3D and Image-to-3D Generation

    Hunyuan3D-1 is an earlier version in the same 3D generation line (the unified framework for text-to-3D and image-to-3D tasks) by Tencent Hunyuan. It provides a framework combining shape generation and texture synthesis, enabling users to create 3D assets from images or text conditions. While less advanced than version 2.1, it laid the foundations for the later PBR, higher resolution, and open-source enhancements.
    Downloads: 0 This Week
  • 12
    A2UI

    A Protocol for Agent-Driven Interfaces

    ...This approach separates UI intent from UI implementation, making it possible for the same agent-generated interface to be rendered across different platforms such as web, mobile, and desktop applications. A key design principle of A2UI is security, as it avoids executing arbitrary code generated by models and instead restricts output to structured data that maps to a predefined catalog of trusted UI components. The system also supports incremental updates, allowing agents to progressively modify the interface as a conversation evolves.
    Downloads: 0 This Week
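The idea of restricting agent output to structured data that maps onto a trusted catalog can be sketched as a small validator. The catalog contents, the field names (`type`, `children`), and the function `validate` are assumptions made for this illustration and do not reflect the actual A2UI schema.

```python
# Hypothetical catalog: component type -> properties the renderer accepts.
CATALOG = {
    "column": set(),
    "text": {"value"},
    "button": {"label", "action"},
}


def validate(node):
    """Accept a UI tree only if every node is a known component
    carrying nothing beyond its allowed properties."""
    kind = node.get("type")
    if kind not in CATALOG:
        raise ValueError(f"unknown component: {kind!r}")
    extra = set(node) - CATALOG[kind] - {"type", "children"}
    if extra:
        raise ValueError(f"unexpected properties on {kind!r}: {sorted(extra)}")
    for child in node.get("children", []):
        validate(child)
    return True


# A payload an agent might emit: plain data, no executable code.
payload = {
    "type": "column",
    "children": [
        {"type": "text", "value": "Confirm booking?"},
        {"type": "button", "label": "Yes", "action": "confirm"},
    ],
}
is_valid = validate(payload)
```

A payload with an unknown type such as `{"type": "script"}` raises `ValueError` here, so it never reaches a renderer.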
  • 13
    WFGY 3.0

    A tension reasoning engine over 131 S-class problems

    ...Different versions of the framework, including WFGY 1.0, 2.0, and 3.0, represent stages of development where early conceptual ideas evolved into more structured reasoning engines and diagnostic tools. The system maps reasoning tension across a large set of complex problems spanning domains such as mathematics, science, climate, finance, and artificial intelligence behavior.
    Downloads: 0 This Week
  • 14
    Unla

    Gateway service that instantly transforms existing MCP Servers

    ...Its goal is to let teams “wire up” tools they already run—internal REST endpoints, third-party APIs, or local MCP servers—and present a single, reliable MCP interface to clients like Claude Desktop, Cursor, and IDEs. The gateway focuses on operational concerns you’d expect in production: multi-instance availability, health checking, and declarative routing that maps upstreams to MCP tools and resources. A quick-start and CLI make it easy to stand up an API server, while the package structure exposes helpers for people who want to embed or extend the gateway. Because it is itself MCP-speaking, Unla can sit in front of mixed fleets and normalize transports and schemas for clients. Documentation and pkg.go.dev pages reinforce the positioning as a stable, Go-native building block for MCP deployments.
    Downloads: 0 This Week
  • 15
    Map-Anything

    MapAnything: Universal Feed-Forward Metric 3D Reconstruction

    ...The model flexibly accepts different input combinations (images, intrinsics, poses, sparse or dense depth) and produces a rich set of outputs including per-pixel 3D points, camera intrinsics, camera poses, ray directions, confidence maps, and validity masks. Its inference path is fully feed-forward with optional mixed-precision and memory-efficient modes, making it practical to scale to long image sequences while keeping latency predictable.
    Downloads: 0 This Week
  • 16
    Audio AI Timeline

    A timeline of the latest AI models for audio generation

    Audio AI Timeline is a curated project that organizes the development of audio-related artificial intelligence into a structured and accessible historical timeline. Rather than functioning as a model training framework, it serves as an informational resource that maps key papers, systems, models, datasets, and milestones across areas such as speech synthesis, music generation, audio understanding, source separation, and general audio machine learning. The project helps users understand how major techniques and ideas evolved over time, making it especially useful for researchers, students, and practitioners who want a broad overview of the field without digging through scattered references. ...
    Downloads: 0 This Week
  • 17
    Shap-E

    Generate 3D objects conditioned on text or images

    The shap-e repository provides the official code and model release for Shap-E, a conditional generative model designed to produce 3D assets (implicit functions, meshes, neural radiance fields) from text or image prompts. The model is built with a two-stage architecture: first an encoder that maps existing 3D assets into parameterizations of implicit functions, and then a conditional diffusion model trained on those parameterizations to generate new assets. Because it works at the level of implicit functions, Shap-E can render output both as textured meshes and NeRF-style volumetric renderings. The repository contains sample notebooks (e.g. sample_text_to_3d.ipynb, sample_image_to_3d.ipynb) so users can try out text → 3D or image → 3D generation. ...
    Downloads: 0 This Week
  • 18
    ControlNet

    Let us control diffusion models

    ...Rather than training from scratch, ControlNet “locks” the weights of a pre-trained diffusion model and introduces a parallel trainable branch that learns additional conditions such as edges, depth maps, segmentation, human pose, scribbles, or other guidance signals. This allows the system to control where and how the model should focus during generation, enabling users to steer layout, structure, and content more precisely than prompt text alone. The project includes many trained model variants that accept different types of conditioning (e.g., Canny edge input, normal maps, skeletal pose) and produce improved fidelity in Stable Diffusion outputs. ...
    Downloads: 2 This Week
  • 19
    PIFuHD

    High-Resolution 3D Human Digitization from A Single Image

    ...It also uses a two-stage architecture: a coarse global model followed by local refinement patches to capture fine detail, balancing global consistency and local detail. The repo includes training pipelines, dataset loaders (for Multi-POP, etc.), and inference scripts for mesh output including depth maps for postprocessing. To help practical use, there are utilities for normal estimation, texture back-projection, mesh cleanup, and integration with rendering pipelines.
    Downloads: 4 This Week
  • 20
    CAM

    Class Activation Mapping

    This repository implements Class Activation Mapping (CAM), a technique to expose the implicit attention of convolutional neural networks by generating heatmaps that highlight the most discriminative image regions influencing a network’s class prediction. The method involves modifying a CNN model slightly (e.g., using global average pooling before the final layer) to produce a weighted combination of feature maps as the class activation map. The repo provides example code and instructions for applying CAM to existing CNN architectures with light modifications, including sample scripts that use standard architectures and visualize the discriminative regions for each class.
    Downloads: 0 This Week
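The weighted combination of feature maps described above can be written out directly: for a chosen class, each feature map is scaled by that class's weight in the final layer and the results are summed per pixel. The sketch below uses nested lists and toy numbers. The function name `class_activation_map` and the sample values are illustrative and are not taken from the repository.

```python
def class_activation_map(feature_maps, class_weights):
    """Sum feature maps weighted by one class's final-layer weights.

    feature_maps: list of K maps, each a list of rows (H x W).
    class_weights: list of K weights for the class of interest.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


# Two toy 2x2 feature maps: one fires top-left, the other bottom-right.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.5]  # hypothetical weights for a single class
cam = class_activation_map(feature_maps, class_weights)
```

The largest value in `cam` marks the region that contributes most to the class score; in practice the map is upsampled to the input resolution and overlaid as a heatmap.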
  • 21
    VITS

    Conditional Variational Autoencoder with Adversarial Learning

    VITS is a foundational research implementation of “VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech,” a well-known neural TTS architecture. Unlike traditional two-stage systems that separately train an acoustic model and a vocoder, VITS trains an end-to-end model that maps text directly to waveform using a conditional variational autoencoder combined with normalizing flows and adversarial training. This architecture enables parallel generation (fast inference) while achieving speech quality that rivals or surpasses many two-stage systems. The repository provides training and inference pipelines for common datasets such as LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. ...
    Downloads: 0 This Week
  • 22
    DensePose

    A real-time approach for mapping all human pixels of 2D RGB images

    DensePose is a computer vision system that maps all human pixels in an RGB image to the 3D surface of a human body model. It extends human pose estimation from predicting joint keypoints to providing dense correspondences between 2D images and a canonical 3D mesh (such as the SMPL model). This enables detailed understanding of human shape, motion, and surface appearance directly from images or videos.
    Downloads: 45 This Week
  • 23
    ALAE

    Adversarial Latent Autoencoders

    ...The project implements the architecture introduced in the CVPR research paper on Adversarial Latent Autoencoders, which focuses on improving generative modeling by learning latent representations aligned with adversarial training objectives. Unlike traditional GANs that directly generate images from random noise, ALAE uses an encoder-decoder architecture that maps images into a structured latent space and then reconstructs them through adversarial training. This design allows the model to learn interpretable latent representations that can be manipulated to control generated image attributes.
    Downloads: 0 This Week
  • 24
    vid2vid

    PyTorch implementation of our method for high-resolution

    vid2vid is a deep learning framework for high-resolution video-to-video translation that generates photorealistic videos from structured inputs such as semantic maps, pose sequences, or edge maps. Built on top of image-to-image translation techniques like pix2pixHD, it extends these ideas into the temporal domain by ensuring consistency across video frames. The system can synthesize complex outputs such as realistic talking faces, human motion animations, or dynamic street scenes by learning temporal relationships between frames. ...
    Downloads: 0 This Week
  • 25
    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    sg2im is a research codebase that learns to synthesize images from scene graphs—structured descriptions of objects and their relationships. Instead of conditioning on free-form text alone, it leverages graph structure to control layout and interactions, generating scenes that respect constraints like “person left of dog” or “cup on table.” The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts....
    Downloads: 0 This Week