realistic free download

Showing 12 open source projects for "realistic"

Filters: AI Models, Python

1

Hunyuan3D-2.1

From Images to High-Fidelity 3D Assets

Hunyuan3D-2.1 is Tencent Hunyuan’s advanced 3D asset generation system that produces high-fidelity 3D models with Physically Based Rendering (PBR) textures. It is fully open source, with released model weights, training code, and inference code. It improves on prior versions with a PBR texture pipeline, which enables realistic material effects such as reflections and subsurface scattering, and it allows community fine-tuning and extension. It includes both a shape generation module (mesh geometry) and a texture generation module. It runs on macOS, Windows, and Linux via Python and PyTorch, and it includes diffusers-style APIs.

Downloads: 11 This Week

Last Update: 2025-10-17
2

Wan2.1

Wan2.1: Open and Advanced Large-Scale Video Generative Model

Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. ...

1 Review

Downloads: 62 This Week

Last Update: 2026-03-05
3

LingBot-World

Advancing Open-source World Models

LingBot-World is an open-source, high-fidelity world simulator designed to advance the state of world models through video generation. Built on top of Wan2.2, it enables realistic, dynamic environment simulation across diverse styles, including real-world, scientific, and stylized domains. LingBot-World supports long-term temporal consistency, maintaining coherent scenes and interactions over minute-level horizons. With real-time interactivity and sub-second latency at 16 FPS, it is well-suited for interactive applications and rapid experimentation. ...

Downloads: 4 This Week

Last Update: 2026-03-05
4

WorldGen

Generate Any 3D Scene in Seconds

...The core idea is that you describe a world in natural language and WorldGen produces a navigable 3D scene that you can freely explore in 360 degrees, with loop closure so that the space remains consistent as you move around. It supports a wide variety of scenes, including both indoor and outdoor settings, and can handle realistic as well as stylized or fantastical environments. Rendering is decoupled from generation, so you can render at arbitrary resolutions and camera trajectories in real time, which makes it easier to integrate into custom pipelines.

Downloads: 0 This Week

Last Update: 2026-03-17
5

Kitten TTS

State-of-the-art TTS model under 25MB

KittenTTS is an open-source, ultra-lightweight, high-quality text-to-speech model with just 15 million parameters and a binary size under 25 MB. It is designed for real-time, CPU-based deployment across diverse platforms. It is CPU-optimized and runs without a GPU on any device, offers several premium voice options, and has fast inference optimized for real-time speech synthesis.
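The relationship between the parameter count and the size figure quoted above can be checked with some rough arithmetic. The sketch below is illustrative only: it assumes the binary is dominated by model weights and that 1 MB means 10^6 bytes, and the listed precisions are hypothetical comparison points, since the listing does not say how the weights are stored.

```python
# Rough size arithmetic for a 15-million-parameter model.
# Assumptions (not from the listing): weights dominate the binary,
# and 1 MB = 1,000,000 bytes.
PARAMS = 15_000_000
SIZE_LIMIT_BYTES = 25 * 1_000_000


def model_size_mb(params: int, bytes_per_param: float) -> float:
    """Return the weight storage in MB for a given precision."""
    return params * bytes_per_param / 1_000_000


# Hypothetical storage precisions, used only for comparison.
sizes = {
    name: model_size_mb(PARAMS, width)
    for name, width in [("float32", 4), ("float16", 2), ("int8", 1)]
}

# Largest average bytes-per-parameter that still fits under the limit.
max_bytes_per_param = SIZE_LIMIT_BYTES / PARAMS

for name, size in sizes.items():
    print(f"{name}: {size:.0f} MB")
print(f"budget: {max_bytes_per_param:.2f} bytes per parameter")
```

Under these assumptions, a size below 25 MB implies an average of fewer than about 1.67 bytes per parameter, which would suggest some form of reduced-precision storage.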

Downloads: 14 This Week

Last Update: 2026-02-24
6

VibeVoice

Open-source multi-speaker long-form text-to-speech model

...A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. The model integrates a Qwen2.5-based large language model with a diffusion head to produce realistic acoustic details and capture conversational context. Training involved curriculum learning with increasing sequence lengths up to 65K tokens, allowing VibeVoice to handle very long dialogues effectively. Safety mechanisms include an audible disclaimer and imperceptible watermarking in all generated audio to mitigate misuse risks.
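To see why a 7.5 Hz frame rate matters for long sequences, the following back-of-the-envelope sketch converts between audio duration and frame count. It reads "65K" as 65,000 and pretends every token in the context is a speech frame, which is a simplifying assumption (a real context would also hold text tokens). The 50 Hz rate is a hypothetical comparison value, not a figure from the listing.

```python
import math

FRAME_RATE_HZ = 7.5   # frame rate stated in the description
MAX_TOKENS = 65_000   # "65K" read as 65,000 (assumption)


def frames_for(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover duration_s seconds."""
    return math.ceil(duration_s * frame_rate_hz)


def max_minutes(tokens: int, frame_rate_hz: float) -> float:
    """Minutes of audio covered if every token were one speech frame."""
    return tokens / frame_rate_hz / 60


print(frames_for(60))                                  # frames per minute of audio
print(round(max_minutes(MAX_TOKENS, FRAME_RATE_HZ), 1))
print(round(max_minutes(MAX_TOKENS, 50.0), 1))         # hypothetical 50 Hz tokenizer
```

Under these assumptions the same token budget covers far more audio at 7.5 Hz than it would at a higher frame rate, which is the efficiency argument the description makes.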

Downloads: 7 This Week

Last Update: 3 days ago
7

Stable Virtual Camera

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Stable Virtual Camera is a multi-view diffusion model developed by Stability AI that transforms 2D images into immersive 3D videos with realistic depth and perspective. Unlike traditional methods that require complex reconstruction or scene-specific optimization, this model allows users to generate novel views from any number of input images and define custom camera trajectories, enabling dynamic exploration of scenes. It supports various aspect ratios and can produce 3D-consistent videos up to 1,000 frames, making it a versatile tool for creators seeking to enhance visual storytelling.

Downloads: 2 This Week

Last Update: 2025-03-20
8

LongCat-Image

Foundation model for image generation

LongCat-Image is an open-source foundation model for image generation and editing created by the LongCat team at Meituan, designed to deliver high-quality visual outputs while remaining efficient and accessible for developers and researchers. Rather than relying on massive parameter counts typical of many cutting-edge models, LongCat-Image achieves strong photorealism, stable structure, and accurate bilingual (Chinese and English) text rendering with a more compact ~6-billion parameter...

Downloads: 0 This Week

Last Update: 2026-03-03
9

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

...It is designed to generate audio that matches both visual content and textual semantic cues, for uses such as video production, film, advertising, and games. The model architecture aligns audio, video, and text representations to produce realistic, synchronized soundtracks. It produces high-quality 48 kHz audio output suitable for professional use. Its hybrid architecture combines multimodal transformer blocks with unimodal refinement blocks, and temporal alignment is handled by frame-level synchronization modules (e.g. Synchformer).

Downloads: 0 This Week

Last Update: 2025-09-28
10

CSM (Conversational Speech Model)

A Conversational Speech Generation Model

The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.

Downloads: 4 This Week

Last Update: 2025-03-19
11

SG2Im

Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

...Instead of conditioning on free-form text alone, it leverages graph structure to control layout and interactions, generating scenes that respect constraints like “person left of dog” or “cup on table.” The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. ...
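The graph-then-layout idea described above can be illustrated with a small data-structure sketch. This is not the repository's code: the `Box` class, the triple format, the example coordinates, and the `satisfies` check are all hypothetical, chosen only to show how a constraint such as "person left of dog" might be represented and tested against a predicted layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in normalized image coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center_x(self) -> float:
        return (self.x0 + self.x1) / 2


# A scene graph as a list of objects plus (subject, predicate, object) triples,
# where subject and object are indices into the object list.
objects = ["person", "dog"]
triples = [(0, "left of", 1)]

# A made-up layout standing in for the output of a layout predictor.
layout = {
    0: Box(0.1, 0.3, 0.4, 0.9),  # person
    1: Box(0.6, 0.5, 0.9, 0.9),  # dog
}


def satisfies(triple, layout) -> bool:
    """Check whether a layout respects one spatial relation."""
    subj, pred, obj = triple
    if pred == "left of":
        return layout[subj].center_x < layout[obj].center_x
    raise ValueError(f"unsupported predicate: {pred}")


print(all(satisfies(t, layout) for t in triples))
```

Keeping the layout as an explicit intermediate makes spatial constraints checkable before any pixels are generated, which mirrors the separation of geometry from texture that the description highlights.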

Downloads: 0 This Week

Last Update: 2025-10-10
12

Dia-1.6B

Dia-1.6B generates lifelike English dialogue and vocal expressions

Dia-1.6B is a 1.6 billion parameter text-to-speech model by Nari Labs that generates high-fidelity dialogue directly from transcripts. Designed for realistic vocal performance, Dia supports expressive features like emotion, tone control, and non-verbal cues such as laughter, coughing, or sighs. The model accepts speaker conditioning through audio prompts, allowing limited voice cloning and speaker consistency across generations. It is optimized for English and built for real-time performance on enterprise GPUs, though CPU and quantized versions are planned. ...

Downloads: 0 This Week

Last Update: 2025-06-27