frame generation free download

Showing 9 open source projects for "frame generation"

View related business solutions

AI Models Clear Filters & Widen Search

Stop vibe-debugging.
Plug Claude into your app's actual errors.

AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.

Free 30 days.
Build Agents and Models on One Platform
Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.

Try It Free
1

Oasis

Inference script for Oasis 500M

...Instead of rendering a pre-built game world, the system produces the next visual state via a diffusion-transformer approach, effectively “imagining” the world response to your actions in real time. The project focuses on enabling action-conditional frame generation so developers can experiment with interactive, model-generated environments rather than static video generation alone. Because it’s an inference-focused repository, it’s especially useful as a practical reference for running the model, wiring inputs, and producing the autoregressive sequence of gameplay frames. It also serves as a research sandbox for people exploring how far interactive generative models can go with smaller, more accessible checkpoints compared to massive internal systems.

Downloads: 1 This Week

Last Update: 2026-01-06
See Project
2

LTX-2.3

Official Python inference and LoRA trainer package

LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while...

Downloads: 102 This Week

Last Update: 2026-06-17
See Project
3

HunyuanVideo-I2V

A Customizable Image-to-Video Model based on HunyuanVideo

HunyuanVideo-I2V is a customizable image-to-video generation framework from Tencent Hunyuan, built on their HunyuanVideo foundation. It extends video generation so that given a static reference image plus an optional prompt, it generates a video sequence that preserves the reference image’s identity (especially in the first frame) and allows stylized effects via LoRA adapters. The repository includes pretrained weights, inference and sampling scripts, training code for LoRA effects, and support for parallel inference via xDiT. ...

Downloads: 0 This Week

Last Update: 2026-04-07
See Project
4

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

...Hybrid architecture combining multimodal transformer blocks and unimodal refinement blocks. Temporal alignment via frame-level synchronization modules (e.g. Synchformer).

Downloads: 0 This Week

Last Update: 2025-09-28
See Project
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
5

Qwen-2.5-VL

Qwen2.5-VL is the multimodal large language model series

Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...

Downloads: 7 This Week

Last Update: 2026-01-30
See Project
6

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

...It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. ...

Downloads: 1 This Week

Last Update: 2026-06-02
See Project
7

Qwen2.5-VL-3B-Instruct

Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video

Qwen2.5-VL-3B-Instruct is a 3.75 billion parameter multimodal model by Qwen, designed to handle complex vision-language tasks in both image and video formats. As part of the Qwen2.5 series, it supports image-text-to-text generation with capabilities like chart reading, object localization, and structured data extraction. The model can serve as an intelligent visual agent capable of interacting with digital interfaces and understanding long-form videos by dynamically sampling resolution and frame rate. It uses a SwiGLU and RMSNorm-enhanced ViT architecture and introduces mRoPE updates for robust temporal and spatial understanding. ...

Downloads: 0 This Week

Last Update: 2025-07-02
See Project
8

Qwen2.5-VL-7B-Instruct

Multimodal 7B model for image, video, and text understanding tasks

...Fine-tuned from Qwen2.5-VL, this 7-billion-parameter model can interpret visual content such as charts, documents, and user interfaces, as well as recognize common objects. It supports complex tasks like visual question answering, localization with bounding boxes, and structured output generation from documents. The model is also capable of video understanding with dynamic frame sampling and temporal reasoning, enabling it to analyze and respond to long-form videos. Built with an enhanced ViT architecture using window attention, SwiGLU, and RMSNorm, it aligns closely with Qwen2.5 LLM standards. The model demonstrates high performance across benchmarks like DocVQA, ChartQA, and MMStar, and even functions as a tool-using visual agent.

Downloads: 0 This Week

Last Update: 2025-07-02
See Project
9

Gemma 4

Google’s flagship dense multimodal model for coding and reasoning

Gemma 4 is Google DeepMind’s flagship dense open-weight multimodal model, designed for high-end reasoning, coding, agentic workflows, and multimodal understanding. The model contains approximately 30.7B parameters and supports text and image inputs with text generation output, while also processing video as image-frame sequences. Built as the most capable model in the Gemma 4 family, it combines strong reasoning performance with a large 256K-token context window and configurable thinking modes. Gemma 4 31B supports native function calling, structured outputs, and more than 140 languages, making it suitable for enterprise assistants, coding agents, document analysis, and multilingual applications. ...

Downloads: 0 This Week

Last Update: 2026-06-03
See Project
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now