An implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch. It also contains an extension to joint video and audio generation, using a dual-decoder approach.

Diffusion-based methods appear to have since taken over as the state of the art. However, I will continue with NÜWA, extending it to use multi-headed codes and a hierarchical causal transformer. I think that direction is untapped for improving on this line of work.

The paper also presents a way to condition video generation on one or more segmentation masks. You can do this as well, provided you train a VQGanVAE on the sketches beforehand. You then use NUWASketch instead of NUWA; NUWASketch can accept the sketch VAE as a reference.

The variant of NUWA that produces both video and audio is also offered here. For now, the audio needs to be encoded manually.

Features

  • Train the VAE
  • Conditioning on Sketches
  • Text to video and audio
  • The audio will need to be encoded manually
  • This library will offer some utilities to make training easier
  • This library depends on a separate vector quantization library
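
For readers unfamiliar with the vector quantization step that a VQ-style VAE relies on, here is a minimal sketch of the core idea: each continuous vector is replaced by the index of its nearest entry in a codebook. This is an illustration in plain Python only; the names `squared_distance`, `quantize`, and the toy `codebook` are hypothetical and are not the API of this library or of the vector quantization library it depends on.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    vector: Sequence[float], codebook: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Return (index, code) for the codebook entry closest to `vector`.

    Hypothetical helper for illustration; a real implementation would
    operate on batched tensors and learn the codebook during training.
    """
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
    return index, list(codebook[index])


# Toy codebook with three 2-dimensional codes (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# The discrete index is what a transformer would model as a token;
# the code vector is what a decoder would receive in its place.
index, code = quantize([0.9, 0.8], codebook)
```

In this sketch the image or video encoder is left out entirely; only the lookup that turns continuous features into discrete tokens is shown.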

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python AI Video Generators, Python Generative AI

Registered

2023-03-22