Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more

This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the release notes in full so you don't miss out on the fun stuff.

New pipelines 🧨

We welcomed new pipelines in this release:

  • Wan 2.2
  • Flux-Kontext
  • Qwen-Image
  • Qwen-Image-Edit

Wan 2.2 📹

This update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the official doc to learn more.
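
As a minimal text-to-video sketch (the checkpoint id and generation settings below are assumptions; the official doc lists the released repos and recommended settings):

:::python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Checkpoint id is an assumption; check the Wan 2.2 docs for the exact repos.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

frames = pipe(
    prompt="A cat surfing a wave at sunset, cinematic lighting",
    num_frames=81,
    num_inference_steps=40,
).frames[0]
export_to_video(frames, "wan22.mp4", fps=16)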

Flux-Kontext 🎇

Flux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the official doc to learn more about it.
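
The snippet below is a hedged image-editing sketch (the checkpoint id, input image path, and guidance value are assumptions; the official doc has the canonical example):

:::python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# Checkpoint id is an assumption; see the Flux Kontext docs for the released repo.
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input.png")  # any local path or URL
edited = pipe(
    image=image,
    prompt="Make the subject wear a tiny wizard hat",
    guidance_scale=2.5,  # illustrative value
).images[0]
edited.save("kontext_edit.png")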

Qwen-Image 🌅

After a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our docs.
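
Here is a minimal text-to-image sketch using the checkpoint id from the loading example later in these notes (the prompt and step count are illustrative):

:::python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)

prompt = 'A neon sign that reads "Hello Diffusers 0.35" above a rainy street'
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwen_image.png")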

Thanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via this PR and this PR.

New training scripts 🎛️

Make these newly added models your own with our training scripts, including Flux Kontext and Qwen-Image LoRA fine-tuning (see the corresponding training PRs in the commit list below).

Single-file modeling implementations

Following the 🤗 Transformers’ philosophy of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer code is one example of this.

Attention refactor

We have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native scaled_dot_product_attention, Flash Attention 3, SAGE attention, etc.) in the library seamlessly.

Having attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow this PR if you’re interested.

Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.
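
If you want to opt into a specific backend, the sketch below shows the general shape of the API (the helper name and backend identifiers are assumptions; the attention backend docs have the exact, up-to-date API):

:::python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)

# Assumed helper exposed by the attention dispatcher on the model; if you never
# call it, the default PyTorch scaled_dot_product_attention backend is used.
pipe.transformer.set_attention_backend("flash")  # e.g. "flash", "sage", "native"

image = pipe("A lighthouse in a storm").images[0]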

Regional compilation

Regional compilation trims cold-start latency by compiling only the small, frequently repeated block(s) of a model (typically a transformer layer) and reusing the compiled artifact for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation while reducing compile time by 8–10x. Refer to this doc to learn more.
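
The idea, sketched with plain torch.compile applied only to the repeated blocks (the transformer_blocks attribute name varies per model; the doc above covers the built-in helper):

:::python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16, device_map="cuda"
)

# Compile only the repeated transformer blocks instead of the whole model;
# the compiled artifact is reused for every block, cutting cold-start time.
for block in pipe.transformer.transformer_blocks:
    block.compile()

image = pipe("A watercolor painting of a fox").images[0]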

Thanks to @anijain2305 for contributing this feature in this PR.

We have also authored a number of posts on the Hugging Face blog that center around the use of torch.compile.

Faster pipeline loading ⚡️

Users can now load pipelines directly onto an accelerator device, leading to significantly faster load times. This is particularly evident when loading large pipelines like Wan and Qwen-Image.

:::diff
from diffusers import DiffusionPipeline
import torch

ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
-    ckpt_id, torch_dtype=torch.bfloat16
- ).to("cuda")
+    ckpt_id, torch_dtype=torch.bfloat16, device_map="cuda"
+ )

You can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you’re working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files.

:::python
import os
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

# rest of the loading code
....

Better GGUF integration

@Isotr0py contributed support for native GGUF CUDA kernels in this PR. This should provide an approximately 10% improvement in inference speed.

We have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more here.

We now support loading Diffusers-format GGUF checkpoints.

You can learn more about all of this in our GGUF official docs.
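
As a hedged sketch of loading a GGUF checkpoint into a transformer and dropping it into a pipeline (the GGUF file URL below is a placeholder; the GGUF docs list tested checkpoints):

:::python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Placeholder GGUF file; replace with a real quantized checkpoint from the Hub.
gguf_path = "https://huggingface.co/<user>/<repo>/blob/main/flux1-dev-Q4_K_S.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    gguf_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")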

Modular Diffusers (Experimental)

Modular Diffusers is a system for building diffusion pipelines from individual pipeline blocks. It is highly customizable: blocks can be mixed and matched to adapt an existing pipeline or to create one for a specific workflow or multiple workflows.

The API is currently in active development and is being released as an experimental feature. Learn more in our docs.

All commits

  • [tests] skip instead of returning. by @sayakpaul in [#11793]
  • adjust to get CI test cases passed on XPU by @kaixuanliu in [#11759]
  • fix deprecation in lora after 0.34.0 release by @sayakpaul in [#11802]
  • [chore] post release v0.34.0 by @sayakpaul in [#11800]
  • Follow up for Group Offload to Disk by @DN6 in [#11760]
  • [rfc][compile] compile method for DiffusionPipeline by @anijain2305 in [#11705]
  • [tests] add a test on torch compile for varied resolutions by @sayakpaul in [#11776]
  • adjust tolerance criteria for test_float16_inference in unit test by @kaixuanliu in [#11809]
  • Flux Kontext by @a-r-r-o-w in [#11812]
  • Kontext training by @sayakpaul in [#11813]
  • Kontext fixes by @a-r-r-o-w in [#11815]
  • remove syncs before denoising in Kontext by @sayakpaul in [#11818]
  • [CI] disable onnx, mps, flax from the CI by @sayakpaul in [#11803]
  • TorchAO compile + offloading tests by @a-r-r-o-w in [#11697]
  • Support dynamically loading/unloading loras with group offloading by @a-r-r-o-w in [#11804]
  • [lora] fix: lora unloading behvaiour by @sayakpaul in [#11822]
  • [lora]feat: use exclude modules to loraconfig. by @sayakpaul in [#11806]
  • ENH: Improve speed of function expanding LoRA scales by @BenjaminBossan in [#11834]
  • Remove print statement in SCM Scheduler by @a-r-r-o-w in [#11836]
  • [tests] add test for hotswapping + compilation on resolution changes by @sayakpaul in [#11825]
  • reset deterministic in tearDownClass by @jiqing-feng in [#11785]
  • [tests] Fix failing float16 cuda tests by @a-r-r-o-w in [#11835]
  • [single file] Cosmos by @a-r-r-o-w in [#11801]
  • [docs] fix single_file example. by @sayakpaul in [#11847]
  • Use real-valued instead of complex tensors in Wan2.1 RoPE by @mjkvaak-amd in [#11649]
  • [docs] Batch generation by @stevhliu in [#11841]
  • [docs] Deprecated pipelines by @stevhliu in [#11838]
  • fix norm not training in train_control_lora_flux.py by @Luo-Yihang in [#11832]
  • [From Single File] support from_single_file method for WanVACE3DTransformer by @J4BEZ in [#11807]
  • [lora] tests for exclude_modules with Wan VACE by @sayakpaul in [#11843]
  • update: FluxKontextInpaintPipeline support by @vuongminh1907 in [#11820]
  • [Flux Kontext] Support Fal Kontext LoRA by @linoytsaban in [#11823]
  • [docs] Add a note of _keep_in_fp32_modules by @a-r-r-o-w in [#11851]
  • [benchmarks] overhaul benchmarks by @sayakpaul in [#11565]
  • FIX set_lora_device when target layers differ by @BenjaminBossan in [#11844]
  • Fix Wan AccVideo/CausVid fuse_lora by @a-r-r-o-w in [#11856]
  • [chore] deprecate blip controlnet pipeline. by @sayakpaul in [#11877]
  • [docs] fix references in flux pipelines. by @sayakpaul in [#11857]
  • [tests] remove tests for deprecated pipelines. by @sayakpaul in [#11879]
  • [docs] LoRA metadata by @stevhliu in [#11848]
  • [training ] add Kontext i2i training by @sayakpaul in [#11858]
  • [CI] Fix big GPU test marker by @DN6 in [#11786]
  • First Block Cache by @a-r-r-o-w in [#11180]
  • [tests] annotate compilation test classes with bnb by @sayakpaul in [#11715]
  • Update chroma.md by @shm4r7 in [#11891]
  • [CI] Speed up GPU PR Tests by @DN6 in [#11887]
  • Pin k-diffusion for CI by @sayakpaul in [#11894]
  • [Docker] update doc builder dockerfile to include quant libs. by @sayakpaul in [#11728]
  • [tests] Remove more deprecated tests by @sayakpaul in [#11895]
  • [tests] mark the wanvace lora tester flaky by @sayakpaul in [#11883]
  • [tests] add compile + offload tests for GGUF. by @sayakpaul in [#11740]
  • feat: add multiple input image support in Flux Kontext by @Net-Mist in [#11880]
  • Fix unique memory address when doing group-offloading with disk by @sayakpaul in [#11767]
  • [SD3] CFG Cutoff fix and official callback by @asomoza in [#11890]
  • The Modular Diffusers by @yiyixuxu in [#9672]
  • [quant] QoL improvements for pipeline-level quant config by @sayakpaul in [#11876]
  • Bump torch from 2.4.1 to 2.7.0 in /examples/server by @dependabot[bot] in [#11429]
  • [LoRA] fix: disabling hooks when loading loras. by @sayakpaul in [#11896]
  • [utils] account for MPS when available in get_device(). by @sayakpaul in [#11905]
  • [ControlnetUnion] Multiple Fixes by @asomoza in [#11888]
  • Avoid creating tensor in CosmosAttnProcessor2_0 by @chenxiao111222 in [#11761]
  • [tests] Unify compilation + offloading tests in quantization by @sayakpaul in [#11910]
  • Speedup model loading by 4-5x ⚡ by @a-r-r-o-w in [#11904]
  • [docs] torch.compile blog post by @stevhliu in [#11837]
  • Flux: pass joint_attention_kwargs when using gradient_checkpointing by @piercus in [#11814]
  • Fix: Align VAE processing in ControlNet SD3 training with inference by @Henry-Bi in [#11909]
  • Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server by @dependabot[bot] in [#11924]
  • [tests] Improve Flux tests by @a-r-r-o-w in [#11919]
  • Remove device synchronization when loading weights by @a-r-r-o-w in [#11927]
  • Remove forced float64 from onnx stable diffusion pipelines by @lostdisc in [#11054]
  • Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel by @lengmo1996 in [#11923]
  • [ControlnetUnion] Propagate [#11888] to img2img by @asomoza in [#11929]
  • enable flux pipeline compatible with unipc and dpm-solver by @gameofdimension in [#11908]
  • [training] add an offload utility that can be used as a context manager. by @sayakpaul in [#11775]
  • Add SkyReels V2: Infinite-Length Film Generative Model by @tolgacangoz in [#11518]
  • [refactor] Flux/Chroma single file implementation + Attention Dispatcher by @a-r-r-o-w in [#11916]
  • [docs] clarify the mapping between Transformer2DModel and finegrained variants. by @sayakpaul in [#11947]
  • [Modular] Updates for Custom Pipeline Blocks by @DN6 in [#11940]
  • [docs] Update toctree by @stevhliu in [#11936]
  • [docs] include bp link. by @sayakpaul in [#11952]
  • Fix kontext finetune issue when batch size >1 by @mymusise in [#11921]
  • [tests] Add test slices for Hunyuan Video by @a-r-r-o-w in [#11954]
  • [tests] Add test slices for Cosmos by @a-r-r-o-w in [#11955]
  • [tests] Add fast test slices for HiDream-Image by @a-r-r-o-w in [#11953]
  • [Modular] update the collection behavior by @yiyixuxu in [#11963]
  • fix "Expected all tensors to be on the same device, but found at least two devices" error by @yao-matrix in [#11690]
  • Remove logger warnings for attention backends and hard error during runtime instead by @a-r-r-o-w in [#11967]
  • [Examples] Uniform notations in train_flux_lora by @tomguluson92 in [#10011]
  • fix style by @yiyixuxu in [#11975]
  • [tests] Add test slices for Wan by @a-r-r-o-w in [#11920]
  • [docs] update guidance_scale docstring for guidance_distilled models. by @sayakpaul in [#11935]
  • [tests] enforce torch version in the compilation tests. by @sayakpaul in [#11979]
  • [modular diffusers] Wan by @a-r-r-o-w in [#11913]
  • [compile] logger statements create unnecessary guards during dynamo tracing by @a-r-r-o-w in [#11987]
  • enable quantcompile test on xpu by @yao-matrix in [#11988]
  • [WIP] Wan2.2 by @yiyixuxu in [#12004]
  • [refactor] some shared parts between hooks + docs by @a-r-r-o-w in [#11968]
  • [refactor] Wan single file implementation by @a-r-r-o-w in [#11918]
  • Fix huggingface-hub failing tests by @asomoza in [#11994]
  • feat: add flux kontext by @jlonge4 in [#11985]
  • [modular] add Modular flux for text-to-image by @sayakpaul in [#11995]
  • [docs] include lora fast post. by @sayakpaul in [#11993]
  • [docs] quant_kwargs by @stevhliu in [#11712]
  • [docs] Fix link by @stevhliu in [#12018]
  • [wan2.2] add 5b i2v by @yiyixuxu in [#12006]
  • wan2.2 i2v FirstBlockCache fix by @okaris in [#12013]
  • [core] support attention backends for LTX by @sayakpaul in [#12021]
  • [docs] Update index by @stevhliu in [#12020]
  • [Fix] huggingface-cli to hf missed files by @asomoza in [#12008]
  • [training-scripts] Make pytorch examples UV-compatible by @sayakpaul in [#12000]
  • [wan2.2] fix vae patches by @yiyixuxu in [#12041]
  • Allow SD pipeline to use newer schedulers, eg: FlowMatch by @ppbrown in [#12015]
  • [LoRA] support lightx2v lora in wan by @sayakpaul in [#12040]
  • Fix type of force_upcast to bool by @BerndDoser in [#12046]
  • Update autoencoder_kl_cosmos.py by @tanuj-rai in [#12045]
  • Qwen-Image by @naykun in [#12055]
  • [wan2.2] follow-up by @yiyixuxu in [#12024]
  • tests + minor refactor for QwenImage by @a-r-r-o-w in [#12057]
  • Cross attention module to Wan Attention by @samuelt0 in [#12058]
  • fix(qwen-image): update vae license by @naykun in [#12063]
  • CI fixing by @paulinebm in [#12059]
  • enable all gpus when running ci. by @sayakpaul in [#12062]
  • fix the rest for all GPUs in CI by @sayakpaul in [#12064]
  • [docs] Install by @stevhliu in [#12026]
  • [wip] feat: support lora in qwen image and training script by @sayakpaul in [#12056]
  • [docs] small corrections to the example in the Qwen docs by @sayakpaul in [#12068]
  • [tests] Fix Qwen test_inference slices by @a-r-r-o-w in [#12070]
  • [tests] deal with the failing AudioLDM2 tests by @sayakpaul in [#12069]
  • optimize QwenImagePipeline to reduce unnecessary CUDA synchronization by @chengzeyi in [#12072]
  • Add cuda kernel support for GGUF inference by @Isotr0py in [#11869]
  • fix input shape for WanGGUFTexttoVideoSingleFileTests by @jiqing-feng in [#12081]
  • [refactor] condense group offloading by @a-r-r-o-w in [#11990]
  • Fix group offloading synchronization bug for parameter-only GroupModule's by @a-r-r-o-w in [#12077]
  • Helper functions to return skip-layer compatible layers by @a-r-r-o-w in [#12048]
  • Make prompt_2 optional in Flux Pipelines by @DN6 in [#12073]
  • [tests] tighten compilation tests for quantization by @sayakpaul in [#12002]
  • Implement Frequency-Decoupled Guidance (FDG) as a Guider by @dg845 in [#11976]
  • fix flux type hint by @DefTruth in [#12089]
  • [qwen] device typo by @yiyixuxu in [#12099]
  • [lora] adapt new LoRA config injection method by @sayakpaul in [#11999]
  • lora_conversion_utils: replace lora up/down with a/b even if transformer. in key by @Beinsezii in [#12101]
  • [tests] device placement for non-denoiser components in group offloading LoRA tests by @sayakpaul in [#12103]
  • [Modular] Fast Tests by @yiyixuxu in [#11937]
  • [GGUF] feat: support loading diffusers format gguf checkpoints. by @sayakpaul in [#11684]
  • [docs] diffusers gguf checkpoints by @sayakpaul in [#12092]
  • [core] add modular support for Flux I2I by @sayakpaul in [#12086]
  • [lora] support loading loras from lightx2v/Qwen-Image-Lightning by @sayakpaul in [#12119]
  • [Modular] More Updates for Custom Code Loading by @DN6 in [#11969]
  • enable compilation in qwen image. by @sayakpaul in [#12061]
  • [tests] Add inference test slices for SD3 and remove unnecessary tests by @a-r-r-o-w in [#12106]
  • [chore] complete the licensing statement. by @sayakpaul in [#12001]
  • [docs] Cache link by @stevhliu in [#12105]
  • [Modular] Add experimental feature warning for Modular Diffusers by @DN6 in [#12127]
  • Add low_cpu_mem_usage option to from_single_file to align with from_pretrained by @IrisRainbowNeko in [#12114]
  • [docs] Modular diffusers by @stevhliu in [#11931]
  • [Bugfix] typo fix in NPU FA by @leisuzz in [#12129]
  • Add QwenImage Inpainting and Img2Img pipeline by @Trgtuan10 in [#12117]
  • [core] parallel loading of shards by @sayakpaul in [#12028]
  • try to use deepseek with an agent to auto i18n to zh by @SamYuan1990 in [#12032]
  • [docs] Refresh effective and efficient doc by @stevhliu in [#12134]
  • Fix bf15/fp16 for pipeline_wan_vace.py by @SlimRG in [#12143]
  • make parallel loading flag a part of constants. by @sayakpaul in [#12137]
  • [docs] Parallel loading of shards by @stevhliu in [#12135]
  • feat: cuda device_map for pipelines. by @sayakpaul in [#12122]
  • [core] respect local_files_only=True when using sharded checkpoints by @sayakpaul in [#12005]
  • support hf_quantizer in cache warmup. by @sayakpaul in [#12043]
  • make test_gguf all pass on xpu by @yao-matrix in [#12158]
  • [docs] Quickstart by @stevhliu in [#12128]
  • Qwen Image Edit Support by @naykun in [#12164]
  • remove silu for CogView4 by @lambertwjh in [#12150]
  • [qwen] Qwen image edit followups by @sayakpaul in [#12166]
  • Minor modification to support DC-AE-turbo by @chenjy2003 in [#12169]
  • [Docs] typo error in qwen image by @leisuzz in [#12144]
  • fix: caching allocator behaviour for quantization. by @sayakpaul in [#12172]
  • fix(training_utils): wrap device in list for DiffusionPipeline by @MengAiDev in [#12178]
  • [docs] Clarify guidance scale in Qwen pipelines by @sayakpaul in [#12181]
  • [LoRA] feat: support more Qwen LoRAs from the community. by @sayakpaul in [#12170]
  • Update README.md by @Taechai in [#12182]
  • [chore] add lora button to qwenimage docs by @sayakpaul in [#12183]
  • [Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora by @linoytsaban in [#12074]
  • Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @vuongminh1907
    • update: FluxKontextInpaintPipeline support (#11820)
  • @Net-Mist
    • feat: add multiple input image support in Flux Kontext (#11880)
  • @tolgacangoz
    • Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
  • @naykun
    • Qwen-Image (#12055)
    • fix(qwen-image): update vae license (#12063)
    • Qwen Image Edit Support (#12164)
  • @Trgtuan10
    • Add QwenImage Inpainting and Img2Img pipeline (#12117)
  • @SamYuan1990
    • try to use deepseek with an agent to auto i18n to zh (#12032)