Version 0.14.0: EVA, Context-aware Prompt Tuning, Bone, and more

Highlights

New Methods

Context-aware Prompt Tuning

@tsachiblau added a new soft prompt method called Context-aware Prompt Tuning (CPT). It combines In-Context Learning and Prompt Tuning: for each training sample, it builds a learnable context from training examples in addition to the single training sample. CPT allows for sample- and parameter-efficient few-shot classification and addresses recency bias.

Explained Variance Adaptation

@sirluk contributed a new LoRA initialization method called Explained Variance Adaptation (EVA). Instead of randomly initializing LoRA weights, this method performs SVD on the input activations of each layer, computed from minibatches of the fine-tuning data, and uses the result to initialize the LoRA weights. It can also re-allocate the ranks of the adapter across layers based on the explained variance ratio (derived from the SVD). Thus, this initialization method can yield better initial values and a better rank distribution.
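
The rank re-allocation idea can be sketched in plain Python. This is a simplified illustration, not PEFT's implementation: the function name `redistribute_ranks`, the layer names, and the ratio values are all made up, and it assumes each layer already comes with a list of explained variance ratios for its candidate components. Under a fixed total rank budget, the components with the highest ratios across all layers are kept, and each layer's rank is the number of its components that survive.

```python
from collections import Counter


def redistribute_ranks(ratios_per_layer, total_rank_budget):
    """Assign ranks to layers by keeping the globally best components.

    ratios_per_layer: dict mapping a layer name to a list of explained
        variance ratios, one per candidate component (hypothetical input).
    total_rank_budget: total number of components to keep over all layers.
    """
    # Flatten to (ratio, layer) pairs so components can be compared globally.
    components = [
        (ratio, layer)
        for layer, ratios in ratios_per_layer.items()
        for ratio in ratios
    ]
    # Highest explained variance first.
    components.sort(key=lambda pair: pair[0], reverse=True)
    kept = components[:total_rank_budget]
    counts = Counter(layer for _, layer in kept)
    # Layers that lose all their components get rank 0.
    return {layer: counts.get(layer, 0) for layer in ratios_per_layer}


# Made-up ratios for three layers, with a budget of 6 (an average rank of 2).
ratios = {
    "q_proj": [0.50, 0.30, 0.10, 0.05],
    "k_proj": [0.20, 0.04, 0.02, 0.01],
    "v_proj": [0.60, 0.25, 0.08, 0.03],
}
ranks = redistribute_ranks(ratios, total_rank_budget=6)
print(ranks)
```

In this toy input, the layer whose components explain little variance ends up with a lower rank than the uniform average, and the freed budget goes to the other layers.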

Bone

@JL-er added an implementation of Block Affine Adaptation (Bone), which uses presumed sparsity in the base layer weights to divide them into multiple sub-spaces that share a single low-rank matrix for updates. Compared to LoRA, Bone has the potential to significantly reduce memory usage and achieve faster computation.

Enhancements

PEFT now supports LoRA for int8 torchao quantized models (two example notebooks demonstrate this). In addition, VeRA can now be used with 4-bit and 8-bit bitsandbytes quantization thanks to @ZiadHelal.

Hot-swapping of LoRA adapters is now possible using the hotswap_adapter function. You can load one LoRA and replace its weights in place with the LoRA weights of another adapter, which should generally be faster than deleting one adapter and loading the other in its place. The feature is built so that no re-compilation of the model is necessary if torch.compile was called on the model (for now, this requires the adapters to have the same ranks and alphas).
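
The in-place idea behind hot-swapping can be shown with a toy sketch in plain Python. This is not PEFT's hotswap_adapter and does not use torch: the function `hotswap_in_place`, the weight names, and the values are hypothetical, and matrices are stored as lists of rows. The point is that the existing containers are overwritten rather than replaced, so anything that already holds a reference to them sees the new values, and that this only works when the shapes (and therefore the ranks) match.

```python
def shape_of(matrix):
    """Return (rows, cols) of a matrix stored as a list of row lists."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def hotswap_in_place(loaded, incoming):
    """Overwrite the values of `loaded` with those of `incoming`.

    Both arguments map a weight name to a matrix (list of row lists).
    The containers in `loaded` are reused, so any code holding a
    reference to them sees the new values without being rebuilt.
    """
    if loaded.keys() != incoming.keys():
        raise ValueError("adapters target different weights")
    # Validate everything first so a failed swap leaves `loaded` untouched.
    for name in loaded:
        if shape_of(loaded[name]) != shape_of(incoming[name]):
            raise ValueError(f"shape mismatch for {name}: ranks must match")
    for name, matrix in loaded.items():
        for row, new_row in zip(matrix, incoming[name]):
            row[:] = new_row  # slice assignment mutates the existing list


# Two toy rank-2 adapters for one 3x3 layer (values are made up).
adapter_a = {"lora_A": [[1, 1, 1], [2, 2, 2]], "lora_B": [[1, 0], [0, 1], [1, 1]]}
adapter_b = {"lora_A": [[5, 5, 5], [6, 6, 6]], "lora_B": [[0, 2], [2, 0], [2, 2]]}

held_reference = adapter_a["lora_A"]  # stands in for a compiled model's view
hotswap_in_place(adapter_a, adapter_b)
print(held_reference)
```

Deleting one adapter and loading another would instead create new objects, which is the situation the in-place approach is meant to avoid.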

LoRA and IA³ now support Conv3d layers thanks to @jsilter, and @JINO-ROHIT added a notebook showcasing PEFT model evaluation using the lm-eval-harness toolkit.

With the target_modules argument, you can specify which layers to target with the adapter (e.g. LoRA). Now you can also specify which modules not to target by using the exclude_modules parameter (thanks @JINO-ROHIT).
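
How the two options might combine can be sketched in plain Python. The helper `select_modules` and the module names are hypothetical, and the suffix-matching rule used here is an assumption made for illustration rather than a statement of PEFT's exact matching behavior: a module is selected when it matches a target pattern and matches no exclude pattern.

```python
def matches(name, patterns):
    """True if `name` equals a pattern or ends with '.<pattern>'."""
    return any(name == p or name.endswith("." + p) for p in patterns)


def select_modules(module_names, target_modules, exclude_modules=()):
    """Return the names that are targeted and not excluded, in input order."""
    return [
        name
        for name in module_names
        if matches(name, target_modules) and not matches(name, exclude_modules)
    ]


# Made-up module names resembling a small transformer.
names = [
    "encoder.layer.0.attn.q_proj",
    "encoder.layer.0.attn.v_proj",
    "encoder.layer.1.attn.q_proj",
    "encoder.layer.1.attn.v_proj",
    "classifier.q_proj",
]
selected = select_modules(
    names,
    target_modules=["q_proj", "v_proj"],
    exclude_modules=["classifier.q_proj", "layer.1.attn.v_proj"],
)
print(selected)
```

Excluding by name is convenient when a short target pattern such as q_proj would otherwise also catch a few modules that should stay untouched.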

Changes

  • Several fixes were made to the OFT implementation, including a fix to merging. As a result, adapter weights trained with PEFT versions prior to this release are incompatible (see [#1996] for details).
  • Adapter configs are now forward-compatible by accepting unknown keys.
  • Prefix tuning was fitted to the DynamicCache caching infrastructure of transformers (see [#2096]). If you are using this PEFT version and a recent version of transformers with an old prefix tuning checkpoint, you should double check that it still works correctly and retrain it if it doesn't.
  • Added a lora_bias parameter to LoRA layers to enable a bias term on the LoRA B matrix. This is useful when extracting LoRA weights from fully fine-tuned parameters with bias vectors so that these can be taken into account.
  • [#2180] provided a couple of bug fixes to LoKr (thanks @yaswanth19). If you're using LoKr, your old checkpoints should still work but it's recommended to retrain your adapter.
  • from_pretrained now warns the user if PEFT keys are missing.
  • Attribute access to modules in modules_to_save is now properly and transparently handled.
  • PEFT supports the changes to bitsandbytes 8bit quantization from the recent v0.45.0 release. To benefit from these improvements, we recommend upgrading bitsandbytes if you're using QLoRA. Expect slight numerical differences in model outputs if you're using QLoRA with 8bit bitsandbytes quantization.
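
The lora_bias item in the list above can be illustrated with a small numeric sketch. It is written in plain Python rather than with PEFT; the helper names and numbers are made up, and it assumes a scaling factor of 1 and a fine-tuned layer whose weight update happens to be exactly rank 1. With a bias on the B matrix, the low-rank update can reproduce both the weight change and the bias change of the fine-tuned layer; without it, the bias change is lost.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def lora_forward(x, weight, bias, lora_a, lora_b, lora_b_bias=None):
    """Base linear layer plus a low-rank update, optionally with a bias on B."""
    out = add(matvec(weight, x), bias)
    update = matvec(lora_b, matvec(lora_a, x))
    if lora_b_bias is not None:
        update = add(update, lora_b_bias)
    return add(out, update)


# Base layer and a fully fine-tuned version of it (made-up integers).
weight, bias = [[1, 2], [3, 4]], [1, 1]
ft_weight, ft_bias = [[3, 4], [6, 7]], [4, -1]

# The weight difference [[2, 2], [3, 3]] is rank 1: column [2, 3] times row [1, 1].
lora_a = [[1, 1]]      # shape (r=1, in=2)
lora_b = [[2], [3]]    # shape (out=2, r=1)
lora_b_bias = [3, -2]  # ft_bias - bias

x = [5, -2]
target = add(matvec(ft_weight, x), ft_bias)
with_bias = lora_forward(x, weight, bias, lora_a, lora_b, lora_b_bias)
without_bias = lora_forward(x, weight, bias, lora_a, lora_b)
print(target, with_bias, without_bias)
```

Here the output with the extra bias matches the fine-tuned layer, while the output without it is off by exactly the difference between the two bias vectors.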

Full Changelog: https://github.com/huggingface/peft/compare/v0.13.2...v0.14.0

Source: README.md, updated 2024-12-06