Transformer Reinforcement Learning X - Browse /v0.7.0 at SourceForge.net

Name	Modified	Size	Downloads / Week
README.md	2023-06-23	5.7 kB	0
v0.7.0_ NeMo PPO, PEFT Migration, and Fixes source code.tar.gz	2023-06-23	295.6 kB	0
v0.7.0_ NeMo PPO, PEFT Migration, and Fixes source code.zip	2023-06-23	370.5 kB	0
Totals: 3 Items		671.8 kB	0

The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:

🐠 NeMo PPO and SFT support

This release introduces NeMo-backed PPO and SFT implementations, adding new capabilities and improving system performance in large-scale training.

NeMo PPO by @cat-state in https://github.com/CarperAI/trlx/pull/472
Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in https://github.com/CarperAI/trlx/pull/353

🦆 PEFT Migration

trlx now supports parameter-efficient tuning methods via the peft library, which we hope will provide greater access to RLHF training in low-resource settings.

opendelta to peft migration (#434) + memory optimization (#320) by @glerzing in https://github.com/CarperAI/trlx/pull/486
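
To give a rough sense of why parameter-efficient methods matter in low-resource settings, here is a minimal back-of-the-envelope sketch for a low-rank adapter (LoRA, one of the methods peft provides). The layer size and rank below are illustrative assumptions, not values taken from trlx or this release, and the helper names are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when a dense d_in x d_out matrix is fine-tuned directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a low-rank update A (d_in x rank) times B (rank x d_out)."""
    return rank * (d_in + d_out)


# Assumed, illustrative sizes: one 4096 x 4096 projection and adapter rank 8.
d = 4096
rank = 8

full = full_params(d, d)
lora = lora_params(d, d, rank)
ratio = lora / full

print(f"full fine-tuning: {full} trainable weights")
print(f"rank-{rank} adapter: {lora} trainable weights")
print(f"fraction trained: {ratio:.4%}")
```

Under these assumptions the adapter trains well under 1% of the weights of the projection it wraps, which in turn shrinks gradient and optimizer-state memory for that layer.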

Fixes and more!

Set pad_token for all tokenizers in tests by @cat-state in https://github.com/CarperAI/trlx/pull/414
Convert tensors in the stats dict into scalars by @ZHAOTING in https://github.com/CarperAI/trlx/pull/417
Add Translation Finetuning Example with T5 by @alexandremuzio in https://github.com/CarperAI/trlx/pull/392
set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in https://github.com/CarperAI/trlx/pull/409
[fix] add position_ids to LlamaModelBranch by @jon-tow in https://github.com/CarperAI/trlx/pull/418
fix(CI): use pinned deps for CI testing by @jon-tow in https://github.com/CarperAI/trlx/pull/423
Minibatch impl by @Dahoas in https://github.com/CarperAI/trlx/pull/364
[feat] Support tying metadata to each prompt by @maxreciprocate in https://github.com/CarperAI/trlx/pull/421
feat(examples): revamp simulacra example by @maxreciprocate in https://github.com/CarperAI/trlx/pull/430
[fix] update pairwise dataloader. by @Chen9154 in https://github.com/CarperAI/trlx/pull/395
fix(sft_trainer): total_steps calculation when running distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/432
fix(base_trainer): gather weights in save_pretrained under zero3 by @maxreciprocate in https://github.com/CarperAI/trlx/pull/429
fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/435
fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in https://github.com/CarperAI/trlx/pull/441
Create Example training scripts to run in Stability cluster by @alexandremuzio in https://github.com/CarperAI/trlx/pull/419
Upgrade to the officially released Ray instead of an unstable one. by @jovany-wang in https://github.com/CarperAI/trlx/pull/455
Pin transformers<=4.27.1 by @jovany-wang in https://github.com/CarperAI/trlx/pull/458
fix(ppo_gpt): prevent position_ids being None by @li-plus in https://github.com/CarperAI/trlx/pull/451
fix(trainer): init self.generate_sweep_kwarg at self.__init__ by @mymusise in https://github.com/CarperAI/trlx/pull/460
Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in https://github.com/CarperAI/trlx/pull/420
Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in https://github.com/CarperAI/trlx/pull/422
docs(base_trainer): fill in missing prepare_learning method by @maxreciprocate in https://github.com/CarperAI/trlx/pull/449
fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/450
fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in https://github.com/CarperAI/trlx/pull/444
feat(requirements.txt): upgrade dependencies by @maxreciprocate in https://github.com/CarperAI/trlx/pull/465
fix(offline_pipeline): force drop_last only for distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/475
hotfix(bnb): install scipy with bitsandbytes to avoid ModuleNotFoundError by @jon-tow in https://github.com/CarperAI/trlx/pull/492
fix type hint in PromptPipeline.__init__ by @g-simmons in https://github.com/CarperAI/trlx/pull/496
fix(modeling_ilql): single q-head indexing by @maxreciprocate in https://github.com/CarperAI/trlx/pull/471
Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in https://github.com/CarperAI/trlx/pull/506
Fix PPO log_ratio bug by @TobiasNorlund in https://github.com/CarperAI/trlx/pull/509
fix(ppo_trainer): default gen kwargs by @maxreciprocate in https://github.com/CarperAI/trlx/pull/510
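
As an illustration of the kind of change behind an entry such as "compute mean KL sequence-wise", the sketch below contrasts two ways of averaging per-token KL values over a padded batch. It is an assumption about what "sequence-wise" means here (sum over the tokens of each sequence, then average over sequences); the function names and numbers are hypothetical and not taken from the trlx code.

```python
from typing import List

Batch = List[List[float]]


def mean_kl_tokenwise(kl: Batch, mask: Batch) -> float:
    """Average KL over all non-padded tokens in the batch."""
    total = sum(k * m for row_k, row_m in zip(kl, mask) for k, m in zip(row_k, row_m))
    count = sum(m for row_m in mask for m in row_m)
    return total / count


def mean_kl_sequencewise(kl: Batch, mask: Batch) -> float:
    """Sum KL over the tokens of each sequence, then average over sequences."""
    per_sequence = [
        sum(k * m for k, m in zip(row_k, row_m)) for row_k, row_m in zip(kl, mask)
    ]
    return sum(per_sequence) / len(per_sequence)


# Two sequences padded to length 4; a mask value of 0 marks padding.
kl = [[0.25, 0.5, 0.25, 0.0], [0.5, 0.0, 0.0, 0.0]]
mask = [[1, 1, 1, 0], [1, 0, 0, 0]]

print(mean_kl_tokenwise(kl, mask))     # averages over the 4 real tokens
print(mean_kl_sequencewise(kl, mask))  # averages over the 2 sequences
```

The two statistics diverge whenever sequences differ in length, so which one feeds a KL penalty or controller changes how strongly long generations are weighted.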

New Contributors

@ZHAOTING made their first contribution in https://github.com/CarperAI/trlx/pull/417
@cauyxy made their first contribution in https://github.com/CarperAI/trlx/pull/409
@Chen9154 made their first contribution in https://github.com/CarperAI/trlx/pull/395
@jovany-wang made their first contribution in https://github.com/CarperAI/trlx/pull/455
@li-plus made their first contribution in https://github.com/CarperAI/trlx/pull/451
@mymusise made their first contribution in https://github.com/CarperAI/trlx/pull/460
@mikljohansson made their first contribution in https://github.com/CarperAI/trlx/pull/420
@g-simmons made their first contribution in https://github.com/CarperAI/trlx/pull/496
@iwiwi made their first contribution in https://github.com/CarperAI/trlx/pull/506
@TobiasNorlund made their first contribution in https://github.com/CarperAI/trlx/pull/509
@glerzing made their first contribution in https://github.com/CarperAI/trlx/pull/486

Full Changelog: https://github.com/CarperAI/trlx/compare/v0.6.0...v0.7.0

Source: README.md, updated 2023-06-23

Transformer Reinforcement Learning X Files

A repo for distributed training of language models with Reinforcement
