✨ Gemma 3n now available
- Google's new Gemma 3n multimodal models in 2B (E2B) and 4B (E4B) sizes
- Supports audio, vision, video and text inputs
- Available in safetensors, GGUF, and dynamic 4-bit bitsandbytes formats for fine-tuning (see the loading sketch below)
- Hugging Face collection: Gemma-3n
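The sketch below shows roughly what loading Gemma 3n for fine-tuning looks like with Unsloth. The `FastModel` loader and the `unsloth/gemma-3n-E4B-it` model id are assumptions based on Unsloth's usual naming, not something stated in these notes; check the Gemma-3n collection and docs for the exact identifiers.

```python
# Hypothetical sketch: load Gemma 3n (E4B) in dynamic 4-bit for fine-tuning with Unsloth.
# The FastModel loader and the model id below are assumptions, not confirmed by these notes.
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name     = "unsloth/gemma-3n-E4B-it",  # assumed model id
    max_seq_length = 2048,
    load_in_4bit   = True,   # dynamic 4-bit BnB weights
)

# LoRA adapters can then be attached with get_peft_model, exactly as in the
# GRPO example further down in these notes.
```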
🎵 Text-to-Speech (TTS) Fine-tuning
- Train TTS/STT models like Sesame-CSM, Orpheus-TTS and OpenAI's Whisper locally!
- Clone voices and learn new emotions, tones & styles, with 1.5x faster training and 50% less VRAM (see the sketch below)
- TTS notebooks: https://docs.unsloth.ai/get-started/unsloth-notebooks#text-to-speech-tts-notebooks
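The TTS workflows pair an Unsloth-loaded model with LoRA adapters and a (text, audio) dataset of the target voice. Below is a minimal loading sketch; the `FastModel` loader and the `unsloth/csm-1b` model id are assumptions, so follow the linked TTS notebooks for the exact setup.

```python
# Hypothetical sketch only: loader name and model id are assumptions, not confirmed here.
from unsloth import FastModel

model, processor = FastModel.from_pretrained(
    model_name     = "unsloth/csm-1b",  # assumed id; Orpheus-TTS / Whisper follow the same pattern
    max_seq_length = 2048,
    load_in_4bit   = False,             # voice cloning is commonly done with 16-bit LoRA
)

# Attach LoRA adapters with get_peft_model (as in the GRPO example below),
# then train on (text, audio) pairs of the voice, emotion, or style to clone.
```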
> [!TIP]
> Update Unsloth via `pip install --upgrade --force-reinstall unsloth unsloth_zoo`
🧠 DeepSeek-R1-0528 Support with Dynamic 1-bit GGUFs
- Fine-tune DeepSeek-R1-0528-Qwen3 with GRPO! Our new reward function increases multilingual response rates by 40%+
- Dynamic 1-bit GGUFs shrink the full 715 GB model to just 185 GB (a 75% size reduction) while largely preserving accuracy (see the loading sketch below)
- DeepSeek-R1-0528-Qwen3 notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/DeepSeek_R1_0528_Qwen3_(8B)_GRPO.ipynb
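For running the 1-bit GGUF locally, a minimal llama-cpp-python sketch follows. The filename is illustrative only; the real model ships as multiple GGUF shards that must first be downloaded from Unsloth's Hugging Face repo, and llama.cpp itself or other GGUF runtimes work just as well.

```python
# Hypothetical sketch: run the Dynamic 1-bit GGUF of DeepSeek-R1-0528 with llama-cpp-python.
# The local file path below is an assumption; download the actual GGUF shards first.
from llama_cpp import Llama

llm = Llama(
    model_path   = "DeepSeek-R1-0528-UD-IQ1_S.gguf",  # assumed filename
    n_ctx        = 8192,
    n_gpu_layers = -1,   # offload as many layers as fit on the GPU
)

out = llm("What is 1+1? Think step by step.", max_tokens=256)
print(out["choices"][0]["text"])
```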
📈 Dynamic 2.0 GGUFs
- New quantization method that outperforms other leading quantization approaches
- Sets new benchmarks for 5-shot MMLU and KL Divergence
- Selectively quantizes layers for optimal accuracy
- For more information: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
⚡ Advanced Qwen3 GRPO notebook
- Proximity scoring for more nuanced reward functions
- OpenR1 dataset support with advanced templates
- Pre-finetuning to skip GRPO format learning
- Notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb
DeepSeek-R1 GRPO fine-tuning example: convert DeepSeek-R1-0528-Qwen3-8B into a reasoning model via GRPO, using OpenR1's Math dataset.

```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 1024  # Can increase for longer reasoning traces
lora_rank = 32         # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length = max_seq_length,
    load_in_4bit = True,           # False for LoRA 16bit
    fast_inference = True,         # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.7,  # Reduce if out of memory
)
model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = lora_rank * 2,  # *2 speeds up training
    use_gradient_checkpointing = "unsloth",  # Reduces memory usage
    random_state = 3407,
)
reasoning_start = None
reasoning_end = None
user_token = None
assistant_token = None

for token in tokenizer.get_added_vocab().keys():
    if "think" in token and "/" in token:
        reasoning_end = token
    elif "think" in token:
        reasoning_start = token
    elif "user" in token:
        user_token = token
    elif "assistant" in token:
        assistant_token = token
system_prompt = \
"""You are given a problem. Think about the problem and provide your working out. You must think in Bahasa Indonesia."""

print(tokenizer.apply_chat_template([
    {"role" : "user",      "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : "<think>I think it's 2.2</think>2"},
    {"role" : "user",      "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : "<think>I think it's 2.2</think>2"},
], tokenize = False, add_generation_prompt = True))
from datasets import load_dataset
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", "en", split = "train")

def extract_hash_answer(text):
    # if "####" not in text: return None
    # return text.split("####")[1].strip()
    return text

dataset = dataset.map(lambda x: {
    "prompt" : [
        {"role": "system", "content": system_prompt},
        {"role": "user",   "content": x["prompt"]},
    ],
    "answer": extract_hash_answer(x["solution"]),
})
# Add optional EOS token matching
import re

solution_end_regex = rf"{reasoning_end}(.*)"
match_format = re.compile(solution_end_regex, re.DOTALL)
"""We verify it works:"""
match_format.findall( "Let me think!</think>"\ f"Hence, the solution is 2.", )
match_format.findall( "<think>Let me think!</think>"\ f"\n\nHence, the solution is 2", )
def match_format_exactly(completions, **kwargs):
    scores = []
    for completion in completions:
        score = 0
        response = completion[0]["content"]
        # Match if format is seen exactly!
        if match_format.search(response) is not None: score += 3.0
        scores.append(score)
    return scores
"""If it fails, we want to reward the model if it at least follows the format partially, by counting each symbol:"""
def match_format_approximately(completions, **kwargs): scores = [] for completion in completions: score = 0 response = completion[0]["content"] # Count how many keywords are seen - we penalize if too many! # If we see 1, then plus some points!
# No need to reward <think> since we always prepend it! score += 0.5 if response.count(reasoning_start) == 1 else -1.0 score += 0.5 if response.count(reasoning_end) == 1 else -1.0 scores.append(score) return scores
"""We want to extract the generated answer, and reward or penalize it! We also reward it based on how close the answer is to the true one via ratios:"""
def check_answer(prompts, completions, answer, **kwargs): question = prompts[0][-1]["content"] responses = [completion[0]["content"] for completion in completions]
extracted_responses = [ guess.group(1) if (guess := match_format.search(r)) is not None else None \ for r in responses ] scores = [] for guess, true_answer in zip(extracted_responses, answer): score = 0 if guess is None: scores.append(-2.0) continue # Correct answer gets 5 points! if guess == true_answer: score += 5.0 # Match if spaces are seen, but less reward elif guess.strip() == true_answer.strip(): score += 3.5 else: # We also reward it if the answer is close via ratios! # Ie if the answer is within some range, reward it! try: ratio = float(guess) / float(true_answer) if ratio >= 0.9 and ratio <= 1.1: score += 2.0 elif ratio >= 0.8 and ratio <= 1.2: score += 1.5 else: score -= 2.5 # Penalize wrong answers except: score -= 4.5 # Penalize scores.append(score) return scores
match_numbers = re.compile(
    r".*?[\s]{0,}([-]?[\d.\,]{1,})",
    flags = re.MULTILINE | re.DOTALL,
)
print(match_numbers.findall(" 0.34 "))
print(match_numbers.findall(" 123,456 "))
print(match_numbers.findall(" -0.234 "))
print(match_numbers.findall("17"))
import langid
def get_lang(text: str) -> str:
    if not text:
        return "und"
    lang, _ = langid.classify(text)
    return lang

print(get_lang("Hello, How are you"))                 # This should return en
print(get_lang("Aku berpikir kalau aku adalah kamu")) # This should return id
print(get_lang("我在这里"))                            # This should return zh
import re

def format_and_language_reward_func(completions, **kwargs):
    scores = []
    for completion_item in completions:
        if not completion_item or not isinstance(completion_item[0], dict) or "content" not in completion_item[0]:
            scores.append(-5.0)
            print(f"Warning: Malformed completion item, assigning default low score: {completion_item}")
            continue
        content = completion_item[0]["content"]
        lang = get_lang(content)
        if lang == 'id':
            score = 5.0
        elif lang == 'en':
            score = -3.0
        elif lang == 'zh':
            score = -3.0
        else:
            score = -5.0
        scores.append(score)
    return scores

prompts = [
    [{"role": "assistant", "content": "What is the result of (1 + 2) * 4?"}],
    [{"role": "assistant", "content": "What is the result of (3 + 1) * 2?"}],
]
completions = [
    [{"role": "assistant", "content": "<think>The sum of 1 and 2 is 3, which we multiply by 4 to get 12.</think><answer>(1 + 2) * 4 = 12</answer>"}],
    [{"role": "assistant", "content": "The sum of 3 and 1 is 4, which we multiply by 2 to get 8. So (3 + 1) * 2 = 8."}],
]
print(format_and_language_reward_func(prompts=prompts, completions=completions))
global PRINTED_TIMES
PRINTED_TIMES = 0
global PRINT_EVERY_STEPS
PRINT_EVERY_STEPS = 5

def check_numbers(prompts, completions, answer, **kwargs):
    question = prompts[0][-1]["content"]
    responses = [completion[0]["content"] for completion in completions]

    extracted_responses = [
        guess.group(1) if (guess := match_numbers.search(r)) is not None else None
        for r in responses
    ]

    scores = []
    # Print only every few steps
    global PRINTED_TIMES
    global PRINT_EVERY_STEPS
    if PRINTED_TIMES % PRINT_EVERY_STEPS == 0:
        print(
            '*'*20 + f"Question:\n{question}",
            f"\nAnswer:\n{answer[0]}",
            f"\nResponse:\n{responses[0]}",
            f"\nExtracted:\n{extracted_responses[0]}"
        )
    PRINTED_TIMES += 1

    for guess, true_answer in zip(extracted_responses, answer):
        if guess is None:
            scores.append(-2.5)
            continue
        # Convert to numbers
        try:
            true_answer = float(true_answer.strip())
            # Remove commas like in 123,456
            guess = float(guess.strip().replace(",", ""))
            scores.append(3.5 if guess == true_answer else -1.5)
        except:
            scores.append(0)
            continue
    return scores
tokenized = dataset.map(
    lambda x: {"tokens" : tokenizer.apply_chat_template(x["prompt"], add_generation_prompt = True, tokenize = True)},
    batched = True,
)
print(tokenizer.decode(tokenized[0]["tokens"]))
tokenized = tokenized.map(lambda x: {"L" : len(x["tokens"])})

import numpy as np
maximum_length = int(np.quantile(tokenized["L"], 0.9))
print("Max Length = ", maximum_length)

# Filter only samples smaller than 90% max length
dataset = dataset.select(np.where(np.array(tokenized["L"]) <= maximum_length)[0])
del tokenized

max_prompt_length = maximum_length + 1  # + 1 just in case!
max_completion_length = max_seq_length - max_prompt_length
from vllm import SamplingParams
vllm_sampling_params = SamplingParams(
    min_p = 0.1,
    top_p = 1.0,
    top_k = -1,
    seed = 3407,
    stop = [tokenizer.eos_token],
    include_stop_str_in_output = True,
)
from trl import GRPOConfig, GRPOTrainer
training_args = GRPOConfig(
    vllm_sampling_params = vllm_sampling_params,
    temperature = 1.0,
    learning_rate = 5e-6,
    weight_decay = 0.01,
    warmup_ratio = 0.1,
    lr_scheduler_type = "linear",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 1,  # Increase to 4 for smoother training
    num_generations = 4,              # Decrease if out of memory
    max_prompt_length = max_prompt_length,
    max_completion_length = max_completion_length,
    # num_train_epochs = 1,  # Set to 1 for a full training run
    max_steps = 100,
    save_steps = 100,
    report_to = "none",  # Can use Weights & Biases
    output_dir = "outputs",

    # For optional training + evaluation
    # fp16_full_eval = True,
    # per_device_eval_batch_size = 4,
    # eval_accumulation_steps = 1,
    # eval_strategy = "steps",
    # eval_steps = 1,
)
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [
        match_format_exactly,
        match_format_approximately,
        check_answer,
        check_numbers,
        format_and_language_reward_func,
    ],
    args = training_args,
    train_dataset = dataset,

    # For optional training + evaluation
    # train_dataset = new_dataset["train"],
    # eval_dataset = new_dataset["test"],
)
trainer.train()
```
🎯 Magistral Conversational Reasoning
- Fine-tune Magistral-24B for advanced conversational reasoning
- Magistral notebook: https://github.com/unslothai/notebooks/blob/main/nb/Magistral_(24B)-Reasoning-Conversational.ipynb
👁️ Gemma3 Vision Support
- Fine-tune Gemma3 vision models for multimodal tasks (see the sketch below)
- Gemma3 Vision notebook: https://github.com/unslothai/notebooks/blob/main/nb/Gemma3_(4B)-Vision.ipynb
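As a rough illustration of the vision workflow, here is a minimal loading sketch for a Gemma 3 vision model; the `FastVisionModel` loader, the `unsloth/gemma-3-4b-it` model id, and the LoRA flags are assumptions, so treat the linked notebook as the authoritative reference.

```python
# Hypothetical sketch only: loader name, model id, and flags below are assumptions
# based on Unsloth's usual patterns, not confirmed by these notes.
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/gemma-3-4b-it",   # assumed model id
    load_in_4bit = True,
)

model = FastVisionModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    finetune_vision_layers   = True,   # assumed flags: tune the vision tower
    finetune_language_layers = True,   # and the language model
)
```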
Documentation & Guides
- Reinforcement Learning Guide: Complete guide on RL for LLMs covering GRPO, RLHF, DPO. Check it out here: https://docs.unsloth.ai/basics/reinforcement-learning-guide
- LoRA Hyperparameters Guide: Master optimal learning rates, epochs, LoRA rank & alpha settings (a short example follows). Check it out here: https://docs.unsloth.ai/get-started/fine-tuning-guide/lora-hyperparameters-guide
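To make the hyperparameters concrete, here is a minimal sketch using the same FastLanguageModel API as the GRPO example above; the specific values (rank 16, alpha 16) are common starting points chosen for illustration, not recommendations taken from the guide.

```python
# Illustrative starting values only; see the LoRA hyperparameters guide for tuning advice.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",  # any supported model
    max_seq_length = 2048,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,           # LoRA rank: commonly 8-128; higher = more capacity, more VRAM
    lora_alpha = 16,  # often set equal to r, or 2*r as in the GRPO example above
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)
```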
What's Changed
- Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/2448
- Added k_norm & q_norm to merged Qwen3 layers by @cblomert in https://github.com/unslothai/unsloth/pull/2452
- MoE Kernel by @jeromeku in https://github.com/unslothai/unsloth/pull/2465
- Blackwell Support by @johnnynunez in https://github.com/unslothai/unsloth/pull/2458
- Added missing code of conduct by @rolandtannous in https://github.com/unslothai/unsloth/pull/2416
- Fix readme example by @yuanzhedong in https://github.com/unslothai/unsloth/pull/2492
- the pixtral vision notebook fails during inference by @mmathew23 in https://github.com/unslothai/unsloth/pull/2466
- [1/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2350
- [2/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2388
- vLLM Windows CUDA support [tested] by @fenglui in https://github.com/unslothai/unsloth/pull/2158
- Add Sesame CSM by @mmathew23 in https://github.com/unslothai/unsloth/pull/2527
- Add Qwen-3 chat template and Ollama template support by @kiankyars in https://github.com/unslothai/unsloth/pull/2537
- Fix typos by @omahs in https://github.com/unslothai/unsloth/pull/2540
- Add use_rslora reference to LoraConfig inititalisation by @jkumz in https://github.com/unslothai/unsloth/pull/2539
- TTS by @danielhanchen in https://github.com/unslothai/unsloth/pull/2545
- Quick fix on the CompileConfig error by @Erland366 in https://github.com/unslothai/unsloth/pull/2554
- Fix trust remote code by @Etherll in https://github.com/unslothai/unsloth/pull/2357
- fix issue with qwen3 template double quote escapes by @davedgd in https://github.com/unslothai/unsloth/pull/2563
- Display the model name in RoPE scaling unsupported error by @emmanuel-ferdman in https://github.com/unslothai/unsloth/pull/2564
- Fix Whisper, ModernBERT by @danielhanchen in https://github.com/unslothai/unsloth/pull/2565
- fix: improved error handling when llama.cpp build fails [#2358] by @Hansehart in https://github.com/unslothai/unsloth/pull/2603
- Remove `dataset_text_field` from `SFTConfig` by @qgallouedec in https://github.com/unslothai/unsloth/pull/2609
- Upgrade trl fix by @Datta0 in https://github.com/unslothai/unsloth/pull/2544
- Check the `skip_prepare_dataset` before accessing dataset fields. [#2496] by @Premik in https://github.com/unslothai/unsloth/pull/2633
- Llama4 MoE Grouped GEMM by @jeromeku in https://github.com/unslothai/unsloth/pull/2639
- Latest TRL, GRPO + Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2645
- Fix SFTtraining for new trl by @mmathew23 in https://github.com/unslothai/unsloth/pull/2647
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2651
- Fix quant model param fetch regex by @Datta0 in https://github.com/unslothai/unsloth/pull/2662
- Fix batched generation for prompts of different lengths by @RunFMe in https://github.com/unslothai/unsloth/pull/2216
- reroute merge logic language models + comprehensive tests + eval kits by @rolandtannous in https://github.com/unslothai/unsloth/pull/2673
- unsloth checkpointing fix for latest transformers==4.52.x by @mmathew23 in https://github.com/unslothai/unsloth/pull/2674
- patch sft_trainer to favor max_seq_length over max_length in config by @mmathew23 in https://github.com/unslothai/unsloth/pull/2669
- Update prepare 4d causal attention call by @mmathew23 in https://github.com/unslothai/unsloth/pull/2678
- Ignore None Values when building vllm subprocess_command by @Salpingopharyngeus in https://github.com/unslothai/unsloth/pull/2680
- add support for torch270 with Intel GPU by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2709
- Making protobuf version more flexible by @user799595 in https://github.com/unslothai/unsloth/pull/2637
- tests for additional merge fix unsloth zoo pr 163 by @rolandtannous in https://github.com/unslothai/unsloth/pull/2719
- Reward modeling update (There seems to be another patch) by @pluesclues in https://github.com/unslothai/unsloth/pull/2710
- Fix Typos in Documentation and Comments by @leopardracer in https://github.com/unslothai/unsloth/pull/2721
- Fix renaming on other model than Llama by @Erland366 in https://github.com/unslothai/unsloth/pull/2762
- Enable vLLM to share memory space by @Datta0 in https://github.com/unslothai/unsloth/pull/2712
- Fix TRL 1.8.2 by @marcandrelarochelle in https://github.com/unslothai/unsloth/pull/2774
- Fix AttributeError in GRPO trainer for models without llm attribute by @rolandtannous in https://github.com/unslothai/unsloth/pull/2780
- Additional tests for unsloth-zoo PR#174 by @rolandtannous in https://github.com/unslothai/unsloth/pull/2779
- Update pyproject.toml by @amrothemich in https://github.com/unslothai/unsloth/pull/2778
- Fix for grpo_compute_loss_slow by @simpissa in https://github.com/unslothai/unsloth/pull/2702
- Fix GRPO by @danielhanchen in https://github.com/unslothai/unsloth/pull/2787
- Docs: Fix typo and improve MoE docstrings by @kilavvy in https://github.com/unslothai/unsloth/pull/2784
- [5/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2768
- Sequence Classification Bug Fixes by @pluesclues in https://github.com/unslothai/unsloth/pull/2793
- intel 5/N fix patch by @mmathew23 in https://github.com/unslothai/unsloth/pull/2792
- [3/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2620
- [4/N] Enable intel GPU for unsloth by @mmathew23 in https://github.com/unslothai/unsloth/pull/2801
- [intel] use DeviceProperties instead of torch.xxx.deviceproperties by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2803
- Fix grpo sleep regex and indentation by @Datta0 in https://github.com/unslothai/unsloth/pull/2804
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2805
- Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2807
New Contributors
- @cblomert made their first contribution in https://github.com/unslothai/unsloth/pull/2452
- @johnnynunez made their first contribution in https://github.com/unslothai/unsloth/pull/2458
- @rolandtannous made their first contribution in https://github.com/unslothai/unsloth/pull/2416
- @yuanzhedong made their first contribution in https://github.com/unslothai/unsloth/pull/2492
- @mmathew23 made their first contribution in https://github.com/unslothai/unsloth/pull/2466
- @leizhenyuan made their first contribution in https://github.com/unslothai/unsloth/pull/2350
- @fenglui made their first contribution in https://github.com/unslothai/unsloth/pull/2158
- @kiankyars made their first contribution in https://github.com/unslothai/unsloth/pull/2537
- @omahs made their first contribution in https://github.com/unslothai/unsloth/pull/2540
- @jkumz made their first contribution in https://github.com/unslothai/unsloth/pull/2539
- @davedgd made their first contribution in https://github.com/unslothai/unsloth/pull/2563
- @emmanuel-ferdman made their first contribution in https://github.com/unslothai/unsloth/pull/2564
- @qgallouedec made their first contribution in https://github.com/unslothai/unsloth/pull/2609
- @Premik made their first contribution in https://github.com/unslothai/unsloth/pull/2633
- @RunFMe made their first contribution in https://github.com/unslothai/unsloth/pull/2216
- @Salpingopharyngeus made their first contribution in https://github.com/unslothai/unsloth/pull/2680
- @user799595 made their first contribution in https://github.com/unslothai/unsloth/pull/2637
- @pluesclues made their first contribution in https://github.com/unslothai/unsloth/pull/2710
- @leopardracer made their first contribution in https://github.com/unslothai/unsloth/pull/2721
- @marcandrelarochelle made their first contribution in https://github.com/unslothai/unsloth/pull/2774
- @amrothemich made their first contribution in https://github.com/unslothai/unsloth/pull/2778
- @simpissa made their first contribution in https://github.com/unslothai/unsloth/pull/2702
- @kilavvy made their first contribution in https://github.com/unslothai/unsloth/pull/2784
Full Changelog: https://github.com/unslothai/unsloth/compare/May-2025...June-2025