maxtext-v0.2.2
Name                               Modified    Size     Downloads / Week
maxtext-v0.2.2 source code.tar.gz  2026-05-08  39.0 MB
maxtext-v0.2.2 source code.zip     2026-05-08  40.0 MB
README.md                          2026-05-08  3.7 kB
Totals: 3 items                                79.0 MB  1

Changes

  • Upgraded JAX to version 0.9.2, improving support for both pre-training and post-training.
  • Introduced simplified APIs for accessing MaxText models.
  • Included maxtext_with_gepa.ipynb (github.com), a new notebook demonstrating AIME prompt optimization using the GEPA framework within MaxText.
  • Added support for Kimi-K2 models and the MuonClip optimizer. Users can explore this with the kimi-k2-1t config (see the user guide for details); a sketch of the Muon-style update step appears after this list.
  • Kimi-K2-Thinking, Kimi-K2.5 (text), and Kimi-K2.6 (text) are now supported. See Run_Kimi.md (github.com) for details.
  • DeepSeek-V3.2 is now supported, including DeepSeek Sparse Attention for handling long contexts. Use the deepseek3.2-671b config to try it out (refer to the user guide for more information); an illustrative sparse-attention sketch follows this list.
  • Support has been added for Gemma 4 multi-modal models (26B MoE and 31B dense). These can be used with the gemma4-26b and gemma4-31b configs. See Run_Gemma4.md (github.com) for further details.
  • Support has been added for Gemma 4 inference via the MaxText on vLLM plugin.
  • Enhanced RL capabilities with support for the open-r1/OpenR1-Math-220k and nvidia/OpenMathReasoning datasets.
  • Added more RL evaluation modes, such as majority voting and pass@1 estimation; a short sketch of both metrics follows this list.
  • Weights are now synced to vLLM before the pre-RL evaluation.
  • Made the use of math-verify in RL more robust.
  • MaxText's Supervised Fine-Tuning (SFT) now supports non-instruct models.
  • Added support for tensor parallelism using the Fused MoE kernel for MaxText on vLLM inference.
  • Added MaxText-to-vLLM converters for the Qwen3 and Gemma 4 model families.
  • validate_converter.py (github.com) now runs in multislice environments to test larger models, with utilities for comparing MaxText and vLLM weights.
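
For intuition on the new optimizer, below is a minimal sketch of the Newton-Schulz orthogonalization step that Muon-family optimizers such as MuonClip build on. This illustrates the published technique only; it is not MaxText's implementation, the function name is ours, and MuonClip's additional QK-clip rescaling of the attention projections is omitted.

```python
import jax.numpy as jnp

def newton_schulz_orthogonalize(grad, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2-D gradient matrix (illustrative only).

    Uses the quintic Newton-Schulz iteration and the coefficients published
    with the open-source Muon optimizer.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = grad / (jnp.linalg.norm(grad) + eps)  # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                            # iterate on the wide orientation
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x
```

The orthogonalized matrix then stands in for the raw (momentum-accumulated) gradient in the weight update.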
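
The long-context idea behind sparse attention can be shown with a toy top-k token-selection head: each query attends only to its highest-scoring keys. For clarity this sketch materializes the full score matrix, whereas a production kernel such as DeepSeek Sparse Attention uses a lightweight indexer to avoid that cost; the function below is a hypothetical illustration, not MaxText or DeepSeek code.

```python
import jax
import jax.numpy as jnp

def topk_sparse_attention(q, k, v, k_top=64):
    """Toy causal attention keeping only the k_top best keys per query.

    q, k, v: [seq_len, head_dim] arrays for one head; assumes seq_len >= k_top.
    """
    seq_len, head_dim = q.shape
    scores = (q @ k.T) / jnp.sqrt(head_dim)              # [seq_len, seq_len]
    causal = jnp.tril(jnp.ones((seq_len, seq_len), bool))
    scores = jnp.where(causal, scores, -jnp.inf)         # causal masking
    kth_best = jax.lax.top_k(scores, k_top)[0][:, -1:]   # k_top-th score per row
    scores = jnp.where(scores >= kth_best, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v           # [seq_len, head_dim]
```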
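
For reference, the two new evaluation modes reduce to a few lines each. The pass@k estimator below is the standard unbiased formula from Chen et al. (2021); the function names are ours, not MaxText's API.

```python
from collections import Counter

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations (c correct) is correct."""
    if n - c < k:
        return 1.0
    miss = 1.0
    for i in range(k):
        miss *= (n - c - i) / (n - i)  # probability all k draws are incorrect
    return 1.0 - miss

def majority_vote(final_answers):
    """Most common final answer across sampled generations."""
    return Counter(final_answers).most_common(1)[0][0]

print(pass_at_k(n=8, c=3, k=1))        # 0.375 (= c/n for pass@1)
print(majority_vote(["7", "7", "5"]))  # '7'
```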

Deprecations

  • Legacy MaxText.* shims have been removed. Please refer to src/MaxText/README.md (github.com) for details on the new command locations and how to migrate.
  • Sequence parallelism has been deprecated; please use context parallelism instead.
  • The expert_shard_attention_option flag is deprecated; use custom_mesh_and_rule=ep-as-cp for the same functionality.
Source: README.md, updated 2026-05-08