Tunix v0.1.4 -- JAX 0.8.1 Flax 0.12.1 (2025-11-20)

Highlights

API Changes

# Old: rollout_vllm_model_version is passed directly to ClusterConfig.
cluster_config = rl_cluster_lib.ClusterConfig(
    role_to_mesh={
        ...,
    },
    training_config=rl_cluster_lib.RLTrainingConfig(
        ...,
    ),
    rollout_engine=args.rollout_engine,
    rollout_config=base_rollout.RolloutConfig(
        ...,
    ),
    rollout_vllm_model_version=VLLM_MODEL_VERSION,
    ...,
)
# New: rollout_vllm_model_version is set inside RolloutConfig.
cluster_config = rl_cluster_lib.ClusterConfig(
    role_to_mesh={
        ...,
    },
    training_config=rl_cluster_lib.RLTrainingConfig(
        ...,
    ),
    rollout_engine=args.rollout_engine,
    rollout_config=base_rollout.RolloutConfig(
        ...,
        rollout_vllm_model_version=VLLM_MODEL_VERSION,
        ...,
    ),
)
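
The move can be illustrated with a small, self-contained sketch. The dataclasses below are hypothetical stand-ins, not Tunix's real classes; only the field name rollout_vllm_model_version and the direction of the move come from the snippet above. The helper migrate_cluster_kwargs and the placeholder values are likewise assumptions, shown as one possible way to adapt old-style keyword arguments.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Optional


@dataclass(frozen=True)
class RolloutConfig:
    """Hypothetical stand-in for base_rollout.RolloutConfig."""
    rollout_vllm_model_version: Optional[str] = None


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical stand-in for rl_cluster_lib.ClusterConfig."""
    rollout_engine: str = "vanilla"  # placeholder engine name
    rollout_config: RolloutConfig = field(default_factory=RolloutConfig)


def migrate_cluster_kwargs(old_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Move rollout_vllm_model_version from cluster-level kwargs into rollout_config."""
    kwargs = dict(old_kwargs)  # copy so the caller's dict is not mutated
    version = kwargs.pop("rollout_vllm_model_version", None)
    rollout_config = kwargs.get("rollout_config", RolloutConfig())
    if version is not None:
        rollout_config = replace(rollout_config, rollout_vllm_model_version=version)
    kwargs["rollout_config"] = rollout_config
    return kwargs


old_style = {
    "rollout_engine": "vllm",  # placeholder engine name
    "rollout_config": RolloutConfig(),
    "rollout_vllm_model_version": "example-model-version",  # placeholder value
}
cluster_config = ClusterConfig(**migrate_cluster_kwargs(old_style))
```

After migration the version is reachable only through cluster_config.rollout_config, which mirrors the new layout.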

New Features

Model Support:

  • Added configuration for the Qwen2.5-Math-1.5B model.
  • Included mobile fine-tuning examples for Gemma 270M.

SGLang Integration:

  • Introduced an SGLang JAX sampler.
  • Added SGLang JAX mapping for Qwen2 models.
  • Enabled SGLang/JAX CI.

Agentic Workflows:

  • Added ModelAgent and TaskEnvironment for single-turn agentic workflows.
  • Introduced an Agentic GRPOLearner for RL training.
  • Provided a script for GRPO agent mode.
  • Added tests for agentic_grpo_learner.
  • Implemented Agentic GRPO with multi-iteration support and fixes.

Training & Evaluation:

  • Added support for the ORPO trainer.
  • Included scripts for OSS MATH-500 evaluation and DeepScaleR.

Infrastructure:

  • Added a Dockerfile and build scripts for developing Tunix on GKE.
  • Implemented GitHub Actions workflows for Tunix TPU nightly regression.
  • Added support for plugin-style custom logging backends in MetricsLogger.
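
As a rough illustration of what a plugin-style logging backend can look like, the sketch below fans each metric out to every registered backend. All names here (LoggingBackend, InMemoryBackend, SketchMetricsLogger, log_scalar) are hypothetical and are not taken from Tunix; the real MetricsLogger interface may differ.

```python
from typing import Protocol


class LoggingBackend(Protocol):
    """Hypothetical plugin interface; not Tunix's actual API."""

    def log_scalar(self, name: str, value: float, step: int) -> None: ...


class InMemoryBackend:
    """Example backend that keeps records in a list, useful for tests."""

    def __init__(self) -> None:
        self.records: list[tuple[str, float, int]] = []

    def log_scalar(self, name: str, value: float, step: int) -> None:
        self.records.append((name, value, step))


class SketchMetricsLogger:
    """Hypothetical logger that forwards each metric to every backend."""

    def __init__(self, backends: list[LoggingBackend]) -> None:
        self._backends = list(backends)

    def log(self, name: str, value: float, step: int) -> None:
        for backend in self._backends:
            backend.log_scalar(name, value, step)


memory = InMemoryBackend()
logger = SketchMetricsLogger([memory])
logger.log("loss", 0.5, step=1)
```

Because the backend is a structural Protocol, any object with a matching log_scalar method can be plugged in without inheriting from a base class.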

Improvements

Model Loading & Configuration:

  • Refactored model loading from Flax Orbax checkpoints, including fixes for Gemma and Gemma2.
  • Refactored the Gemma ModelConfig to explicitly list all models.
  • Relaxed frozen configuration for models.

Performance & Efficiency:

  • Improved speed of safetensor loading.
  • Added a per-Python-thread timeline and export of performance metrics to metrics_logger.
  • Rewrote the performance tracer with a new data model.
  • Enabled vLLM Data Parallelism on Tunix.

Architecture & Refactoring:

  • Moved agentic code out of the experimental folder.
  • Moved rollout-related configs from the cluster config to rollout_config.
  • Updated trajectory engine code.
  • Updated RolloutOrchestrator logic.
  • Implemented a concrete naming structure for parsing HuggingFace model IDs.
  • Updated the model module to prevent an AttributeError with pytree=False.

Usability:

  • Updated vanilla sampler to accept single strings.
  • Made put_exception in GRPO agentic learner asynchronous.
  • Enabled micro_batch_size for rollout and reference models in the PPO learner.
  • Added support for user-defined rollout engines.
  • Added Kaggle and GitHub buttons to Tunix example notebooks.
  • Improved HBM usage reporting in multi-process SPMD.
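
Accepting a single string alongside a batch of strings usually comes down to normalizing the input up front. The helper below is a minimal sketch of that idea; normalize_prompts is a hypothetical name and is not the vanilla sampler's actual code.

```python
from typing import Sequence, Union


def normalize_prompts(prompts: Union[str, Sequence[str]]) -> list[str]:
    """Return a list of prompts, wrapping a single string in a one-element list."""
    # A str is itself a sequence of characters, so it must be checked first;
    # otherwise "hi" would be split into ["h", "i"].
    if isinstance(prompts, str):
        return [prompts]
    return list(prompts)


single = normalize_prompts("What is 2 + 2?")
batch = normalize_prompts(["first prompt", "second prompt"])
```

Downstream code can then always iterate over a list, regardless of how the caller passed the input.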

Internal:

  • Refactored TPU tests to run separately based on HF_TOKEN requirements.
  • Updated Tunix GitHub Actions to trigger on push to main.
  • Moved Docker files to the root directory.
  • Added backward compatibility for set_mesh.

Bug Fixes

  • Fixed broken CI due to vLLM.
  • Fixed vLLM driver tests.
  • Improved test collection to only include target tests.
  • Fixed a conditional issue in the Tunix Gemma implementation.
  • Fixed nnx.remat usage with bound methods.
  • Fixed the OSS GRPO training script.
  • Fixed Qwen2 mapping for SGLang/JAX.
  • Fixed an incorrect loss type issue.
  • Fixed max_step initialization when profiling.
  • Fixed issues with multiple metrics loggers.
  • Reduced test flakiness.
  • Fixed broken links in README.md.
  • Corrected algo_config naming in GRPOLearner.
  • Fixed the get_logprobs_from_vllm_output utility function.
  • Fixed TypeError in example notebooks by updating mesh indexing (MESH[0] to len(MESH[0])).
  • Fixed an additional, not yet documented bug (details pending).
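
The mesh indexing fix (MESH[0] to len(MESH[0])) can be reproduced in plain Python. This sketch assumes, without confirmation from these notes, that MESH is a pair of (axis sizes, axis names) and that the failing expression repeated a per-axis setting once per mesh axis; the values and the "auto" marker are illustrative only.

```python
# Assumed notebook-style mesh spec: (axis sizes, axis names). Values are illustrative.
MESH = [(1, 4), ("fsdp", "tp")]

# Before the fix: MESH[0] is a tuple, but sequence repetition needs an int.
try:
    axis_types = ("auto",) * MESH[0]
except TypeError as err:
    error_message = str(err)

# After the fix: repeat the marker once per mesh axis.
axis_types = ("auto",) * len(MESH[0])
```

The corrected expression yields one entry per mesh axis, which is what a per-axis setting needs.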

Documentation

  • Fixed documentation build for ReadTheDocs.
  • Made a minor fix to the grpo_demo description.
  • Added README for SGLang JAX.
  • Updated docstring usage for dataclasses.

Internal & Tooling

  • Automated GitHub issue assignment to all engineers.
  • Converted notebook files (.ipynb) to Python scripts (_nb.py) and removed Jupyter cell markers.
  • Updated debug logging.
  • Pinned Qwix version to 0.1.1 (and later removed the pin).
  • Ensured latest dependencies are installed by forcing reinstall.
  • Temporarily disabled SGLang tests.
  • Removed gcsfs from pyproject.toml dependencies.

Full Changelog: https://github.com/google/tunix/compare/v0.1.3...v0.1.4

Source: README.md, updated 2025-11-20