Name                                   Modified     Size        Downloads / Week
CTranslate2 4.4.0 source code.tar.gz   2024-09-09   3.4 MB
CTranslate2 4.4.0 source code.zip      2024-09-09   3.6 MB
README.md                              2024-09-09   684 Bytes
Totals: 3 items                                     7.0 MB      1

Removed: Flash Attention support in the Python package, because it significantly increased the package size for minimal performance gain.
Note: Flash Attention remains supported in the C++ package with the WITH_FLASH_ATTN option.
Flash Attention may be re-added in the future if substantial improvements are made.

New features

  • Support Llama3 (#1751)
  • Support Gemma2 (#1772)
  • Add log probs for all tokens in vocab (#1755)
  • Grouped conv1d (#1749, #1758)
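
To illustrate what "log probs for all tokens in vocab" refers to conceptually, the following is a minimal sketch in plain Python. It does not call CTranslate2 and does not show the option added in #1755. The vocabulary, the logits, and the function name `log_softmax` are hypothetical values chosen for illustration. The sketch assumes the log probabilities are a log-softmax over the logits of one decoding step.

```python
import math

def log_softmax(logits):
    """Return the log-probability of every entry in a logits vector."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

# Hypothetical 4-token vocabulary and made-up logits for a single decoding step.
vocab = ["<s>", "hello", "world", "</s>"]
step_logits = [0.5, 2.0, 1.0, -1.0]

# One log-probability per vocabulary token, not only for the sampled token.
step_log_probs = log_softmax(step_logits)
for token, lp in zip(vocab, step_log_probs):
    print(f"{token}\t{lp:.4f}")
```

Having the full distribution at each step, rather than only the score of the chosen token, is what makes it possible to rank alternatives or compute quantities such as entropy.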

Fixes and improvements

  • Fix pipeline (#1723, #1747)
  • Some improvements in flash attention (#1732)
  • Fix crash when using return_alternative on CUDA (#1733)
  • Quantization AWQ GEMM + GEMV (#1727)
Source: README.md, updated 2024-09-09