CTranslate2 4.4.0 (released 2024-09-09)
Removed: Flash Attention support in the Python package, due to the significant package size increase it caused for minimal performance gain.
Note: Flash Attention remains supported in the C++ library via the `WITH_FLASH_ATTN` build option. It may be re-added to the Python package in the future if substantial improvements are made.
New features
- Support Llama3 (#1751)
- Support Gemma2 (#1772)
- Add option to return log probs for all tokens in the vocabulary (#1755)
- Grouped conv1d (#1749, #1758)
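For context on the grouped conv1d feature: a grouped convolution splits the input channels into `groups` groups, and each output channel convolves only the input channels of its own group (with `groups` equal to the channel count, it reduces to a depthwise convolution). The following is a minimal pure-Python sketch of these semantics, for illustration only; it is not CTranslate2's implementation.

```python
def grouped_conv1d(x, w, groups):
    """x: [in_ch][time], w: [out_ch][in_ch // groups][kernel] -> [out_ch][time_out].

    Stride 1, no padding. Each output channel only sees its group's inputs.
    """
    in_ch, out_ch = len(x), len(w)
    icg, ocg = in_ch // groups, out_ch // groups  # input/output channels per group
    k = len(w[0][0])
    t_out = len(x[0]) - k + 1
    y = []
    for oc in range(out_ch):
        g = oc // ocg  # group this output channel belongs to
        row = []
        for t in range(t_out):
            acc = 0.0
            for ic in range(icg):  # only this group's input channels
                for j in range(k):
                    acc += x[g * icg + ic][t + j] * w[oc][ic][j]
            row.append(acc)
        y.append(row)
    return y

# With groups == in_ch == out_ch this is a depthwise conv1d:
x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # 2 input channels, length 3
w = [[[1.0, 1.0]], [[0.5, 0.5]]]           # 2 output channels, kernel 2, 1 ch/group
grouped_conv1d(x, w, groups=2)             # [[3.0, 5.0], [15.0, 25.0]]
```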
Fixes and improvements
- Fix pipeline (#1723, #1747)
- Various improvements to Flash Attention (#1732)
- Fix crash when using `return_alternatives` on CUDA (#1733)
- Add AWQ quantization support (GEMM and GEMV kernels) (#1727)
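As background on the AWQ support: AWQ stores weights as 4-bit integers with per-group scales and dequantizes them inside the GEMM/GEMV kernels. The sketch below shows plain symmetric 4-bit quantize/dequantize only; the actual AWQ method additionally rescales weight channels based on activation statistics, which is omitted here.

```python
def quantize_int4(weights):
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7] with one scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

w = [0.7, -0.35, 0.1, -0.7]
q, s = quantize_int4(w)        # q = [7, -4, 1, -7], scale ~ 0.1
w_hat = dequantize_int4(q, s)  # approximately recovers w, within one scale step
```

The quantization error is bounded by half a scale step, which is why per-group (rather than per-tensor) scales are used in practice to keep that step small.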