Text Embeddings Inference v1.9.0

What's changed?

🚨 Breaking changes

The default GeLU implementation is now GeLU with the tanh approximation instead of exact GeLU (a.k.a. GeLU erf), so that CPU and CUDA embeddings are the same (cuBLASLt only supports GeLU + tanh). This is a slight misalignment with how Transformers handles it: when hidden_act="gelu" is set in config.json, Transformers uses GeLU erf. The numerical differences between GeLU + tanh and GeLU erf should have negligible impact on inference quality.
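
To give a sense of the size of that difference, here is a minimal sketch comparing the two standard GeLU formulas in plain Python. It is not TEI's implementation. The function names and the sampled range [-6, 6] are assumptions chosen for illustration.

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GeLU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """GeLU with the tanh approximation."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Sample the activation on an illustrative range and measure the largest gap.
xs = [i / 100.0 for i in range(-600, 601)]
max_abs_diff = max(abs(gelu_erf(x) - gelu_tanh(x)) for x in xs)
print(f"max |gelu_erf - gelu_tanh| on [-6, 6]: {max_abs_diff:.2e}")
```

The per-activation gap stays below 1e-3 on this range. How that propagates to the final embeddings depends on the model, so compare embeddings directly if exact parity with Transformers matters for your use case.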

--auto-truncate now defaults to true, meaning that sequences are truncated to the lower of --max-batch-tokens and the maximum model length. This covers the case where --max-batch-tokens is lower than the maximum length the model actually supports.
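
The truncation rule described above can be sketched as follows. This only illustrates the "lower of the two values" logic and is not TEI's actual code: the helper names are hypothetical, and keeping the leading tokens is an assumption about the truncation side.

```python
def effective_max_length(max_batch_tokens: int, max_model_length: int) -> int:
    """Hypothetical helper: the limit is the lower of the two values."""
    return min(max_batch_tokens, max_model_length)


def auto_truncate(token_ids: list[int], max_batch_tokens: int,
                  max_model_length: int) -> list[int]:
    """Hypothetical helper: cut a tokenized sequence down to the limit.

    Assumption: tokens are dropped from the end of the sequence.
    """
    limit = effective_max_length(max_batch_tokens, max_model_length)
    return token_ids[:limit]


# A 600-token input, a model that supports 512 tokens, and a smaller
# --max-batch-tokens value of 256: the sequence is cut to 256 tokens.
tokens = list(range(600))
truncated = auto_truncate(tokens, max_batch_tokens=256, max_model_length=512)
print(len(truncated))
```

With a --max-batch-tokens value larger than the model length (say 16384), the same input would be cut to 512 tokens instead.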

🎉 Additions

🐛 Fixes

⚡ Improvements

📄 Other

🆕 New Contributors

Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.3...v1.9.0

Source: README.md, updated 2026-02-17