Files in v4.4.0

| Name | Modified | Size |
| --- | --- | --- |
| stringzilla_bare_windows_x64_4.4.0.tar | 2025-11-29 | 128.0 kB |
| stringzilla_bare_linux_arm64_4.4.0.deb | 2025-11-29 | 86.8 kB |
| stringzilla_bare_linux_arm64_4.4.0.so | 2025-11-29 | 177.1 kB |
| stringzilla_bare_linux_amd64_4.4.0.deb | 2025-11-29 | 112.6 kB |
| stringzilla_bare_linux_amd64_4.4.0.so | 2025-11-29 | 321.2 kB |
| stringzilla_shared_macos_arm64_4.4.0.zip | 2025-11-29 | 94.1 kB |
| README.md | 2025-11-29 | 2.9 kB |
| v4.4_ Case-Folding UTF-8 in AVX-512 source code.tar.gz | 2025-11-29 | 729.1 kB |
| v4.4_ Case-Folding UTF-8 in AVX-512 source code.zip | 2025-11-29 | 787.8 kB |

Totals: 9 items, 2.4 MB

To my knowledge, this is the first properly vectorized case-folding (aka .to_lower()) implementation that is compliant with Unicode (v17) and uses SIMD (AVX-512 on Intel Ice Lake and newer). The results are remarkable across most languages, but they weren't trivial to achieve. Unlike dense linear algebra workloads, such as those in SimSIMD, no shared logic holds across all languages and code points here. After all, Unicode dates back to the late 1980s and covers languages and writing systems that took thousands of years to develop and decades to organize into a standardized set of rules.

This implementation focuses on locale-independent conversion. It covers every one of the 1000+ character folding rules in the CaseFolding.txt file of the Unicode spec, including:

  • simple cases, like ASCII English letters: 'A' → 'a'.
  • complex Latin extensions, where one code point expands into multiple characters: 'ẞ' → "ss".
  • ligatures and mathematical symbols, like 'ﬃ' (U+FB03) → "ffi".
  • less common bicameral alphabets, including Armenian, Georgian, Vietnamese, and others.
  • fast memcpy-like paths for unicameral scripts, like Chinese, Japanese, and Korean.
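
The rule categories above can be sanity-checked against Python's built-in `str.casefold()`, which applies locale-independent full case folding. This is only a reference sketch and not StringZilla's API: the `SAMPLES` table and the `check_folding` helper are hypothetical names, and the interpreter's bundled Unicode tables may be older than v17, although the mappings used here are long-standing.

```python
# Reference behavior from Python's built-in full case folding.
# Assumption: this is NOT StringZilla's API, only an illustration of the rules.
SAMPLES = {
    "A": "a",                        # simple ASCII letter
    "\u1E9E": "ss",                  # capital sharp s expands to two characters
    "\uFB03": "ffi",                 # ffi ligature expands to three characters
    "\u0531": "\u0561",              # Armenian capital AYB to small ayb
    "\u10A0": "\u2D00",              # Georgian Asomtavruli AN to Nuskhuri an
    "\u4E2D\u6587": "\u4E2D\u6587",  # unicameral Chinese text stays unchanged
    "\u0130": "i\u0307",             # dotted capital I folds without Turkic rules
}


def check_folding(cases):
    """Return, for each source string, whether casefold() matches the expectation."""
    return {src: src.casefold() == expected for src, expected in cases.items()}


results = check_folding(SAMPLES)

# Folding can also change the UTF-8 byte length, so output buffers cannot be
# assumed to have the same size as the input.
length_change = (len("\u1E9E".encode("utf-8")), len("\u1E9E".casefold().encode("utf-8")))
```

The last line illustrates why folding is not a byte-for-byte transformation: the three-byte 'ẞ' becomes the two-byte "ss", while other mappings grow the output.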

To benchmark all of those, I've extended the StringWars benchmarks with new bench_unicode.rs and bench_unicode.py scripts, plus a bench_unicode.md report produced for two dozen datasets pulled from the Leipzig Wikipedia corpora. In the excerpt below, speedups range from 1.95x to 12.8x, except for Vietnamese and Georgian, which for now run at roughly serial speed:

| Language | Serial | Ice Lake AVX-512 | Speedup |
| --- | --- | --- | --- |
| English | 550.93 MiB/s | 6.87 GiB/s | 12.8x |
| German | 482.14 MiB/s | 2.54 GiB/s | 5.4x |
| Russian | 518.44 MiB/s | 2.14 GiB/s | 4.2x |
| Greek | 255.05 MiB/s | 960.11 MiB/s | 3.8x |
| Chinese | 526.30 MiB/s | 1.00 GiB/s | 1.95x |
| Vietnamese | 346.69 MiB/s | 353.04 MiB/s | 1.02x |
| Georgian | 519.07 MiB/s | 517.61 MiB/s | 0.997x |
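
The speedup column mixes MiB/s and GiB/s, so the ratios are easiest to verify after converting both throughputs to one unit. A minimal sketch of that arithmetic, where `to_mib_per_s` and `speedup` are hypothetical helper names and 1 GiB is taken as 1024 MiB:

```python
# Conversion factors to MiB/s (binary units: 1 GiB = 1024 MiB).
UNITS = {"MiB/s": 1.0, "GiB/s": 1024.0}


def to_mib_per_s(throughput: str) -> float:
    """Parse a string such as '6.87 GiB/s' into a number of MiB/s."""
    value, unit = throughput.split()
    return float(value) * UNITS[unit]


def speedup(serial: str, simd: str) -> float:
    """Ratio of SIMD throughput to serial throughput."""
    return to_mib_per_s(simd) / to_mib_per_s(serial)


english = round(speedup("550.93 MiB/s", "6.87 GiB/s"), 1)     # 12.8
chinese = round(speedup("526.30 MiB/s", "1.00 GiB/s"), 2)     # 1.95
georgian = round(speedup("519.07 MiB/s", "517.61 MiB/s"), 3)  # 0.997
```

A ratio below 1.0, as in the Georgian row, means the AVX-512 path was marginally slower than the serial one on that dataset.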

For a complete comparison, go to StringWars 😉

Minor

  • Add: Fast path for Georgian case-folding (fa7422c)
  • Add: Case-insensitive ops for Python (d88e30a)
  • Add: Dispatch case-insensitive search (4ae91c0)
  • Add: Serial case-insensitive find & compare (4b18f05)

Patch

  • Fix: Eszett hex parsing warnings in Clang (8b27080)
  • Fix: Avoid __builtin missing on MSVC (fdc95f3)
  • Fix: Uninitialized values warning (b84c83e)
  • Improve: Safer & faster case-folding on Ice Lake (bcd5d16)
  • Improve: Case-folding on Ice Lake (bb23b60)
  • Fix: Move Ice Lake kernels out of Haswell scope (b7cc2c4)
  • Improve: Rename functions towards utf8_case* (44fbb92)
  • Improve: Faster serial Unicode folding (aa1b21b)
  • Improve: Re-group folding by char-length (c3586e2)
  • Docs: Avoid locale-specific Unicode rules (333a778)
  • Docs: Emoji-free doc section titles (#284) (dc11b40)
Source: README.md, updated 2025-11-29