StringZilla is the Godzilla of string libraries, built for splitting, sorting, and shuffling large textual datasets. It uses a heuristic so simple it's almost stupid... but it works: it matches the first few characters of a word with hyper-scalar code to achieve memcpy speeds. The implementation fits into a single C99 header file and uses different SIMD flavors where available, falling back to SWAR (SIMD Within A Register) on older platforms.

The Str class is designed to replace long Python str objects and wraps the C-level API. The File class, by contrast, memory-maps a file from persistent storage without loading a copy of it into RAM. The file's contents remain immutable, and the mapping can be shared by multiple Python processes simultaneously. A typical dataset pre-processing workflow is to map a sizeable textual dataset such as Common Crawl into memory, spawn child processes, and split the job between them.
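
The map-once, fan-out workflow described above can be sketched with Python's standard library alone. This is not StringZilla's File API: it uses mmap and concurrent.futures, and the helper names chunk_bounds, count_in_range, and parallel_count are hypothetical. Each worker maps the same file read-only, so the operating system can serve every process from the same cached pages instead of copying the data. The sketch assumes newline-delimited text and a needle that contains no newline, so no match can straddle two chunks.

```python
import mmap
import os
from concurrent.futures import ProcessPoolExecutor


def chunk_bounds(path: str, parts: int) -> list[tuple[int, int]]:
    """Split a file into about `parts` contiguous byte ranges that end on newlines."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    size = os.path.getsize(path)
    if size == 0:
        return []  # mmap cannot map an empty file
    bounds = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = 0
        for i in range(1, parts + 1):
            end = size if i == parts else max(start, size * i // parts)
            if end < size:
                # Extend the range to just past the next newline so no line is cut.
                nl = mm.find(b"\n", end)
                end = size if nl == -1 else nl + 1
            if end > start:
                bounds.append((start, end))
            start = end
            if start >= size:
                break
    return bounds


def count_in_range(path: str, start: int, end: int, needle: bytes) -> int:
    """Map the file read-only in this process; count non-overlapping hits in [start, end)."""
    if not needle:
        raise ValueError("needle must be non-empty")
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = 0
        pos = mm.find(needle, start, end)
        while pos != -1:
            count += 1
            pos = mm.find(needle, pos + len(needle), end)
        return count


def parallel_count(path: str, needle: bytes, workers: int = 4) -> int:
    """Fan the per-chunk counts out to `workers` child processes and sum them."""
    bounds = chunk_bounds(path, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_in_range, path, s, e, needle) for s, e in bounds]
        return sum(f.result() for f in futures)
```

Workers receive the file path rather than an mmap object because mmap objects cannot be pickled; each child re-maps the file itself. As with any multiprocessing code, call parallel_count from under an if __name__ == "__main__": guard.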

Features

  • Collection-Level Operations
  • Low-Level Python API
  • Splitting, sorting, and shuffling large textual datasets
  • JavaScript docs
  • Python docs
  • Substring Search
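
The heuristic mentioned in the overview, checking the leading characters of the needle with wide operations before comparing the rest, can be illustrated with the classic SWAR "has zero byte" trick. This is a scalar Python sketch of the general idea, not StringZilla's implementation: it filters on the needle's first byte only, eight haystack positions per 64-bit word, and swar_find is a hypothetical name. Production kernels would do this in C with SIMD registers and may check more than one leading character.

```python
LOW = 0x0101010101010101   # 0x01 in every byte lane
HIGH = 0x8080808080808080  # 0x80 in every byte lane
MASK = 0xFFFFFFFFFFFFFFFF  # keep the result within 64 bits


def swar_find(haystack: bytes, needle: bytes) -> int:
    """Return the first index of needle in haystack, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1
    first = needle[0] * LOW  # broadcast the needle's first byte to all 8 lanes
    last_start = n - m       # last index where the needle could begin
    i = 0
    while i + 8 <= n and i <= last_start:
        word = int.from_bytes(haystack[i:i + 8], "little")
        x = word ^ first     # lanes equal to the first byte become 0x00
        # "Has zero byte" test: flags every zero lane (plus occasional false
        # positives caused by borrows, which the verification below rejects).
        flags = ((x - LOW) & ~x & HIGH) & MASK
        while flags:
            lane = (flags & -flags).bit_length() // 8 - 1  # lowest flagged lane
            pos = i + lane
            if pos <= last_start and haystack[pos:pos + m] == needle:
                return pos
            flags &= flags - 1  # clear that lane's flag
        i += 8
    # Scalar tail for the final, partial word.
    for pos in range(i, last_start + 1):
        if haystack[pos] == needle[0] and haystack[pos:pos + m] == needle:
            return pos
    return -1
```

For example, swar_find(b"the quick brown fox", b"fox") returns 16, the same as bytes.find. In pure Python the per-word bookkeeping outweighs any savings; the point is the shape of the algorithm: a cheap filter rejects most positions in bulk, and the expensive full comparison runs only on the few candidates that survive.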

Categories

JSON

License

Apache License V2.0

Additional Project Details

Programming Language

C++

Related Categories

C++ JSON Software

Registered

2023-10-18