File Date Author Commit
html 2012-10-05 rsz rsz [c3d7c6] Updated benchmarks after local software upgrade...
Doxyfile 2012-08-01 rsz rsz [a4cb15] Fixed doxygen static function support.
LICENSE.MIT 2012-08-01 rsz rsz [aab9dc] Updated readme and webpage.
Makefile 2012-10-05 rsz rsz [c3d7c6] Updated benchmarks after local software upgrade...
README.md 2012-10-05 rsz rsz [c3d7c6] Updated benchmarks after local software upgrade...
bench.java 2012-05-10 rsz rsz [585a51] Added boost and java benchmarks (both slow BTW).
benchUtil.hpp 2012-05-11 rsz rsz [18b3cb] Added c++ stream write test and write benchmarks.
bench_boost.cpp 2012-05-10 rsz rsz [585a51] Added boost and java benchmarks (both slow BTW).
bench_ezStringUtil.cpp 2012-05-11 rsz rsz [18b3cb] Added c++ stream write test and write benchmarks.
bench_ezStringUtil.py 2012-05-07 rsz rsz [954bdf] Add fsplit with stream,fgets,fread,strtok and b...
bench_strtk.cpp 2012-05-10 rsz rsz [585a51] Added boost and java benchmarks (both slow BTW).
ezStringUtil.hpp 2012-08-01 rsz rsz [a4cb15] Fixed doxygen static function support.
test_ezStringUtil.cpp 2012-05-10 rsz rsz [585a51] Added boost and java benchmarks (both slow BTW).


Overview

ezStringUtil is a collection of C++ functions frequently used in string-handling
programs. The STL is used where possible. Locale and internationalization
are not handled. It is not intended to replace or imitate the
excellent StrTk library. Features include whitespace stripping, delimited
splitting, set operations, delimited field searches, case handling,
comparisons and delimited file parsing.
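To illustrate two of the listed features, here is a minimal C++ sketch of whitespace stripping and single-character delimited splitting. The names `strip` and `split` are hypothetical and are not ezStringUtil's actual API (see ezStringUtil.hpp for the real functions).

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not ezStringUtil's actual API.

// Remove leading and trailing whitespace using std::isspace
// (no locale handling beyond the default "C" locale).
std::string strip(const std::string& s) {
    std::string::size_type b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Split on a single-character delimiter, keeping empty fields
// so that field positions stay stable in DSV rows.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = s.find(delim, start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));  // last (possibly empty) field
            break;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}
```

Keeping empty fields means `split("a||b", '|')` yields three fields (`"a"`, `""`, `"b"`), which keeps column indices aligned when some DSV values are blank.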

My mock use cases (the benchmarks under Performance) show that splitting DSV
(delimiter-separated values) files is faster with this library than with StrTk,
Boost, Python and Java.

Download

Source code

Git

Installation

make test
make memtest
make bench
make bench_python
make bench_pypy
make bench_strtk
make bench_boost
make bench_java
make clean
sudo make install
make dist VER=X.Y.Z

Performance

# Environment:
ubuntu 12.04
g++ 4.6.3
Python 2.7.3
OpenJDK 1.6.0_24 64bit 6b24-1.11.1-4ubuntu2, 20.0-b12, mixed
Boost 1.46
i7 x990, 8 x 3.47GHz, 24GB RAM, 256 GB SSD

# Getting 20th column in a 1GB DSV:
1) c++ fsplit_stream with sync=false, vector of vectors (1.52s)
2) c++ fsplit_stream with sync=false, deque of vectors (1.74s)
3) c++ fsplit_stream with sync=true (2.0s)
4) c++ fsplit_fgets (2.0s)
5) c++ strtk, 10th column (StrTk limit; the 20th was not available, so the real time could be worse) (2.0s)
6) pypy str.index (3.4s)
7) c++ fsplit_fread (3.9s)
8) c++ fsplit_strtok (3.9s)
9) pypy str.split (5.5s)
10) python str.split (8.8s)
11) python csv module (11.7s)
12) c++ boost split (19s)
13) python str.index (24s)
14) pypy csv module (37s)
15) c++ boost tokenize (39s)
16) java scanner (95s)
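The fastest entries read the DSV with C++ streams. The sketch below shows the general idea of pulling one column per line with `std::getline`. `extract_column` is a hypothetical name, not the library's `fsplit_stream`, and unlike the "vector of vectors" variants it skips the preceding fields instead of storing every row. The `sync=false` label is assumed here to mean `std::ios::sync_with_stdio(false)`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not ezStringUtil's fsplit_stream: read a DSV stream
// line by line and collect the value of one 1-based column per row.
// Rows with fewer columns yield an empty string.
std::vector<std::string> extract_column(std::istream& in, char delim,
                                        std::size_t column) {
    assert(column >= 1);
    std::vector<std::string> values;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type start = 0;
        std::size_t field = 1;
        // Skip the preceding fields without copying them.
        while (field < column) {
            std::string::size_type pos = line.find(delim, start);
            if (pos == std::string::npos) break;
            start = pos + 1;
            ++field;
        }
        if (field < column) {
            values.emplace_back();  // row too short
            continue;
        }
        std::string::size_type end = line.find(delim, start);
        values.push_back(line.substr(
            start, end == std::string::npos ? std::string::npos : end - start));
    }
    return values;
}

// Convenience overload for in-memory text.
std::vector<std::string> extract_column(const std::string& text, char delim,
                                        std::size_t column) {
    std::istringstream in(text);
    return extract_column(in, delim, column);
}
```

When reading from `std::cin`, call `std::ios::sync_with_stdio(false)` once before any input; the setting only affects the standard streams, so it makes no difference for a `std::ifstream`.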

# Writing 1GB DSV:
1) c++ fprintf (11s)
2) c++ stream (19s)
3) pypy (19s)
4) python (70s)
5) java (113s)
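In the writing benchmark, C stdio came out fastest. A minimal sketch of a row writer built on `std::fprintf` follows; `write_dsv_row` is a hypothetical helper, not part of ezStringUtil, and it writes fields verbatim, so a field that contains the delimiter is not escaped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: write one DSV row with C stdio.
// Returns false if any write fails.
bool write_dsv_row(std::FILE* out, const std::vector<std::string>& fields,
                   char delim) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < fields.size(); ++i) {
        // Delimiter goes between fields, not before the first one.
        if (i > 0 && std::fputc(delim, out) == EOF) return false;
        if (std::fprintf(out, "%s", fields[i].c_str()) < 0) return false;
    }
    return std::fputc('\n', out) != EOF;
}
```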

Distribution

make html
make clean
make dist VER=0.5.4

Publishing

ssh -t rsz,ezstringutil@shell.sourceforge.net create
scp html/* rsz,ezstringutil@shell.sourceforge.net:/home/project-web/ezstringutil/htdocs
scp ../ezstringutil-0.5.4.tar.gz rsz,ezstringutil@shell.sourceforge.net:/home/frs/project/e/ez/ezstringutil

Changelog

v0.5.4 2012-08-01

  • Updated readme and webpage.

License

Copyright (C) 2011,2012 Remik Ziemlinski (see LICENSE.MIT)
