From: Jeff H. <je...@ac...> - 2012-12-28 00:18:44
On 2012-11-23, at 7:28 PM, Larry McVoy <lm...@bi...> wrote:
> On Fri, Nov 23, 2012 at 06:45:18PM -0800, Larry McVoy wrote:
>>>> If that is the direction then I'd beg for two string types, unicode and
>>>> not. Tcl could be kick ass fast if unicode was optional.
>>>
>>> You keep harping on that, and it's just not quite true.
>>> Tcl is slower than it needs to be, because of all the conversion
>>> back and forth between UTF-8 (CESU-8, actually) and UCS-2 that it
>>> does. That has to be revisited sometime fairly soon anyway, if we're
>>> to break the BMP barrier. If we can get [string index] (and
>>> [string range] and friends), and [regexp] working on UTF-8 - which
>>> ought to be possible with a little bit of auxiliary indexing
>>> in place of the UCS-2 representation - then a lot of that disappears.
>>>
>>> (And I repeat myself: your examples should be using byte arrays.)
>>
>> OK, I might be wrong. Can you write a cat(1) and a grep(1) clone in
>> tcl that outperforms perl?
>
> This might have come off as snarky, wasn't meant to be. I did tcl
> versions of the above, can you make them faster? I tried to do byte
> stuff with the fconfigure, is there some better way?
>
> cat.tcl:
> proc cat {file} {
>     set f [open $file rb]
>     while {[gets $f buf] >= 0} { puts $buf }
>     close $f
> }
> fconfigure stdout -buffering full -translation binary
> foreach file $argv {
>     cat $file
> }

For cat, you can be faster than Perl: just use fcopy. Yes, it's a legitimate alternative, but it only works for cat.

> grep.tcl:
> proc grep {file} {
>     set f [open $file rb]
>     set buf ""
>     while {[gets $f buf] >= 0} {
>         if {[regexp -- {[^A-Za-z]fopen\(.*\)} $buf]} { puts $buf }
>     }
>     close $f
> }
> fconfigure stdout -translation binary
> foreach file $argv {
>     grep $file
> }

Step 1: use PCRE and get a 3x speedup. Step 2: maybe specially mark strings that are not UTF-8, and the speedup may be higher still.

Jeff
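The fcopy suggestion for cat.tcl can be sketched roughly as follows. This is a minimal illustration, not code posted in the thread: it assumes Tcl 8.5 or later (for the `rb` access mode), and the optional `out` channel argument is a hypothetical addition so the proc can be pointed at a channel other than stdout.

```tcl
# cat.tcl, fcopy variant: a sketch of the suggestion above, not code from the thread.
# Copies each file named on the command line to stdout without splitting it into lines.

# Hypothetical signature: "out" defaults to stdout but can be any writable channel.
proc cat {file {out stdout}} {
    # "rb" = read-only with -translation binary (access-mode syntax from Tcl 8.5 on).
    set f [open $file rb]
    # fcopy moves the channel contents in bulk until EOF (synchronous, no -command).
    fcopy $f $out
    close $f
}

# Pass bytes through untouched and buffer output fully, as in the original cat.tcl.
fconfigure stdout -buffering full -translation binary
foreach file $argv {
    cat $file
}
```

Run it as `tclsh cat.tcl file1 file2 > out`. Because fcopy does not split the data into lines or build a Tcl string for each one, it skips the per-line gets/puts work of the original loop; whether that is enough to beat Perl on a given machine is something to measure rather than assume.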