From: Magentus <mag...@gm...> - 2009-10-07 07:21:29
On Tue, 06 Oct 2009 23:43:37 +0200, Twylite <tw...@cr...> wrote:
> George Petasis wrote:
>> This TIP proposes a new optional switch (*-noemptyelements*) to
>> the *split* command:
>>     *split -noemptyelements* /string/ ?/splitChars/?
>> If this option is present, then *split* will not produce an empty
>> list element when the /string/ contains adjacent characters that
>> are present in /splitChars/.
> I don't think this functionality belongs in [split]; it would be
> better handled by a higher order function like filter:
>     filter "ne {}" [split $string $splitchars]

[filter] is a bad idea, IMO... It's going to need many of [lsearch]'s
options. It'd be MUCH better to add a -command option to [lsearch],
similar to [lsort], to direct its matching. (It can already do this
job, anyhow...)

In any case, the problem with this is that you're splitting the entire
string into a list, and then throwing chunks of it away again. In a
substantial sized string with a lot of padding characters, you can have
a LOT of little empty elements to iterate over.

The [string normalize] idea is better, I think... ([string collapse]
kind of says something else to me, though that could be my Glib
history...)

I wonder if it wouldn't be an idea to add a [string split] command,
with new functionality and more supportive syntax, and migrate over to
that, as has been done with [chan] - this isn't the first time this
idea has been raised.

I know I for one would like to see a few simple options added to
[split]:

  -defaults; adds the default spaces set to the specified split
    characters set. I've wanted to do that many times, and this brings
    to mind an earlier discussion of which UTF-8 characters should be
    part of the default spaces set. [encoding sets] or something would
    be handy.

  -noempty; the option currently being discussed.

  -keepall; split BETWEEN (spans of, with -noempty) delimiter and
    non-delimiter characters, instead of over them.
    This can currently be achieved with the somewhat heavier [regexp],
    but it would be trivial for [split] to do this.

Specifically... -noempty would cause it to skip all split characters,
instead of stepping over them one at a time. In either case, you stash
away the character position first, and -keepall then simply reaches
back and extracts the intervening characters as a list element.

The excellent thing about [split] is that it's lightweight, fast, and
easy. And none of this changes that.

--
Fredderic

Debian/unstable (LC#384816) on i686 2.6.30-1-686 2009 (up 1 day, 18:41)
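
To make the scan described above concrete, here is a rough pure-Tcl sketch of it. The proc name [xsplit] and its two boolean arguments are made up for illustration; they merely stand in for the proposed -noempty and -keepall switches and are not the TIP's implementation. A real version would live in C inside [split] itself. The last two lines show the approaches that already work today: [lsearch] for dropping empty elements and [regexp] for keeping the delimiter spans.

```tcl
# Hypothetical helper, not part of Tcl: a single-pass split that mimics
# the proposed -noempty and -keepall behaviour.
proc xsplit {str chars {noempty 0} {keepall 0}} {
    set result {}
    set len [string length $str]
    if {$len == 0} {
        return $result   ;# like [split], an empty string gives an empty list
    }
    set start 0          ;# stashed position: where the current field began
    set i 0
    while {$i < $len} {
        if {[string first [string index $str $i] $chars] < 0} {
            incr i       ;# ordinary character: keep scanning the field
            continue
        }
        # Hit a split character: emit the field that ended just before it.
        if {!$noempty || $i > $start} {
            lappend result [string range $str $start [expr {$i - 1}]]
        }
        set dstart $i    ;# stash where the delimiter span begins
        incr i
        if {$noempty} {
            # Skip the whole run of split characters instead of one at a time.
            while {$i < $len && [string first [string index $str $i] $chars] >= 0} {
                incr i
            }
        }
        if {$keepall} {
            # Reach back and keep the delimiter span as a list element too.
            lappend result [string range $str $dstart [expr {$i - 1}]]
        }
        set start $i
    }
    # Whatever follows the last split character is the final field.
    if {!$noempty || $len > $start} {
        lappend result [string range $str $start end]
    }
    return $result
}

puts [xsplit "a,,b" ","]        ;# a {} b   (same as plain [split])
puts [xsplit "a,,b" "," 1]      ;# a b      (-noempty)
puts [xsplit "a,,b" "," 1 1]    ;# a ,, b   (-noempty -keepall)

# What can be done with existing commands:
puts [lsearch -all -inline -not -exact [split "a,,b" ","] {}]   ;# a b
puts [regexp -all -inline {,+|[^,]+} "a,,b"]                    ;# a ,, b
```

The [lsearch] form builds the full list first and then filters it, which is exactly the extra pass over empty elements complained about above; the scanning version never creates them.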