|
From: George P. <pe...@ii...> - 2009-10-06 17:07:17
|
TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
=========================================================================
 Version:      $Revision: 1.2 $
 Author:       George Petasis <petasis_at_iit.demokritos.gr>
 State:        Draft
 Type:         Project
 Tcl-Version:  8.7
 Vote:         Pending
 Created:      Sunday, 04 October 2009
 URL:          http://purl.org/tcl/tip/358.html
 WebEdit:      http://purl.org/tcl/tip/edit/358
 Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

 The *split* command will create empty list elements when adjacent split
 characters are found in the input. In some cases these empty list
 elements are not desired, so this TIP proposes a new switch to disable
 their generation.

RATIONALE
===========

 The idea for this TIP came from a discussion in comp.lang.tcl:
 [<URL:http://groups.google.gr/group/comp.lang.tcl/browse_thread/thread/8d46b0f10e7a5750/d7844cc739aa4310>]
 and from the (non-obvious) suggestions made there on how tokens can be
 extracted efficiently from a string.

 It should be noted that this will allow the *split* command to be used
 in a fashion that is very similar to how splitting works in many other
 languages (e.g., Perl, awk, Unix shells).

SPECIFICATION
===============

 This TIP proposes a new optional switch (*-noemptyelements*) to the
 *split* command:

   *split -noemptyelements* /string/ ?/splitChars/?

 If this option is present, then *split* will not produce an empty list
 element when the /string/ contains adjacent characters that are present
 in /splitChars/.

REFERENCE IMPLEMENTATION
==========================

 Currently there is no patch, but it should be quite easy to implement
 this.

COPYRIGHT
===========

 This document has been placed in the public domain.

-------------------------------------------------------------------------

TIP AutoGenerator - written by Donal K. Fellows
|
|
From: Andreas K. <and...@ac...> - 2009-10-06 18:35:41
|
George Petasis wrote:
> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
Possible solutions without having to add options to a builtin ...
proc split-without-empty-elements-1 {list} {
    # Thanks to spjuth.
    # This is 8.4+
    return [lsearch -all -inline -not [split $list] {}]
}

# And for older Tcl versions without the relevant lsearch options ...

proc split-without-empty-elements-3 {list} {
    set res {}
    foreach x [split $list] {
        # [string equal] rather than eq, which also needs 8.4
        if {[string equal $x {}]} continue
        lappend res $x
    }
    return $res
}

proc split-without-empty-elements-2 {list} {
    package require struct::list
    return [struct::list filterfor x [split $list] {$x ne {}}]
}
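The procs above always split on whitespace. A minimal sketch of the same lsearch filter that also accepts splitChars, the way [split] itself does; the proc name is made up for illustration, and -exact is used so the empty pattern is compared literally rather than as a glob:

```tcl
# Hypothetical name, not an existing command: filters the empty elements
# out of [split]'s result. splitChars defaults to the whitespace set that
# [split] uses when none is given.
proc split-nonempty {str {splitChars " \t\n\r"}} {
    return [lsearch -all -inline -not -exact [split $str $splitChars] {}]
}

puts [split-nonempty "a,,b,,,c" ,]    ;# a b c
puts [split-nonempty "  one  two "]   ;# one two
```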
--
Sincerely,
Andreas Kupries <an...@ac...>
Developer @ <http://www.activestate.com/>
|
|
From: Georgios P. <pe...@ii...> - 2009-10-06 21:34:12
|
Andreas Kupries wrote:
> George Petasis wrote:
>> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
>
> Possible solutions without having to add options to a builtin ...
>
> [...]
I agree that there are (less efficient) alternatives. But this does not
change the fact that split does much less than it could.
I think that this relates to users frequently relying on implicit list
conversion of a string (i.e. using a string where a list is
expected): using split also returns unwanted elements...
George
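To make the implicit-conversion habit concrete, a small sketch: treating the string as a list skips runs of whitespace, which is why it is tempting, but it only works while the string happens to be a well-formed Tcl list:

```tcl
set s "alpha   beta  gamma"
puts [llength $s]           ;# 3: list parsing treats a run of whitespace as one separator
puts [llength [split $s]]   ;# 6: split yields an empty element for each extra space

# Implicit conversion breaks as soon as the text contains list-special characters.
set t {say "hello}
puts [catch {llength $t}]   ;# 1: the unbalanced quote is not a valid list
puts [llength [split $t]]   ;# 2: split still works
```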
|
|
From: Donal K. F. <don...@ma...> - 2009-10-06 18:41:27
|
George Petasis wrote:
> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
> =========================================================================
[...]
> This TIP proposes a new optional switch (*-noemptyelements*) to the
> *split* command:
>
> *split -noemptyelements* /string/ ?/splitChars/?
Would you like to comment on what would happen to the currently valid Tcl code:
set s "-noemptyelements"
set list [split $s {abcde}]
Donal.
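For reference, a sketch of what that code does today: the string merely happens to be spelled like the proposed switch, and {abcde} is the set of split characters.

```tcl
set s "-noemptyelements"
set list [split $s {abcde}]
# Current [split] cuts the string at every a, b, c, d or e.
puts $list   ;# -no mpty l m nts
```

A parser that recognised -noemptyelements by value would instead read the same call as "split the string abcde on whitespace", returning {abcde}, so existing code passing such data would silently change meaning.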
|
|
From: Georgios P. <pe...@ii...> - 2009-10-06 21:34:01
|
Donal K. Fellows wrote:
> George Petasis wrote:
>> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
> [...]
> Would you like to comment on what would happen to the currently valid Tcl
> code:
>
> set s "-noemptyelements"
> set list [split $s {abcde}]
>
> Donal.
It will fail. I know it is difficult to add options afterwards to
commands that were not designed to accept them.
Adding the usual -- parameter will not help here either...
One idea would be to always require three arguments if you want to use
-noemptyelements (either a special -- after -noemptyelements, or the
split chars)...
George
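A minimal sketch of that rule as a wrapper proc (Tcl 8.5+ for {*}). The name split_ne is hypothetical, and reading "--" as "end of options, use the default whitespace split characters" is an assumption about the intent. The switch is only recognised when exactly three arguments are given, so Donal's two-argument call keeps its current meaning:

```tcl
# Hypothetical wrapper, not a real or proposed command name:
# -noemptyelements counts as a switch only with exactly three arguments.
proc split_ne {args} {
    if {[llength $args] == 3 && [lindex $args 0] eq "-noemptyelements"} {
        if {[lindex $args 1] eq "--"} {
            # "-noemptyelements -- string": default whitespace split chars.
            set str [lindex $args 2]
            set splitChars " \t\n\r"
        } else {
            # "-noemptyelements string splitChars"
            set str [lindex $args 1]
            set splitChars [lindex $args 2]
        }
        return [lsearch -all -inline -not -exact [split $str $splitChars] {}]
    }
    # One or two arguments: exactly today's [split].
    return [split {*}$args]
}

puts [split_ne "-noemptyelements" {abcde}]      ;# -no mpty l m nts
puts [split_ne -noemptyelements "a,,b" ,]       ;# a b
puts [split_ne -noemptyelements -- "  a  b "]   ;# a b
```

One ambiguity remains: with the switch, "split_ne -noemptyelements -- abc" can only mean the string "abc" with default split characters, never the string "--" split on a, b and c.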
|
|
From: Joe E. <jen...@fl...> - 2009-10-06 20:12:10
|
George Petasis wrote:
> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
> [...]
> This TIP proposes a new optional switch (*-noemptyelements*) to the
> *split* command:

I am strongly inclined to reject this TIP as currently specified.

If [split] doesn't work the way you want, you should use something
different that *does* do what you want (for example: [splitx] from
tcllib), not add a new "work-the-way-I-want" option to an existing
command. That's what happened to [lsearch] and [lsort]. Please let's
not perpetuate that practice.

--Joe English

  jen...@fl...
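As an example of reaching for a different command instead of a new [split] option, here is a sketch built only on core [regexp]; it is not tcllib's [splitx], whose interface is not reproduced here. It matches runs of non-separator characters, so empty elements are never created in the first place (leading and trailing separators are dropped too). The name tokens is hypothetical, and splitChars is assumed to hold only BMP characters, because each one is written as a \uXXXX escape:

```tcl
# Hypothetical helper: returns the maximal runs of characters that are not
# in splitChars, so adjacent separators never yield empty elements.
proc tokens {str {splitChars " \t\n\r"}} {
    if {$splitChars eq ""} {
        # Mirror [split]: an empty splitChars means "split into characters".
        return [split $str ""]
    }
    # Write every separator as \uXXXX so that characters such as ] - ^ \
    # are taken literally inside the bracket expression (BMP only).
    set class ""
    foreach c [split $splitChars ""] {
        append class [format {\u%04x} [scan $c %c]]
    }
    return [regexp -all -inline "\[^$class\]+" $str]
}

puts [tokens "a,,b;;c" ",;"]   ;# a b c
puts [tokens "  x   y "]        ;# x y
```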
|
From: Twylite <tw...@cr...> - 2009-10-06 21:59:23
|
George Petasis wrote:
> This TIP proposes a new optional switch (*-noemptyelements*) to the
> *split* command:
>
> *split -noemptyelements* /string/ ?/splitChars/?
>
> If this option is present, then *split* will not produce an empty list
> element when the /string/ contains adjacent characters that are present
> in /splitChars/.
>
I don't think this functionality belongs in [split]; it would be better
handled by a higher order function like filter:
filter "ne {}" [split $string $splitchars]
Regards,
Twylite
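There is no [filter] command in core Tcl. A minimal sketch of one that accepts the argument shape used above (an operator and operand, applied with each element as the left-hand side) could look like this; the proc is hypothetical:

```tcl
# Hypothetical [filter], not part of core Tcl: keeps each element x of
# $list for which the expression "$x <cond>" is true. $x is substituted by
# [expr] itself, so element values are never re-parsed as code.
proc filter {cond list} {
    set result {}
    foreach x $list {
        if {[expr "\$x $cond"]} {
            lappend result $x
        }
    }
    return $result
}

puts [filter "ne {}" [split "a,,b,,c" ,]]   ;# a b c
puts [filter "> 2" {1 5 2 7}]               ;# 5 7
```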
|
|
From: Jordan H. <jo...@jo...> - 2009-10-07 00:12:03
|
All,
I am inclined to agree with the filter recommendation. If anything, split should behave
symmetrically, so that an equivalent string can be composed again from its result. The filter
example makes it clear that this is an issue with the list, not with the split function.
Furthermore, a general-purpose filter function is likely to be more useful than a
'noemptyelements' option added to split.
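The symmetry argument can be checked directly: with a single split character, [join] rebuilds the original string from [split]'s result, and it is precisely the empty elements that make this work. A short sketch:

```tcl
set s "a,,b,"
set parts [split $s ,]
puts $parts                          ;# a {} b {}
puts [expr {[join $parts ,] eq $s}]  ;# 1: join restores the original string

# Dropping the empty elements loses the information needed to rebuild it.
set kept [lsearch -all -inline -not -exact $parts {}]
puts [join $kept ,]                  ;# a,b
```

This only holds for a single split character; with several, [join] cannot know which one originally stood at each position.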
My two cents,
Jordan Henderson
On Tuesday 06 October 2009, Twylite wrote:
> I don't think this functionality belongs in [split]; it would be better
> handled by a higher order function like filter:
> filter "ne {}" [split $string $splitchars]
>
> Regards,
> Twylite
|
|
From: Magentus <mag...@gm...> - 2009-10-07 07:21:29
|
On Tue, 06 Oct 2009 23:43:37 +0200,
Twylite <tw...@cr...> wrote:
> George Petasis wrote:
>> This TIP proposes a new optional switch (*-noemptyelements*) to
>> the *split* command:
>> *split -noemptyelements* /string/ ?/splitChars/?
>> If this option is present, then *split* will not produce an empty
>> list element when the /string/ contains adjacent characters that
>> are present in /splitChars/.
> I don't think this functionality belongs in [split]; it would be
> better handled by a higher order function like filter:
> filter "ne {}" [split $string $splitchars]
[filter] is a bad idea, IMO... It's going to need many of [lsearch]'s
options. It'd be MUCH better to add a -command option to [lsearch],
similar to [lsort]'s, to direct its matching. (It can already do this
job, anyhow...)
In any case, the problem with this is that you're splitting the entire
string into a list, and then throwing chunks of it away again. In a
substantially sized string with a lot of padding characters, you can have
a LOT of little empty elements to iterate over. The [string normalize]
idea is better, I think... ([string collapse] kind of says something
else to me, though that could be my GLib history...)
I wonder if it wouldn't be an idea to add a [string split] command,
with new functionality and more supportive syntax, and migrate over to
that, as has been done with [chan]; this isn't the first time this
idea has been raised. I know I for one would like to see a few simple
options added to [split]:
-defaults; adds the default whitespace set to the specified set of split
characters. I've wanted to do that many times, and this brings to mind an
earlier discussion of which UTF-8 characters should be part of the
default whitespace set. [encoding sets] or something would be handy.
-noempty; the option currently being discussed.
-keepall; split BETWEEN (spans of, with -noempty) delimiter and
non-delimiter characters, instead of over them. This can currently
be achieved with the somewhat heavier [regexp], but it would be trivial
for [split] to do this.
Specifically... -noempty would cause it to skip over a whole run of split
characters, instead of stepping over them one at a time. In either case,
you stash away the character position first, and -keepall then simply
reaches back and extracts the intervening characters as a list element.
The excellent thing with split, is that it's light-weight, fast, and
easy. And none of this changes that.
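The scanning described above can be sketched at script level. The proc name split_scan and its two boolean parameters are hypothetical stand-ins for the suggested -noempty and -keepall options (-defaults is not covered), and splitChars is assumed to be non-empty. With both flags off it should behave like plain [split]:

```tcl
# Hypothetical script-level model of [split] with the proposed -noempty
# and -keepall behaviour; splitChars is assumed to be non-empty.
proc split_scan {str splitChars {noempty 0} {keepall 0}} {
    if {$str eq ""} {
        return {}               ;# [split] returns an empty list here too
    }
    set result {}
    set len [string length $str]
    set start 0                 ;# index where the current field begins
    set i 0
    while {$i < $len} {
        # Ordinary character: keep scanning the current field.
        if {[string first [string index $str $i] $splitChars] < 0} {
            incr i
            continue
        }
        # A separator ends the field; without -noempty even an empty
        # field (two adjacent separators) is emitted.
        if {!$noempty || $i > $start} {
            lappend result [string range $str $start [expr {$i - 1}]]
        }
        set sepStart $i
        if {$noempty} {
            # Skip the whole run of separators at once.
            while {$i < $len
                    && [string first [string index $str $i] $splitChars] >= 0} {
                incr i
            }
        } else {
            incr i              ;# step over a single separator
        }
        if {$keepall} {
            # Reach back and keep the separator text as its own element.
            lappend result [string range $str $sepStart [expr {$i - 1}]]
        }
        set start $i
    }
    # Trailing field (empty if the string ended with a separator).
    if {!$noempty || $start < $len} {
        lappend result [string range $str $start end]
    }
    return $result
}

puts [split_scan ",,a,,b,," , 1]    ;# a b
puts [split_scan "a, b" ", " 1 1]   ;# a {, } b
```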
--
Fredderic
Debian/unstable (LC#384816) on i686 2.6.30-1-686 2009 (up 1 day, 18:41)
|
|
From: Tom J. <tom...@gm...> - 2009-10-06 23:03:54
|
On Tue, Oct 6, 2009 at 10:06 AM, George Petasis <pe...@ii...> wrote:
>
> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
> [...]
> If this option is present, then *split* will not produce an empty list
> element when the /string/ contains adjacent characters that are present
> in /splitChars/.

I think that [split] is best reserved for well-formed inputs. In fact, if
the split chars are whitespace, then [split] does what most Tcl
programmers would consider to be the wrong thing: it creates empty
elements between extra whitespace chars.

The solution could be something more generally useful: maybe whitespace
normalization? We have [string trim]; maybe something like
[string normalize (whitespace)]. The result would be a string where
adjacent internal whitespace chars are collapsed into one space char, and
leading and trailing whitespace is eliminated. Then the problem would be
solved like this:

set mylist [split [string normalize $mystring]]
|
|
From: Donald A. <as...@tr...> - 2009-10-07 06:33:26
|
Tom Jackson <tom...@gm...> writes:
> We have [string trim], maybe something like [string normalize
> (whitespace)]. The result would be a string where adjacent internal
> whitespace chars are collapsed into one space char,

I like the idea of

  [string collapse $xxx]

Not [split -nomumble $zzz] though.

--
Donald Arseneau                          as...@tr...
|
|
From: Andreas K. <and...@ac...> - 2009-10-07 16:23:17
|
Tom Jackson wrote:
> We have [string trim], maybe something like [string normalize
> (whitespace)]. The result would be a string where adjacent internal
> whitespace chars are collapsed into one space char, and before and
> after whitespace is eliminated. Then the problem would be solved like
> this:
>
> set mylist [split [string normalize $mystring]]
Oh, you mean
proc string-normalize {str} {
    return [regsub -all {\s+} $str { }]
}
--
Sincerely,
Andreas Kupries <an...@ac...>
Developer @ <http://www.activestate.com/>
|
|
From: Andreas K. <and...@ac...> - 2009-10-07 17:03:41
|
Tom Jackson wrote:
> On Wed, Oct 7, 2009 at 9:19 AM, Andreas Kupries
> <and...@ac...> wrote:
>> Oh, you mean
>>
>> proc string-normalize {str} {
>> return [regsub -all {\s+} $str { }]
>> }
>>
>
> Close:
>
> proc string-normalize {str} {
> return [string trim [regsub -all {\s+} $str { }]]
> }
Ah, leading, trailing spaces. Right.
> However, the [string] command would probably be much faster than using
> [regsub], and I can think of a few additional problems and options
> which complicate including this as a [string] subcommand. Basically I
> wonder if it would be better in some cases to keep the first
> whitespace char and remove additional ones (keeping tabbed data or
> line folded text to remain closer to the original). Also, how do you
> handle windows type newlines (crlf) which is two chars?
I assume that the channel used to read the data was in auto-translation
mode, which means I only ever see LF. If not, it is no big deal to either
extend the regsub, or run a [string map] before the regsub that normalizes
that too.
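A sketch combining Tom's "keep the first whitespace character" idea with the [string map] step mentioned above; the proc name is hypothetical. CRLF pairs are mapped to LF first so that a Windows line ending counts as one whitespace character (otherwise the \r would be the character kept). For the plain "collapse to one space" version the step is not strictly needed, since \s already matches \r:

```tcl
# Hypothetical variant of string-normalize: each whitespace run is replaced
# by its first character instead of a single space, then the ends are trimmed.
proc string-normalize-keepfirst {str} {
    # Treat a CRLF pair as a single newline.
    set str [string map [list \r\n \n] $str]
    # Replace each run of whitespace by its first character, then trim.
    return [string trim [regsub -all {(\s)\s*} $str {\1}]]
}

set text "  a\t\t b\r\n\r\nc  "
puts [split [string-normalize-keepfirst $text]]   ;# a b c
```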
--
Sincerely,
Andreas Kupries <an...@ac...>
Developer @ <http://www.activestate.com/>
|