Thread: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: George P. <pe...@ii...> - 2009-10-06 17:07:17

 TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND 
=========================================================================
 Version:      $Revision: 1.2 $
 Author:       George Petasis <petasis_at_iit.demokritos.gr>
 State:        Draft
 Type:         Project
 Tcl-Version:  8.7
 Vote:         Pending
 Created:      Sunday, 04 October 2009
 URL:          http://purl.org/tcl/tip/358.html
 WebEdit:      http://purl.org/tcl/tip/edit/358
 Post-History: 

-------------------------------------------------------------------------

 ABSTRACT 
==========

 The *split* command will create empty list elements when adjacent split 
 characters are found in the input. In some cases these empty list 
 elements are not desired, so this TIP proposes a new switch to disable 
 their generation. 

 RATIONALE 
===========

 The idea for this TIP came from a discussion in comp.lang.tcl: 
 [<URL:http://groups.google.gr/group/comp.lang.tcl/browse_thread/thread/8d46b0f10e7a5750/d7844cc739aa4310>] 
 and from the (non-obvious) suggestions made there on how tokens can be 
 extracted from a string efficiently. 

 It should be noted that this will allow the *split* command to be used 
 in a fashion that is very similar to how splitting works in many other 
 languages (e.g., Perl, awk, Unix shells). 

 SPECIFICATION 
===============

 This TIP proposes a new optional switch (*-noemptyelements*) to the 
 *split* command: 

       *split -noemptyelements* /string/ ?/splitChars/? 

 If this option is present, then *split* will not produce an empty list 
 element when the /string/ contains adjacent characters that are present 
 in /splitChars/. 
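
 To make the behaviour concrete, here is a minimal sketch of what *split*
 returns today and of what the proposed switch would return. The helper
 name split_noempty is hypothetical: it only emulates the proposed
 semantics in script and is not the implementation this TIP envisages.

```tcl
# Current behaviour: adjacent (and trailing) split characters yield
# empty list elements.
set fields [split "a,,b,c," ","]
# fields is now the five-element list: a {} b c {}

# Hypothetical script-level emulation of the proposed -noemptyelements
# semantics; splitChars defaults to whitespace, as in [split].
proc split_noempty {string {splitChars " \t\n\r"}} {
    set result {}
    foreach element [split $string $splitChars] {
        if {$element ne ""} {
            lappend result $element
        }
    }
    return $result
}

set tokens [split_noempty "a,,b,c," ","]
# tokens is now the three-element list: a b c
```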

 REFERENCE IMPLEMENTATION 
==========================

 Currently there is no patch, but it should be quite easy to implement 
 this. 

 COPYRIGHT 
===========

 This document has been placed in the public domain. 

-------------------------------------------------------------------------

 TIP AutoGenerator - written by Donal K. Fellows

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Andreas K. <and...@ac...> - 2009-10-06 18:35:41

George Petasis wrote:
>  TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND 

Possible solutions without having to add options to a builtin ...

# Variant 1: filter with lsearch. Thanks to spjuth.
# Needs Tcl 8.4+ for the -all, -inline and -not options.
proc split-without-empty-elements-1 {str} {
	return [lsearch -all -inline -not [split $str] {}]
}

# Variant 2: the same filter via tcllib's struct::list.
proc split-without-empty-elements-2 {str} {
	package require struct::list
	return [struct::list filterfor x [split $str] {$x ne {}}]
}

# Variant 3: for older Tcl versions without the relevant lsearch options.
# Uses [string length] because the eq operator is itself 8.4+.
proc split-without-empty-elements-3 {str} {
	set res {}
	foreach x [split $str] {
		if {[string length $x] == 0} continue
		lappend res $x
	}
	return $res
}

-- 
Sincerely,
     Andreas Kupries <an...@ac...>
     Developer @    <http://www.activestate.com/>

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Georgios P. <pe...@ii...> - 2009-10-06 21:34:12

Andreas Kupries wrote:
> George Petasis wrote:
>> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND 
>
> Possible solutions without having to add options to a builtin ...
>
> proc split-without-empty-elements-1 {list} {
> return [lsearch -all -inline -not [split $list] {}]
> # Thanks to spjuth.
> # This is 8.4+
> }
>
> # And for older Tcl's without the relevant lsearch options ...
>
> proc split-without-empty-elements-3 {list} {
> set res {}
> foreach x [split $list] {
> if {$x eq {}} continue
> lappend res $x
> }
> return $res
> }
>
> proc split-without-empty-elements-2 {list} {
> package require struct::list
> return [struct::list filterfor x [split $list] {$x ne {}}]
> }
>
I agree that there are (less efficient) alternatives. But this does not 
change the fact that split does much less than it could.
I think this relates to users frequently relying on implicit list 
conversion of a string (i.e. using a string where a list is 
expected): using split also returns unwanted elements...

George

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Donal K. F. <don...@ma...> - 2009-10-06 18:41:27

George Petasis wrote:
>  TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND 
> =========================================================================
[...]
>  This TIP proposes a new optional switch (*-noemptyelements*) to the 
>  *split* command: 
> 
>        *split -noemptyelements* /string/ ?/splitChars/? 

Would you like to comment on what would happen to this currently-valid Tcl code:

   set s "-noemptyelements"
   set list [split $s {abcde}]

Donal.

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Georgios P. <pe...@ii...> - 2009-10-06 21:34:01

Donal K. Fellows wrote:
> George Petasis wrote:
>>  TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT 
>> COMMAND 
>> ========================================================================= 
>>
> [...]
>>  This TIP proposes a new optional switch (*-noemptyelements*) to the 
>>  *split* command:
>>        *split -noemptyelements* /string/ ?/splitChars/? 
>
> Would you like to comment what would happen to the currently-valid Tcl 
> code:
>
>   set s "-noemptyelements"
>   set list [split $s {abcde}]
>
> Donal.
It will fail. I know it is difficult to add options after the fact to 
commands that were not designed to accept them.
Adding the usual -- parameter will not help here either...

One idea would be to always require 3 arguments if you want to use 
-noemptyelements (either a special -- after the -noemptyelements or the 
split chars)...

George
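
The ambiguity discussed here can be shown with today's [split]. The
first part below is what the two-argument call currently returns; the
proc split3 is a hypothetical illustration of the "always three
arguments" idea and is not part of the TIP or of any existing API.

```tcl
# Today: the first argument is always the string to split.
set s "-noemptyelements"
set pieces [split $s {abcde}]
# pieces is: -no mpty l m nts
# (split at every "e"; a, b, c and d do not occur in the string)

# Hypothetical sketch: the switch is only recognised when exactly three
# arguments are given, so existing one- and two-argument calls keep
# their current meaning.
proc split3 {args} {
    if {[llength $args] == 3 && [lindex $args 0] eq "-noemptyelements"} {
        set result {}
        foreach element [split [lindex $args 1] [lindex $args 2]] {
            if {$element ne ""} {lappend result $element}
        }
        return $result
    }
    # Anything else is passed to the real [split] unchanged.
    return [eval [linsert $args 0 split]]
}
```

With this rule, [split3 $s {abcde}] still treats $s as data, while
[split3 -noemptyelements $str $chars] enables the filtering.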

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Joe E. <jen...@fl...> - 2009-10-06 20:12:10

George Petasis wrote:

> TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
> [...]
>  This TIP proposes a new optional switch (*-noemptyelements*) to the
>  *split* command:

I am strongly inclined to reject this TIP as currently specified.

If [split] doesn't work the way you want, you should use something
different that *does* do what you want (for example: [splitx] from tcllib),
not add a new "work-the-way-I-want" option to an existing command.

That's what happened to [lsearch] and [lsort].  Please let's
not perpetuate that practice.

--Joe English

  jen...@fl...

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Twylite <tw...@cr...> - 2009-10-06 21:59:23

George Petasis wrote:
>  This TIP proposes a new optional switch (*-noemptyelements*) to the 
>  *split* command: 
>
>        *split -noemptyelements* /string/ ?/splitChars/? 
>
>  If this option is present, then *split* will not produce an empty list 
>  element when the /string/ contains adjacent characters that are present 
>  in /splitChars/. 
>   
I don't think this functionality belongs in [split]; it would be better 
handled by a higher order function like filter:
  filter "ne {}" [split $string $splitchars]

Regards,
Twylite
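
Tcl has no built-in [filter], so the call above presupposes a helper. A
minimal sketch of one that accepts that call might look like this; the
calling convention (an operator-and-operand fragment applied to each
element inside [expr]) is an assumption made purely for illustration.

```tcl
# Hypothetical filter: keeps the elements for which "<element> <test>"
# is true when evaluated by expr, e.g. test = "ne {}".
proc filter {test list} {
    set result {}
    foreach element $list {
        # $element is substituted by expr itself, so its value is never
        # re-parsed as part of the expression text.
        if {[expr "\$element $test"]} {
            lappend result $element
        }
    }
    return $result
}

set tokens [filter "ne {}" [split "  two  words " " "]]
# tokens is: two words
```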

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Jordan H. <jo...@jo...> - 2009-10-07 00:12:03

All,

I am inclined to agree with the filter recommendation.  If anything, split should behave in a 
symmetrical manner for composing an equivalent string.  The filter example clearly makes this an 
issue with the list and not the split function.  Furthermore, having a general-purpose filter 
function is likely to be more useful than a 'noemptyelements' option added to split.

My two cents,
Jordan Henderson

On Tuesday 06 October 2009, Twylite wrote:
> George Petasis wrote:
> >  This TIP proposes a new optional switch (*-noemptyelements*) to the
> >  *split* command:
> >
> >        *split -noemptyelements* /string/ ?/splitChars/?
> >
> >  If this option is present, then *split* will not produce an empty list
> >  element when the /string/ contains adjacent characters that are present
> >  in /splitChars/.
>
> I don't think this functionality belongs in [split]; it would be better
> handled by a higher order function like filter:
>   filter "ne {}" [split $string $splitchars]
>
> Regards,
> Twylite

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Magentus <mag...@gm...> - 2009-10-07 07:21:29

On Tue, 06 Oct 2009 23:43:37 +0200, 
Twylite <tw...@cr...> wrote:

> George Petasis wrote:
>>  This TIP proposes a new optional switch (*-noemptyelements*) to
>> the *split* command: 
>>        *split -noemptyelements* /string/ ?/splitChars/? 
>>  If this option is present, then *split* will not produce an empty
>> list element when the /string/ contains adjacent characters that
>> are present in /splitChars/. 
> I don't think this functionality belongs in [split]; it would be
> better handled by a higher order function like filter:
>   filter "ne {}" [split $string $splitchars]

[filter] is a bad idea, IMO...  It's going to need many of [lsearch]'s
options.  It'd be MUCH better to add a -command option to [lsearch],
similar to [lsort]'s, to direct its matching.  (It can already do this
job, anyhow...)

In any case, the problem with this is that you're splitting the entire
string into a list, and then throwing chunks of it away again.  In a
substantial sized string with a lot of padding characters, you can have
a LOT of little empty elements to iterate over.  The [string normalize]
idea is better, I think...  ([string collapse] kind of says something
else to me, though that could be my Glib history...)

I wonder, if it wouldn't be an idea to add a [string split] command,
with new functionality and more supportive syntax and migrate over to
that, as has been done with [chan] - this isn't the first time this
idea has been raised.  I know I for one would like to see a few simple
options added to [split]:

-defaults; adds the default spaces set to the specified split characters
set.  I've wanted to do that many times, and this brings to mind an
earlier discussion of which UTF-8 characters should be part of the
default spaces set.  [encoding sets] or something would be handy.

-noempty; the currently being discussed option.

-keepall; split BETWEEN (spans of, with -noempty) delimiter and
non-delimiter characters, instead of over them.  This can currently
be achieved with the somewhat heavier [regexp], but it would be trivial
for [split] to do this.

Specifically...  -noempty would cause it to skip all split characters,
instead of stepping over them one at a time.  In either case, you stash
away the character position first, and -keepall then simply reaches back
and extracts the intervening characters as a list element.

The excellent thing with split, is that it's light-weight, fast, and
easy.  And none of this changes that.
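
For comparison, the [regexp] route mentioned above can be sketched as
follows for a whitespace delimiter set. It collects the wanted spans
directly rather than splitting and then filtering. The variable names and
the sample string are illustrative only.

```tcl
set line "alpha  beta\tgamma"

# -noempty equivalent: collect each run of non-delimiter characters.
set tokens [regexp -all -inline {\S+} $line]
# tokens is: alpha beta gamma

# -keepall with -noempty equivalent: runs of non-delimiters and runs of
# delimiters are both kept, so joining the pieces with "" restores the
# original string.
set pieces [regexp -all -inline {\S+|\s+} $line]
```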

-- 
Fredderic

Debian/unstable (LC#384816) on i686 2.6.30-1-686 2009 (up 1 day, 18:41)

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Tom J. <tom...@gm...> - 2009-10-06 23:03:54

On Tue, Oct 6, 2009 at 10:06 AM, George Petasis
<pe...@ii...> wrote:
>
>  TIP #358: SUPPRESS EMPTY LIST ELEMENT GENERATION FROM THE SPLIT COMMAND
> =========================================================================
>  Version:      $Revision: 1.2 $
>  Author:       George Petasis <petasis_at_iit.demokritos.gr>
>  State:        Draft
>  Type:         Project
>  Tcl-Version:  8.7
>  Vote:         Pending
>  Created:      Sunday, 04 October 2009
>  URL:          http://purl.org/tcl/tip/358.html
>  WebEdit:      http://purl.org/tcl/tip/edit/358
>  Post-History:
>
> -------------------------------------------------------------------------
>
>  ABSTRACT
> ==========
>
>  The *split* command will create empty list elements when adjacent split
>  characters are found in the input. In some cases these empty list
>  elements are not desired, so this TIP proposes a new switch to disable
>  their generation.
>
>  RATIONALE
> ===========
>
>  The idea for this TIP came from a discussion in comp.lang.tcl:
>  [<URL:http://groups.google.gr/group/comp.lang.tcl/browse_thread/thread/8d46b0f10e7a5750/d7844cc739aa4310>]
>  and the (non obvious) suggestions on how tokens can be extracted from a
>  string can be performed efficiently.
>
>  It should be noted that this will allow the *split* command to be used
>  in a fashion that is very similar to how splitting works in many other
>  languages (e.g., Perl, awk, Unix shells).
>
>  SPECIFICATION
> ===============
>
>  This TIP proposes a new optional switch (*-noemptyelements*) to the
>  *split* command:
>
>       *split -noemptyelements* /string/ ?/splitChars/?
>
>  If this option is present, then *split* will not produce an empty list
>  element when the /string/ contains adjacent characters that are present
>  in /splitChars/.

I think that [split] is best reserved for well formed inputs, in fact,
if the split chars are whitespace, then [split] does what most Tcl
programmers would consider to be the wrong thing...creating empty
elements between extra whitespace chars.

The solution could be something more generally useful: maybe
whitespace normalization?

We have [string trim], maybe something like [string normalize
(whitespace)]. The result would be a string where adjacent internal
whitespace chars are collapsed into one space char, and before and
after whitespace is eliminated. Then the problem would be solved like
this:

set mylist [split [string normalize $mystring]]

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Donald A. <as...@tr...> - 2009-10-07 06:33:26

Tom Jackson <tom...@gm...> writes:

> We have [string trim], maybe something like [string normalize
> (whitespace)]. The result would be a string where adjacent internal
> whitespace chars are collapsed into one space char,

I like the idea of [string collapse $xxx]

Not [split -nomumble $zzz] though.

-- 
Donald Arseneau                          as...@tr...

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Andreas K. <and...@ac...> - 2009-10-07 16:23:17

Tom Jackson wrote:
> We have [string trim], maybe something like [string normalize
> (whitespace)]. The result would be a string where adjacent internal
> whitespace chars are collapsed into one space char, and before and
> after whitespace is eliminated. Then the problem would be solved like
> this:
> 
> set mylist [split [string normalize $mystring]]

Oh, you mean

proc string-normalize {str} {
	return [regsub -all {\s+} $str { }]
}

-- 
Sincerely,
     Andreas Kupries <an...@ac...>
     Developer @    <http://www.activestate.com/>

Re: [TCLCORE] TIP #358: Suppress Empty List Element Generation from the Split Command

From: Andreas K. <and...@ac...> - 2009-10-07 17:03:41

Tom Jackson wrote:
> On Wed, Oct 7, 2009 at 9:19 AM, Andreas Kupries
> <and...@ac...> wrote:
>> Oh, you mean
>>
>> proc string-normalize {str} {
>>        return [regsub -all {\s+} $str { }]
>> }
>>
> 
> Close:
> 
> proc string-normalize {str} {
>     return [string trim [regsub -all {\s+} $str { }]]
> }

Ah, leading, trailing spaces. Right.

> However, the [string] command would probably be much faster than using
> [regsub], and I can think of a few additional problems and options
> which complicate including this as a [string] subcommand. Basically I
> wonder if it would be better in some cases to keep the first
> whitespace char and remove additional ones (keeping tabbed data or
> line folded text to remain closer to the original). Also, how do you
> handle windows type newlines (crlf) which is two chars?

I assume that the channel used to read the data was in auto-translation 
mode, which means I always see only LF. If not, it is no big deal to either 
extend the regsub, or run a [string map] before the regsub which normalizes 
that too.
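
Putting the pieces of this exchange together, a minimal sketch of the
normalise-then-split approach could look like this. The name
string-normalize follows the procs quoted above. Since \s also matches
carriage return, a CR LF pair read without translation collapses to a
single space as well, so no extra [string map] is needed in this case.

```tcl
proc string-normalize {str} {
    # Collapse every run of whitespace (including CR LF pairs) into one
    # space, then drop leading and trailing whitespace.
    return [string trim [regsub -all {\s+} $str { }]]
}

set mystring "  one \t two\r\nthree  "
set mylist [split [string-normalize $mystring]]
# mylist is: one two three
```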


-- 
Sincerely,
     Andreas Kupries <an...@ac...>
     Developer @    <http://www.activestate.com/>