On 15 Jul 2009 18:19:07 -0700,
Donald Arseneau <asnd@...> wrote:
> Fredderic <magentus@...> writes:
>> I don't understand that... Given the following line:
>>
>> lsearch -all -stride 3 -start 1 \
>> {junk abc def ghi 123 456 789} {*[e8]*}
>>
>> I want lsearch to return {1 4}, you seem to want it to return {2 6}.
>
> No! I expect it to return {2}. lsearch without -stride returns {2 6}
> but when given "-stride 3 -start 1" lsearch should only ever look at
> items 1,4,7,... and should not match any other elements.
Yes, if you specify "-index 0", then that is exactly what it would do.
And if you specify "-index 1", it would be checking items 2, 5, 8,
etc. My point, is that if you don't specify an index, then it should do
exactly the same as it does any other time it's faced with a sub-list
and no search index; it should match the entire sub-list (stride group
in this case) as a string. To do anything else, other than throw an
error because there's no -index term provided (which would be an
unfortunate omission of functionality), is an inconsistency with other
existing behaviours of the command.
>> Given {1 4} as a return value, I can directly access any element
>> within the indicated group(s). Without the -all, the result will
>> be 1, and regardless of which word in the group matched (eg. using
>> the pattern *[b8]* instead),
>
> What you are looking for is not a stride of 3, but some "funky modulo
> math" (actually, divide and multiply) appled to a regular [lsearch]
> result. I don't see how that merits any change to the lsearch command
> itself. And how much use would such whole-group searches be? When
> you have a flat key-value list you usually want to search the keys or
> the values, not both indiscriminantly.
No. I am not looking for some "funky modulo math". [lsearch] I
suspect, will be stepping over the list in units of -stride already.
It doesn't have to do any funky modulo math to figure out the result I
want, it already has it as part of its basic functionality. But if it
returns the result you prescribe, than the script may have to do that
funky modulo math to get back what [lsearch] already had. Let me
repeat that, because it's important; [lsearch] already HAS the result
of the "funky modulo math" as the basis of its most obvious
implementation of this feature with the capabilities suggested here.
As for the second part, strangely, I have several instances of matching
against entire sub-lists as a string. One prime example, is
three-element sub-list, all three elements are quite distinctive, and
so any given search pattern will only match one of the three elements.
So rather than fiddling with -index, a simple search against the entire
sub-list does quite well. That would work just the same if it were a
flat list with "-stride 3" and no -index specifier.
Is it a common situation? Probably not. But it IS part of TCL's
legacy. It's been the way in the other well-known context, and it
should be the way here, too. In TCL, EIAS, after all. Isn't that
what's always said? What sense, then, is blocking a function because we
don't want to consider something as a string?
In the interim, though, I do recognise that the naive approach is going
to incur a significant performance penalty, and clutter the
implementation to a slight degree. I would hope, though, that this
situation is handled by a temporary error on missing -index specifier,
and that the TIP states this is an omission to be remedied at a point
not too many releases thereafter.
> We really need a more detailed TIP to eliminate these discussions at
> cross-purposes.
I don't see any cross-purpose going on at all. You don't want to
consider the grouping as a whole, as a potential search target, where I
believe it's just common sense to do so, even if it comes with a
performance warning on large lists. There may well be a change at some
point (such as the move to ropes) that removes that performance issue,
and there may well come a hitherto unrealised bright idea at some point
that makes to trivial and just as quick as the rest. Treating sub-lists
as a string is part of TCL. The result of [lrange] is used as a string
quite regularly around the place. Deal with it.
What you're proposing, is like saying that [lsort] should only sort
every -stride'th element, and leave the rest where they were. ie.
lsort -stride 3 -index 0 {9 8 7 6 5 4 3 2 1 0}
-> {0 8 7 3 5 4 6 2 1 9}
Now, that looks a little strange to me. Don't you think?
>> Given your response of {2 6},
>
> not
With the assumption that the feature goes ahead despite your
reservations, that IS the result you were suggesting, as far as I can
determine. What I wasn't considering at the time, because it seemed a
little odd given my understanding of TCL's very essense, was that you
were proposing omitting that obvious feature all together.
>> in order to do any of that I need to do a bit of funky modulo math
>> to adjust the index back to the start of the group, and then add the
>> offset to the item and/or group that I want.
>
> No modulo math. You are talking about a searching all elements
> in the group, and I don't think that is what the TIP proposes.
> We need a clearer TIP.
I'll repeat my stance yet again: [lsearch] already treats sub-lists as
strings when searching. It doesn't recognise the item is a list, and
say, "oh, I'm only going to match the first word of this list!". Which
is what you're proposing; we have a group of items, so we'll only match
against the first item in the group and ignoring the rest.
I *DO* however recognise that this will require special case handling,
at least until someone comes up with a better idea. Implementation
wise, the naive way shouldn't complicate things very much, but it will
be slow. It would involve looping over the -stride items from the
current position, composing a list, converting that list into a string,
and taking that as the value to match against. This would be
implemented as an alternative to the line that already obtains the list
items string to match against.
Doing the task fast is also possible, but does involve some extra
complexity and possibly rearrangement of the search implementation. I'm
hoping someone finds a better middle-ground, but regardless, the slow
way is evil performance wise, but I think it's trivial enough that it
should be available as an option where its performance is acceptable.
>>> (We should have TIP#176 to simplify extraction of "strident
>>> groups"!)
>> Probably... An expr function to calculate an index given an offset
>> striding, might be handy... Something like:
>
> Actually, #176 got implemented for 8.6, so extracting a group
> is a simple [lrange] command without any expr, multiplication,
> division, or modulo.
Yes, but finding the group isn't! At the very least, index arithmetic
would need to be extended to allow multiple additions/subtractions so
you can remove the -start offset prior to adding the offset into the
group that you're wanting. And it comes in many cases with a small but
not irrelevant increase in script complexity, and at least three
variable accesses that weren't needed previously. The whole point of
adding index arithmetic is to make things simpler, and the whole point
of -stride without -index matching against the whole group of items is
exactly the same, making things simpler, and taking complexity out of
the script.
>>> Assuming $index is the index into sub-lists, and the index of
>>> the actual matching element were returned, it would be
>>> set first [lsearch -index $index -start 1 -stride 2]
>>> set second [lsearch -index $index -start $first -stride 2]
>> No, it wouldn't, because the second search is being started on an
>> item that already matches the search, and so the exact same item
>> will be found twice. You need to adjust the first returned index,
>> prior to the second search.
>
> Ooops, sorry! The same for the existing lsearch, requiring an
> (ugly) [expr $previous+1] or [incr previous], so the striding
> lsearch (as I see it) would need [incr previous 2] (for -stride 2).
That is the basis for my wish to add -stride scaling to the index
arithmetic. You could then simply say "-start $previous+(1)", or
something along those lines, with the "(1)" being interpreted as
"multiply 1 by the -stride value".
> # a key,value,value list:
>
> set kvv [list key1 A1 B1 key2 A2 B2 key3 A3 B3 key4 A4 B4 keyn An Bn
> key12 A12 B12]
>
> # search for "2" in A entries (only!):
>
> set match [lseach -stride 3 -start 1 $kvv {*2*}]
> if { $match <0 } { return ... }
> puts "found group [lrange $kvv $match-1 $match+1]"
> set another [lseach -stride 3 -start [incr match 3] $kvv {*2*}]
> if { $another>=0 } {
> puts "found another group [lrange $kvv $another-1 $another+1]"
> }
>
> should report:
>
> found group key2 A2 B2
> found another group key12 A12 B12
Yeah. I agree, sans the odd typo. Except that there should be an
"-index 0" added to your [lsearch] command. OR, use "-index 1" instead
of the "-start 1".
That doesn't address in any way, however, the whole argument I've been
making all this time.
>> that matches the entire item group, as in the present case of
>> searching sub-lists with a regular string match.
>
> Good question as to whether that is the "present case"! I would edit
> the TIP myself to specify otherwise, but maybe Donal's intention was
> to search entire sublists! I don't think so because lsort uses the
> first (zeroth) element of each group or the one given by -index;
> never the whole group.
I believe that is an error, personally. [lsort] in its present state
should throw an error if -index isn't given, or add this very same
functionality. As it stands, "-index 0" in [lsort] is "default
behaviour", and should either be made explicit, or generalised to the
natural extent, which is what I'm talking about here for [lsearch].
I'd hazard to say that it wouldn't be any harder to implement for
[lsearch], although the performance impact of the naive approach would
be compounded due to the fact [lsort] quite commonly compares any given
item multiple times during the sort. So I'd personally recommend
amending it to insist that -index be given when -stride is used.
--
Fredderic
Debian/unstable (LC#384816) on i686 2.6.29-2-686 2009 (up 4 days, 2:57)
|