Thread: [Tcllib-devel] Asking for comments on a ncgi patch (regexp -> string last/...)

[Tcllib-devel] Asking for comments on a ncgi patch (regexp -> string last/...)

From: Andreas K. <and...@ac...> - 2006-08-08 18:22:27

Is the change below something the Tcl RE bytecompiler does automagically ?
Could it be done ?


https://sourceforge.net/tracker/?func=detail&atid=112883&aid=1536890&group_id=12883

	Category: ncgi
	Submitted By: yahalom emet (yahalom)
	Summary: regexp usage is inefficient and can hang

	using regexp in ::ncgi::nvlist is slower than the
	usage of string methods (although it looks better).
	regexp also causes problems with bigger data which can
	cause ncgi to get stuck (try running it on big post
	data).

	replace:

	if {![regexp -- (.*)=(.*) $x dummy varname val]} {
	    set varname anonymous
	    set val $x
	}


	with:

	set idx [string last "=" $x]
	if {$idx==-1} {
	    set varname anonymous
	    set val $x
	} else {
	    set varname [string range $x 0 [expr {$idx-1}]]
	    set val [string range $x [expr {$idx+1}] end]
	}
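
A minimal sketch for comparing the two variants from the tracker entry,
assuming nothing beyond core Tcl. The proc names splitRegexp and splitString
and the sample data are hypothetical and are not part of ncgi. The timings
depend on the Tcl version and the input, so none are quoted here.

```tcl
# Hypothetical wrappers around the two snippets from the tracker entry.
proc splitRegexp {x} {
    if {![regexp -- {(.*)=(.*)} $x dummy varname val]} {
        set varname anonymous
        set val $x
    }
    return [list $varname $val]
}

proc splitString {x} {
    set idx [string last "=" $x]
    if {$idx == -1} {
        set varname anonymous
        set val $x
    } else {
        set varname [string range $x 0 [expr {$idx - 1}]]
        set val [string range $x [expr {$idx + 1}] end]
    }
    return [list $varname $val]
}

# Made-up sample: a short name followed by a longer value.
set sample "field=[string repeat a 2000]"

# [time] reports the average microseconds per iteration.
puts "regexp: [time {splitRegexp $sample} 10]"
puts "string: [time {splitString $sample} 10]"
```

Raising the repeat count in the sample is one way to check the "big post
data" claim on a given installation.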


--
	Andreas Kupries <and...@Ac...>
	Developer @ http://www.ActiveState.com
	Tel: +1 778-786-1122

Re: [Tcllib-devel] Asking for comments on a ncgi patch (regexp -> string last/...)

From: Jeff H. <je...@ac...> - 2006-08-08 21:45:45

Andreas Kupries wrote:
> Is the change below something the Tcl RE bytecompiler does
> automagically ? Could it be done ?
>
> https://sourceforge.net/tracker/?func=detail&atid=112883&aid=1536890&group_id=12883
>
> 	using regexp in ::ncgi::nvlist is slower than the
> 	usage of string methods (although it looks better).
> 	regexp also causes problems with bigger data which can
> 	cause ncgi to get stuck (try running it on big post
> 	data).

There are a lot of cases where 'string' methods trump REs for speed.  This is
not a case where the bytecompiler tries to outsmart the user - it doesn't do
that for any capturing REs.  It would be nice if the RE handled this better.
I'm fairly certain that someone with strong determination and strong will (to
not go crazy ;) ) would find that the RE code is not efficient for many simple
cases.  Unfortunately I don't have the time for this review.  :(

Jeff

> 	replace:
>
> 	if {![regexp -- (.*)=(.*) $x dummy varname val]} {
> 	    set varname anonymous
> 	    set val $x
> 	}
>
> 	with:
>
> 	set idx [string last "=" $x]
> 	if {$idx==-1} {
> 	    set varname anonymous
> 	    set val $x
> 	} else {
> 	    set varname [string range $x 0 [expr {$idx-1}]]
> 	    set val [string range $x [expr {$idx+1}] end]
> 	}

Re: [Tcllib-devel] Asking for comments on a ncgi patch (regexp -> string last/...)

From: Arjen M. <arj...@wl...> - 2006-08-09 06:30:05

Andreas Kupries wrote:

>Is the change below something the Tcl RE bytecompiler does automagically ?
>Could it be done ?
>
>
>https://sourceforge.net/tracker/?func=detail&atid=112883&aid=1536890&group_id=12883
>
>	Category: ncgi
>	Submitted By: yahalom emet (yahalom)
>	Summary: regexp usage is inefficient and can hang
>
>	using regexp in ::ncgi::nvlist is slower than the
>	usage of string methods (although it looks better).
>	regexp also causes problems with bigger data which can
>	cause ncgi to get stuck (try running it on big post
>	data).
>
>	replace:
>
>	if {![regexp -- (.*)=(.*) $x dummy varname val]} {
>          set varname anonymous
>          set val $x
>      }
>
>
>	with:
>
>	set idx [string last "=" $x]
>	if {$idx==-1} {
>	   set varname anonymous
>	  set val $x
>	} else {
>	    set varname [string range $x 0 [expr {$idx-1}]]
>	    set val [string range $x [expr {$idx+1}] end]
>	}
Shouldn't this be [string first]? That solution is better IMHO than the
[regexp] one. Consider a string like:
    x="y=z"

(if that is not possible with post data, then I plead almost complete
ignorance of the details of CGI ...)

The regular expression would split this into the name x="y and the value z"
(as would [string last]).

I doubt that is what you want, but then again, I am more than slightly
ignorant of the issues of CGI.
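
To make the difference concrete, here is a small sketch in core Tcl that
splits the example string at the first and at the last "=". The helper name
splitAt is hypothetical and exists only for this illustration.

```tcl
# Hypothetical helper: split x into {name value} around the '=' at index idx.
proc splitAt {x idx} {
    list [string range $x 0 [expr {$idx - 1}]] \
         [string range $x [expr {$idx + 1}] end]
}

set x {x="y=z"}

lassign [splitAt $x [string first "=" $x]] name value
puts "string first: name=<$name> value=<$value>"
# prints: string first: name=<x> value=<"y=z">

lassign [splitAt $x [string last "=" $x]] name value
puts "string last:  name=<$name> value=<$value>"
# prints: string last:  name=<x="y> value=<z">
```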

Regards,

Arjen

Re: [Tcllib-devel] Asking for comments on a ncgi patch (regexp -> string last/...)

From: Jeff H. <je...@ac...> - 2006-08-09 08:22:39

Arjen Markus wrote:
>> 	if {![regexp -- (.*)=(.*) $x dummy varname val]} {
>>          set varname anonymous
>>          set val $x
>>      }

> Shouldn't this be [string first]? That solution is better IMHO than the 
> [regex] one:
> Consider a string like:
>     x="y=z"

Correct or not, the regexp behaves like [string last] because its first
(.*) is greedy.  I do believe that the example is correct for CGI, where
the quoted string you gave would not be legal (it would be encoded).
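
A short sketch of the encoding argument, in core Tcl. The [string map] call
is only a simplified stand-in for CGI URL encoding (an assumption made for
this example; a real encoder escapes many more characters). Once the value
is encoded, the pair contains a single literal "=", so [string first] and
[string last] find the same position.

```tcl
# Simplified stand-in for CGI URL encoding: only the two characters that
# matter for this example are escaped.
set value {"y=z"}
set encoded [string map [list \" %22 = %3D] $value]
set pair "x=$encoded"

puts $pair
# prints: x=%22y%3Dz%22

puts "first '=' at [string first = $pair], last '=' at [string last = $pair]"
# prints: first '=' at 1, last '=' at 1
```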

Jeff

[Tcllib-devel] Distributed computing anyone?

From: Arjen M. <arj...@wl...> - 2006-09-06 09:47:49

Hello all,

Michael Schlenker and I were just discussing a possible new area of interest
in Tcllib: distributed computing.

First of all, let me explain what the two of us mean by "distributed
computing": two or more programs (either running on the same computer or on
different computers) share information, so that they can both progress in
their task.

Examples:
- The tkchat is such a system - a whole bunch of people talking to each
  other.
- SOAP is a well-known protocol that allows programs (processes) to share
  information.
- GRID computing, where large-scale computational programs running
  on all kinds of computers cooperate to predict the global climate for
  the next 100 years.

Mind you: we are not out to create something that will encompass all such
forms of distributed computing!

Let me outline our intended audience:
The average programmer knows of multithreading and multiprocessing
as powerful techniques to enhance the performance of their programs.
Or they know of client-server techniques to allow several people to
use their system. Multithreading is notoriously hard to get right.
Client-server systems are much easier - if you use Tcl :).

The programmers and users we envision do not have massive
computer networks for doing their job. Just a couple of boxes that
could be used in some biggish computation but right now it is too much
work to get them cooperating.

What if we had a framework where such people could plug in their
various programs with only minor adaptations? Tcl and Tcllib have
a lot of tools for doing this. So all that is needed is a bit of
infrastructure.

Given the variety of tools (secure connections or not, for instance), that
infrastructure could be tuned to their needs with just a few switches.

I propose to discuss either on this list or on the Wiki the outlines of
such an infrastructure and to go ahead and assemble it from the tools
we have when there is enough clarity about it.

Regards,

Arjen

Re: [Tcllib-devel] Distributed computing anyone?

From: Cameron L. <Ca...@ph...> - 2006-09-06 11:11:21

On Wed, Sep 06, 2006 at 11:47:38AM +0200, Arjen Markus wrote:
			.
			.
			.
> The average programmer knows of multithreading and multiprocessing
> as powerful techniques to enhance the performance of their programs.
			.
			.
			.
Utterly tangential remark, that doesn't reflect at all my positive
interest in AM's project:  the average programmer is dangerously
ignorant of the propensity of multithreading to degrade and pervert
the performance of his programs.

Re: [Tcllib-devel] Distributed computing anyone?

From: Andreas K. <and...@ac...> - 2006-09-08 18:58:29

> Hello all,
>
> Michael Schlenker and I were just discussing a possible new area of interest
> in Tcllib: distributed computing.
>
> First of all, let me explain what the two of us mean by "distributed
> computing":
> Two or more programs (either running on the same computer or on different
> computers) share information, so that they can both progress in their task.

> Examples:
> - The tkchat is such a system - a whole bunch of people talking to each
>   other.
> - SOAP is a well-known protocol that allows programs (processes) to share
>   information.
> - GRID computing where large-scale computational programs running
>   on all kinds of computers cooperate to predict the global climate for
>   the next 100 years.

Note jcw's "Tequila" for shared arrays.
Note further Tcllib's "tie" (rarray backend) for doing same (foundation:
Tcllib's "comm").

> Mind you: we are not out to create something that will encompass all such
> forms of distributed computing!
>
> Let me outline our intended audience:
> The average programmer knows of multithreading and multiprocessing
> as powerful techniques to enhance the performance of their programs.
> Or they know of client-server techniques to allow several people to
> use their system. Multithreading is notoriously hard to get right.
> Client-server systems are much easier - if you use Tcl :).

In Tcl, a multi-threaded program can be seen as a client-server system running
within a single process (the apartment model + thread::send).
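
A minimal sketch of that view, assuming the Thread extension is installed.
The proc name square is made up for the illustration. The worker thread has
its own interpreter and is reached only through scripts sent to it, much
like a server answering requests.

```tcl
package require Thread

# Worker thread: its own interpreter, waiting in its event loop.
set worker [thread::create]

# "Server" side: define a command inside the worker (hypothetical name).
thread::send $worker {
    proc square {n} { expr {$n * $n} }
}

# "Client" side: a synchronous request; the script's result comes back.
set answer [thread::send $worker {square 7}]
puts "worker replied: $answer"
# prints: worker replied: 49

thread::release $worker
```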

> The programmers and users we envision do not have massive
> computer networks for doing their job. Just a couple of boxes that
> could be used in some biggish computation but right now it is too much
> work to get them cooperating.
>
> What if we had a framework where such people could plug in their
> various programs with only little adaptations. Tcl and Tcllib have
> a lot of tools for doing this. So all that is needed is a bit of
> infrastructure.

"comm" is the main infrastructure IMHO. The main thing will be to define how
tasks, results etc. are communicated. That might be specific to the system used.
Security also becomes a much bigger issue than before ... Reminds me, I wanted to
write something to allow use of "tls::socket" in "comm".

> Given the variety of tools  (secure  connections or not for instance), that
> infrastructure could be tuned to their needs with just a few switches.

> I propose to discuss either on this list or on the Wiki the outlines of
> such an infrastructure and to go ahead and assemble it from the tools
> we have when there is enough clarity about it.

I have no objection to discussing it here.

It fits a bit with the (now deferred) name-service thingy. The name server can be
used by the various pieces of the distributed system to find each other. Note
also Apple Bonjour/Rendezvous, it does a similar thing.


--
	Andreas Kupries <and...@Ac...>
	Developer @ http://www.ActiveState.com
	Tel: +1 778-786-1122

Re: [Tcllib-devel] Distributed computing anyone?

From: Arjen M. <arj...@wl...> - 2006-09-11 07:16:25

Andreas Kupries wrote:

>Note jcw's "Tequila" for shared arrays.
>Note further Tcllib's "tie" (rarray backend) for doing same (foundation:
>Tcllib's "comm").
>
>"comm" is the main infrastructure IMHO. The main thing will be to define how
>tasks are communicated, results etc. That might be specific to the system used.
>Security also becomes a much bigger issue than before ... Reminds me, wanted to
>write something to allow use of "tls::socket" in "comm".
I agree with that, having now read the man page on comm.

>>Given the variety of tools (secure connections or not, for instance), that
>>infrastructure could be tuned to their needs with just a few switches.
>
>>I propose to discuss either on this list or on the Wiki the outlines of
>>such an infrastructure and to go ahead and assemble it from the tools
>>we have when there is enough clarity about it.
>
>I have no objection to discussing it here.
>
>It fits a bit with the (now deferred) name-service thingy. The name server can be
>used by the various pieces of the distributed system to find each other. Note
>also Apple Bonjour/Rendezvous, it does a similar thing.

What I am pondering at the moment is the fact that various problems require
very different working methods:
- In some cases a client program will just happily send off independent
  tasks and collect the results as they come in.
- In other cases the client program will want to use this facility to
  speed up a computation that is essentially an iteration (i.e. the
  ordering of the results and the computational steps are fixed).
- Other properties that may vary: the level of security, the availability
  of computational resources (local to the machine that runs the server or
  various worker processes on whatever machines are available).

Ideally things would be completely transparent to the client program -
just a few details at start-up, for instance, with sensible defaults and
the possibility to tune them if needed.

I would say that by considering several examples we should be able to
come up with some sort of easy-to-use API. And we do not need to cover all
possibilities after all.
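
The first two working methods can be sketched within a single process,
assuming the Thread extension is installed. Worker threads stand in for
remote machines here; this is not the proposed infrastructure, only an
illustration. The task bodies (squaring a number, one Newton step towards
the square root of 2) and all variable names are made up.

```tcl
package require Thread

set workers {}
for {set i 0} {$i < 3} {incr i} {
    lappend workers [thread::create]
}

# Method 1: independent tasks. Send them all off asynchronously, then
# collect each result once it has arrived.
set square {x {expr {$x * $x}}}
set i 0
foreach tid $workers {
    thread::send -async $tid [list apply $square $i] result($i)
    incr i
}
for {set i 0} {$i < 3} {incr i} {
    # A result may already have arrived while waiting for an earlier one.
    if {![info exists result($i)]} { vwait result($i) }
    puts "task $i -> $result($i)"
}

# Method 2: an iteration. Each step needs the previous result, so the
# requests are synchronous and their order is fixed.
set step {x {expr {($x + 2.0 / $x) / 2.0}}}
set tid [lindex $workers 0]
set x 1.0
for {set k 0} {$k < 5} {incr k} {
    set x [thread::send $tid [list apply $step $x]]
}
puts "iterated value: $x"

foreach tid $workers { thread::release $tid }
```

In the first loop the client does not care which worker finishes first. In
the second, extra workers would only help if each step could itself be split
into independent parts.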

Regards,

Arjen