Thread: [tclwebtest] Bug in matching syntax of "link follow" and relative url construction
Status: Abandoned
Brought to you by: tils
From: Grzegorz A. H. <gr...@ef...> - 2003-01-29 17:20:45
Hi.

Looks like the absence of a selftest for "link follow ~c" has let this bug survive, the search is done in the full html instead of just the content.

Besides, look at how the double slash confuses tclwebtest, it is a shorthand to avoid typing "http:", but tclwebtest constructs an incorrect relative url.

[gradha@ws5:0] [~/tclwebtest]$ ./tclwebtest cus.txt
----- START: cus.txt at [29/ene/2003:18:14:07] -----
--- do_request for http://slashdot.org/topics.shtml
http status: >>200<<
--- do_request for http://slashdot.org/
http status: >>200<<
<A HREF="//slashdot.org/search.pl?topic=126"><IMG
SRC="//images.slashdot.org/topics/topictech2.gif"
WIDTH="60" HEIGHT="80"
BORDER="0" ALT="Technology"></A>
--- do_request for http://slashdot.org//slashdot.org/search.pl?topic=126
http status: >>404<<
do_request did not return a page. HTTP status is 404
in "cus.txt" line 8:
link find ~c "topics"
log [link get_full]
link follow ~c "topics"
----- FAILED: cus.txt (took 40s) -----
DURATION: 40
1 of 1 tests FAILED: cus.txt

--
Grzegorz Adam Hankiewicz, gr...@ef.... Tel: +34-94-472 35 89.
eFaber SL, Maria Diaz de Haro, 68, 2 http://www.efaber.net/
48920 Portugalete, Bizkaia (SPAIN)
From: Tilmann S. <ti...@ti...> - 2003-01-29 17:27:01
Hi Grzegorz,

Thanks for the many patches and bug reports. There's a backlog of mails from you in my inbox; I just wanted to tell you that I'm quite busy right now with paid stuff, so I'll not be able to apply them before next week, hope it can wait that long.

Keep 'em comin' ;)

cheers,
Til
--
http://tsinger.com
From: Grzegorz A. H. <gr...@ef...> - 2003-01-29 17:43:27
On Wed, Jan 29, 2003 at 05:20:13PM +0000, Tilmann Singer wrote:
> [...] I'll not be able to apply them before next week, hope it
> can wait that long.

Don't worry, I use a local CVS repository now, so it will just be a matter of syncing the copies when you apply them.
From: Grzegorz A. H. <gr...@ef...> - 2003-01-31 11:38:54
On Wed, Jan 29, 2003 at 06:17:40PM +0100, Grzegorz Adam Hankiewicz wrote:
> Besides, look at how the double slash confuses tclwebtest, it is
> a shorthand to avoid typing "http:", but tclwebtest constructs an
> incorrect relative url.

This patch fixes the incorrect relative url construction with web pages like http://slashdot.org/.

Index: tclwebtest.tcl
===================================================================
RCS file: /home/maincvs/efintranet/www/tclwebtest/prog/tclwebtest.tcl,v
retrieving revision 1.15
diff -u -r1.15 tclwebtest.tcl
--- tclwebtest.tcl 31 Jan 2003 09:11:27 -0000 1.15
+++ tclwebtest.tcl 31 Jan 2003 11:28:11 -0000
@@ -2042,6 +2042,10 @@
             same url again not supported
             return $::tclwebtest::url
+        } elseif { [string range $url 0 1] == "//" } {
+            # append protocol
+            regexp {(https?:).*} $::tclwebtest::url match protocol
+            return "$protocol$url"
         } elseif { [string range $url 0 0] == "/" } {
             # append host
             regexp {(https?://[^/]+)} $::tclwebtest::url match host_part

Please ignore the line "same url again not supported" in the context above: I put it there temporarily in my copy to avoid crashing the server with infinite redirections. It seems AOLserver doesn't give Tcl much memory/stack space, because when I run the same test from the command line I can watch the self-redirection loop go on endlessly for minutes, until I get bored, of course.
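The patch works because of branch order: a reference such as "//slashdot.org/search.pl?topic=126" also begins with "/", so the "//" test has to run before the host-relative one. Below is a minimal, standalone Tcl sketch of that resolution order. `resolve_url` is a hypothetical proc written only for illustration, not tclwebtest's actual code, and it assumes an http(s) base URL and deliberately ignores ".." segments and the other corner cases covered by RFC 3986.

```tcl
# Hypothetical helper (not part of tclwebtest): turn an href found on the
# page at $base into an absolute URL. Assumes $base is an http(s) URL.
proc resolve_url {base url} {
    # Split the base into scheme ("http:"), authority ("//host") and path,
    # stopping the path before any query string or fragment.
    if {![regexp -nocase {^(https?:)(//[^/?#]+)([^?#]*)} $base -> scheme authority path]} {
        error "unsupported base URL: $base"
    }
    if {[regexp -nocase {^https?://} $url]} {
        # Already absolute: use it unchanged.
        return $url
    }
    if {[string range $url 0 1] eq "//"} {
        # Protocol-relative ("//host/path"): borrow only the scheme.
        # This test must come before the "/" test below.
        return "$scheme$url"
    }
    if {[string index $url 0] eq "/"} {
        # Host-relative ("/path"): borrow scheme and host.
        return "$scheme$authority$url"
    }
    # Path-relative ("page.html"): replace the last segment of the base path.
    set dir [string range $path 0 [string last "/" $path]]
    if {$dir eq ""} {
        set dir "/"
    }
    return "$scheme$authority$dir$url"
}

puts [resolve_url http://slashdot.org/ //slashdot.org/search.pl?topic=126]
;# prints http://slashdot.org/search.pl?topic=126
```

Without the "//" branch, the same call falls through to the host-relative branch and yields http://slashdot.org//slashdot.org/search.pl?topic=126, which is the URL that returned 404 in the original report.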
From: Grzegorz A. H. <gr...@ef...> - 2003-02-10 09:19:12
On Fri, Jan 31, 2003 at 12:35:56PM +0100, Grzegorz Adam Hankiewicz wrote:
> This patch fixes the incorrect relative url construction with web
> pages like http://slashdot.org/.

Committed.
From: Grzegorz A. H. <gr...@ef...> - 2003-02-12 18:26:38
On Wed, Jan 29, 2003 at 06:17:40PM +0100, Grzegorz Adam Hankiewicz wrote:
> Hi.
>
> Looks like the absence of a selftest for "link follow ~c" has let
> this bug survive, the search is done in the full html instead of
> just the content.
> [...]
> --- do_request for http://slashdot.org/
> http status: >>200<<
> <A HREF="//slashdot.org/search.pl?topic=126"><IMG
> SRC="//images.slashdot.org/topics/topictech2.gif"
> WIDTH="60" HEIGHT="80"
> BORDER="0" ALT="Technology"></A>
> --- do_request for http://slashdot.org//slashdot.org/search.pl?topic=126
> [...]
> link find ~c "topics"
> log [link get_full]
> [...]

Oh, my analysis was wrong. tclwebtest correctly searches the content of all available links. The problem was that when I wrote that script to search for the "topics" link, the hyperlink it found only contained the word "topic" (in its URL), so I thought the search was being done on the raw html of the hyperlink. If you try with your text editor to search the word "topics" in that log, you will find out that the search was done correctly, and it was the html img code containing the word topics which triggered the false hit. Since I expect "link find" and related to search in plain text, the solution is quite simple: stripping the html when retrieving the links.

Index: lib/tclwebtest.tcl
===================================================================
RCS file: /cvsroot/tclwebtest/tclwebtest/lib/tclwebtest.tcl,v
retrieving revision 1.20
diff -u -r1.20 tclwebtest.tcl
--- lib/tclwebtest.tcl 12 Feb 2003 16:59:57 -0000 1.20
+++ lib/tclwebtest.tcl 12 Feb 2003 18:14:16 -0000
@@ -2146,7 +2146,7 @@
         # this is way too simple
         regexp -nocase {>(.*)<} $a_link(full) match a_link(content)
-        set a_link(content) [normalize_html $a_link(content)]
+        set a_link(content) [util_remove_html_tags [normalize_html $a_link(content)]]
         lappend ::tclwebtest::links [array get a_link]

This patch makes my script work as I expected, but maybe it is not exactly the behaviour tclwebtest should have. What do you think? After all, anyone searching for exact html code can use 'link find ~f xxx'.
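To make the difference between the two searches concrete, here is a minimal Tcl sketch of what a link's "content" looks like with and without tag stripping. `link_content` and `strip_tags` are hypothetical stand-ins written for illustration (the real `normalize_html` and `util_remove_html_tags` may behave differently); only the crude `>(.*)<` extraction is taken from the patch, and plain `string match` stands in for however "~c" and "~f" actually match.

```tcl
# Hypothetical stand-in for util_remove_html_tags: drop every "<...>" sequence.
proc strip_tags {html} {
    regsub -all {<[^>]*>} $html "" text
    return $text
}

# Hypothetical stand-in for the patched extraction: take what lies between
# the first ">" and the last "<" of the <a> element (the same crude regexp
# as in the patch), strip nested tags, then collapse and trim whitespace.
proc link_content {full} {
    if {![regexp -nocase {>(.*)<} $full -> content]} {
        return ""
    }
    set text [strip_tags $content]
    return [string trim [regsub -all {\s+} $text " "]]
}

# The slashdot link from the log: an image with "topics" only in its SRC.
set img_link {<A HREF="//slashdot.org/search.pl?topic=126"><IMG SRC="//images.slashdot.org/topics/topictech2.gif" WIDTH="60" HEIGHT="80" BORDER="0" ALT="Technology"></A>}
# A made-up text link whose visible text really says "Topics".
set text_link {<a href="/topics.shtml">Slashdot <b>Topics</b></a>}

foreach l [list $img_link $text_link] {
    # Full-html match (like ~f) versus visible-text match (like ~c after the patch).
    puts "full: [string match -nocase *topics* $l]  content: [string match -nocase *topics* [link_content $l]]"
}
# prints "full: 1  content: 0" then "full: 1  content: 1"
```

With tags stripped, the image-only link has empty content, so a content search for "topics" no longer hits it, while the raw html of both links still matches a full-html search.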
From: Tilmann S. <ti...@ti...> - 2003-02-15 10:45:58
* Grzegorz Adam Hankiewicz <gr...@ef...> [20030212 18:27]:
> If you try with your text editor to search the word "topics" in
> that log, you will find out that the search was done correctly,
> and it was the html img code containing the word topics which
> triggered the false hit. Since I expect "link find" and related
> to search in plain text, the solution is quite simple: stripping
> the html when retrieving the links.

I also think it should search in the text only, stripped from
html. Please commit that fix as well, thanks!

til
From: Grzegorz A. H. <gr...@ef...> - 2003-02-15 10:53:42
> > Since I expect "link find" and related to search in plain text,
> > the solution is quite simple: stripping the html when retrieving
> > the links.
>
> I also think it should search in the text only, stripped from
> html. Please commit that fix as well, thanks!

Done.