Thread: [Htmlparser-developer] CompositeTagScanner - Some comments

[Htmlparser-developer] CompositeTagScanner - Some comments

From: <dha...@or...> - 2003-05-09 04:34:51

Hi,

A lot of thought has definitely gone into the design of the 
CompositeTagScanner. Some absolutely wonderful work has been done here. Somik 
had asked me to have a look at the code and review it. I just have one point 
for discussion.

The CompositeTagScanner has a provision to allow for nested children. However, 
I feel there are very few HTML tags which have children of the same type. By 
default the scanner allows nesting. I believe nesting should be disallowed by 
default.

my $0.02 ;)

dhaval

Re: [Htmlparser-developer] CompositeTagScanner - Some comments

From: Derrick O. <Der...@ro...> - 2003-05-09 22:53:20

Changing the two 'true' default constructor values to 'false' only 
breaks one test case, testDoubleTitleTag.

The node count changes from 7 to 10.
Correct me if I'm wrong, but with only the title scanner registered,

<html><head><TITLE>
<html><head><TITLE>
Double tags can hang the code
</TITLE></head><body>
<body><html>

should yield 8 tags:

<html>
<head>
<TITLE> containing <html><head> and a generated </TITLE>
<TITLE> containing "Double tags can hang the code"</TITLE>
</head>
<body>
<body>
<html>

which isn't either of those answers, the original or the new one.
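
To make that expectation concrete, here is a minimal, self-contained sketch 
of the rule I have in mind. It is a toy model only and does not use the real 
CompositeTagScanner API: the class NestingSketch, its Node type, the 
allowSelfNesting flag and the pre-split token list are all hypothetical names 
made up for illustration. With nesting disallowed, a second <TITLE> closes the 
open one with a generated end tag and is then scanned on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (NOT the htmlparser API): a single composite "TITLE" scanner.
public class NestingSketch {
    /** A node is a plain tag/string, or a composite holding children. */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
    }

    private final List<String> tokens;
    private final boolean allowSelfNesting; // hypothetical flag name
    private int pos = 0;

    private NestingSketch(List<String> tokens, boolean allowSelfNesting) {
        this.tokens = tokens;
        this.allowSelfNesting = allowSelfNesting;
    }

    /** Returns the top-level nodes for an already tokenized input. */
    static List<Node> parse(List<String> tokens, boolean allowSelfNesting) {
        NestingSketch p = new NestingSketch(tokens, allowSelfNesting);
        List<Node> top = new ArrayList<>();
        while (p.pos < tokens.size()) {
            top.add(p.next());
        }
        return top;
    }

    private Node next() {
        String tok = tokens.get(pos++);
        if (!tok.equalsIgnoreCase("<TITLE>")) {
            return new Node(tok); // no scanner registered: returned as-is
        }
        Node title = new Node(tok);
        while (pos < tokens.size()) {
            String peek = tokens.get(pos);
            if (peek.equalsIgnoreCase("</TITLE>")) {
                pos++; // consume the real end tag
                title.children.add(new Node(peek));
                return title;
            }
            if (peek.equalsIgnoreCase("<TITLE>") && !allowSelfNesting) {
                break; // leave the second <TITLE> for the caller
            }
            title.children.add(next());
        }
        // Interrupted, or input ran out: close with a generated end tag.
        title.children.add(new Node("</TITLE> (generated)"));
        return title;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "<html>", "<head>", "<TITLE>",
            "<html>", "<head>", "<TITLE>",
            "Double tags can hang the code",
            "</TITLE>", "</head>", "<body>",
            "<body>", "<html>");
        System.out.println(parse(input, false).size()); // prints 8
    }
}
```

With the flag set to false the sketch returns the 8 top-level nodes listed 
above. It is not meant to reproduce the 7 or 10 node counts from the real 
test, since those depend on scanner internals the sketch does not model.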

In the test case, the first TITLE tag is correct, but the second 
contains the string and /TITLE without consuming them, so they are 
also returned separately:

<html>
<head>
<TITLE> containing <html><head> and a generated </TITLE>
<TITLE> containing "Double tags can hang the code"</TITLE>
"Double tags can hang the code"
</TITLE>
</head>
<body>
<body>
<html>

Does it still need work?

Derrick

Re: [Htmlparser-developer] CompositeTagScanner - Some comments

From: Somik R. <so...@ya...> - 2003-05-10 15:52:55

Derrick Oswald wrote:

> Changing the two 'true' default constructor values to 'false' only
> breaks one test case, testDoubleTitleTag.

Hmm, I can see a couple of failing tests, so I didn't have the confidence
to play around with the code. In any case, your expectation is correct. It
would be good if you could produce a test case in the
CompositeTagScannerTest class, using the CustomTag/AnotherTag classes.

I've been swamped all week (and out of town on an assignment). This will
continue for 3 months, so my contributions will be sporadic.

Regards,
Somik