[Htmlparser-developer] Integration release 1.3-20030202 is out

Hi Folks,
    Integration release 1.3-20030202 is out.

From the change log :

Integration build 1.3 - 20030202
--------------------------------
[1] Renamed HTMLCompositeTagScanner to CompositeTagScanner
[2] Renamed HTMLTag.getParameter() to HTMLTag.getAttribute()
[3] Added TableScanner
[4] Added HtmlPage
[5] Added SpanScanner
[6] Added assertType in HTMLParserTestCase
[7] Added TextExtractingVisitor
[8] Added non-recursive visiting (flag in HTMLVisitor)
[9] Added DivScanner
[10] Modified collectInto to use NodeList
[11] Added collectInto(NodeList, Class)
[12] CompositeTagScanner can handle single xml-like tags e.g. <div/>
[13] Fixed bug 678969 - StringParser was not going into ignore mode on
encountering double quotes
[14] Added LabelScanner

Dhaval Udani has contributed LabelScanner. (He has also contributed a
BodyScanner, which will make it into next week's release.)

We've shipped this time with two tests failing - both tests replicate the
same bug - 677874 - "mishandling of double quotes". I made this release for
three reasons:
[1] This bug is not a new addition but was always there - it's a deep bug in
AttributeParser (previously known as ParameterParser) - and it might take a
little time to fix
[2] There are a lot of new additions which we'd like to get out there - we
finally have a table scanner!
[3] Important bug fixes have been made which further stabilize the parser's
performance (and at least one user was desperately waiting for the fix)

Notable addition - HTMLNode.collectInto() has a new mode of operation -
using the class type.
Suppose you need to get to nodes of one type (e.g. images) that sit within a
composite (like a table). You can do:
NodeList imageList = new NodeList();
tableTag.collectInto(imageList, HTMLImageTag.class);

You can also do this directly from the parser, like so:
HTMLNode[] nodes = parser.extractAllNodesThatAre(HTMLLinkTag.class);
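
For anyone curious how collecting by class type can work, here is a minimal,
self-contained sketch of the idea. It does not use the parser's own classes:
SketchNode, SketchCompositeTag, SketchImageTag, SketchLinkTag and
CollectIntoSketch are all hypothetical stand-ins, and the real collectInto()
may well be implemented differently. The assumption is simply that a
composite checks itself against the requested class and then passes the
request down to its children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parser node; not the library's real class.
abstract class SketchNode {
    // Adds this node to 'out' if it is an instance of 'type'.
    void collectInto(List<SketchNode> out, Class<? extends SketchNode> type) {
        if (type.isInstance(this)) {
            out.add(this);
        }
    }
}

// Hypothetical leaf node standing in for an image tag.
class SketchImageTag extends SketchNode {
    final String src;
    SketchImageTag(String src) { this.src = src; }
}

// Hypothetical leaf node standing in for a link tag.
class SketchLinkTag extends SketchNode {
    final String href;
    SketchLinkTag(String href) { this.href = href; }
}

// Hypothetical composite (table, row, cell, ...) that owns child nodes.
class SketchCompositeTag extends SketchNode {
    private final List<SketchNode> children = new ArrayList<>();

    SketchCompositeTag add(SketchNode child) {
        children.add(child);
        return this;
    }

    @Override
    void collectInto(List<SketchNode> out, Class<? extends SketchNode> type) {
        super.collectInto(out, type);          // the composite itself may match
        for (SketchNode child : children) {    // then recurse into the children
            child.collectInto(out, type);
        }
    }
}

class CollectIntoSketch {
    // Builds table -> row -> two cells, holding two images and one link.
    static SketchCompositeTag sampleTable() {
        SketchCompositeTag cell1 = new SketchCompositeTag()
                .add(new SketchImageTag("a.gif"))
                .add(new SketchLinkTag("x.html"));
        SketchCompositeTag cell2 = new SketchCompositeTag()
                .add(new SketchImageTag("b.gif"));
        SketchCompositeTag row = new SketchCompositeTag().add(cell1).add(cell2);
        return new SketchCompositeTag().add(row);
    }

    public static void main(String[] args) {
        List<SketchNode> images = new ArrayList<>();
        sampleTable().collectInto(images, SketchImageTag.class);
        for (SketchNode n : images) {
            System.out.println(((SketchImageTag) n).src);
        }
    }
}
```

In this sketch the caller supplies the list and only names the class it is
after, so it never has to walk rows and cells by hand. That is the same
shape as the tableTag.collectInto(...) call above.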

And here's some more news - we now have our own wiki (finally!). Go to
http://htmlparser.sourceforge.net/docs/
This is a free-for-all wiki. It is a little too much for me to write the
entire documentation on my own - so I'd highly appreciate it if the
user/developer community pitched in - that would be a great benefit for
everyone. The current documentation on the site is already obsolete, and I
am going to take it down soon (hopefully by the next release).

Regards,
Somik