[Htmlparser-developer] Integration release 1.3-20030202 is out
From: Somik R. <so...@ya...> - 2003-02-03 07:21:35
Hi Folks,

Integration release 1.3-20030202 is out. From the change log:

Integration build 1.3 - 20030202
--------------------------------
[1] Renamed HTMLCompositeTagScanner to CompositeTagScanner
[2] Renamed HTMLTag.getParameter() to HTMLTag.getAttribute()
[3] Added TableScanner
[4] Added HtmlPage
[5] Added SpanScanner
[6] Added assertType in HTMLParserTestCase
[7] Added TextExtractingVisitor
[8] Added non-recursive visiting (flag in HTMLVisitor)
[9] Added DivScanner
[10] Modified collectInto to use NodeList
[11] Added collectInto(NodeList, Class)
[12] CompositeTagScanner can handle single xml-like tags e.g. <div/>
[13] Fixed bug 678969 - StringParser was not going into ignore mode on encountering double quotes
[14] Added LabelScanner

Dhaval Udani has contributed LabelScanner. (He has also contributed a BodyScanner, which will make it into next week's release.)

We've shipped this time with two tests failing - both tests replicate the same bug - 677874 - "mishandling of double quotes". I made this release for three reasons:

[1] This bug is not a new addition but was always there - it's a deep bug in AttributeParser (previously known as ParameterParser) - and it might take a little time to fix
[2] There are a lot of new additions which we'd like to get out there - we finally have a table scanner!
[3] Important bug fixes have been made which further stabilize the parser's performance (and at least one user was desperately waiting for the fix)

Notable addition - HTMLNode.collectInto() has a new mode of operation - using the class type. Suppose you need to get to nodes of one kind (e.g. images) that sit within a composite (like a table); you can do:

    NodeList imageList = new ImageList();
    tableTag.collectInto(imageList,HTMLImageTag.class);

You can also do this directly from the parser - like so:

    HTMLNode node [] = parser.extractAllNodesThatAre(HTMLLinkTag.class);

And here's some more news - we now have our own wiki (finally!).
Go to http://htmlparser.sourceforge.net/docs/

This is a free-for-all wiki. It is a little too much for me to write the entire documentation on my own - so I'd highly appreciate it if the user/developer community pitched in - that would be a great benefit for the community. The current documentation on the site is already obsolete, and I am going to take it down soon (hopefully by the next release).

Regards,
Somik
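
For readers who want to see the idea behind the class-based collectInto() mentioned in the announcement, here is a minimal, self-contained sketch of collecting nodes of one type out of a composite. It does not use the htmlparser library: Node, TableTag, RowTag, ImageTag, LinkTag and CollectIntoSketch are hypothetical stand-ins written for illustration, and a plain java.util.List stands in for NodeList. Whether the real collectInto() also tests the node it is called on, and in what order it visits children, are assumptions here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parser node; NOT the htmlparser HTMLNode class.
class Node {
    private final List<Node> children = new ArrayList<>();

    // Adds a child and returns this node so calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }

    // Appends this node (if it matches) and every matching descendant to `out`,
    // depth-first. Including the node itself is an assumption of this sketch.
    void collectInto(List<Node> out, Class<? extends Node> type) {
        if (type.isInstance(this)) {
            out.add(this);
        }
        for (Node child : children) {
            child.collectInto(out, type);
        }
    }
}

// Hypothetical tag types used only to build a small tree.
class TableTag extends Node {}
class RowTag extends Node {}
class LinkTag extends Node {}
class ImageTag extends Node {
    final String src;
    ImageTag(String src) { this.src = src; }
}

public class CollectIntoSketch {
    // Convenience wrapper: all ImageTag nodes at or below `root`.
    static List<Node> imagesIn(Node root) {
        List<Node> images = new ArrayList<>();
        root.collectInto(images, ImageTag.class);
        return images;
    }

    public static void main(String[] args) {
        Node table = new TableTag()
            .add(new RowTag().add(new ImageTag("a.gif")).add(new LinkTag()))
            .add(new RowTag().add(new ImageTag("b.gif")));
        for (Node n : imagesIn(table)) {
            System.out.println(((ImageTag) n).src);
        }
    }
}
```

Passing the Class object lets the caller pick the node type at the call site without writing a separate visitor for each kind of tag, which appears to be the convenience the new collectInto(NodeList, Class) overload is after.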