[Htmlparser-announce] Integration Release 1.3-20030420 is out
From: Somik R. <so...@ya...> - 2003-04-20 03:04:37
Hi Folks,

This week's release is out. From the change log:

Integration Build 1.3 - 20030420
--------------------------------
[1] Fixed bug #722046: StringExtractor.extractStrings misses most of the text. Changed to use a StringBean to dig into tables.
[2] Added checking in Translate to eliminate bug #722835 (StringIndexOutOfBoundsException).
[3] Added line-break condition in assertXmlEquals.
[4] Added fit testing framework.
[5] Added parent association for each node.
[6] Added digupStringNode() and findPositionOf(Node) to CompositeTag.
[7] Fixed bug #723835 in LinkExtractor.

This release adds some powerful search capabilities. From any node, you can find the parent composite tag and navigate through the entire HTML structure. This is useful in scenarios such as searching for data that lies close to a certain piece of text, e.g.:

    ...
    <table>
      <tr>
        <td>
          <b>Name:</b><i>John Doe</i>
        </td>
      </tr>
    </table>

We can extract "John Doe" by using our knowledge of its expected position. If we assume that the contents are inside a table tag, here's what a program could look like:

    parser.registerScanners();
    Node[] nodes = parser.extractAllNodesThatAre(TableTag.class);

    // Let's assume our data is in the second table.
    TableTag table = (TableTag) nodes[1];

    // Find the string nodes that contain "Name".
    StringNode[] stringNodes = table.digupStringNode("Name");

    // We assume that the first node that matched is the one we want,
    // and navigate to its parent. findPositionOf() lives on CompositeTag,
    // so the parent is held as a CompositeTag rather than a plain Node.
    CompositeTag parentOfName = (CompositeTag) stringNodes[0].getParent();

    // From the parent, find the position of "Name".
    int posOfName = parentOfName.findPositionOf(stringNodes[0]);

    // It's easy now to navigate to John Doe, as we know it is 3 positions away.
    Node expectedName = parentOfName.childAt(posOfName + 3);

This can be useful for writing tests for your pages or for extracting position-based information. It also opens up new possibilities for semantic searches.

Regards,
Somik
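
P.S. For readers who want to try the idea without the library at hand, the positional lookup above can be sketched with a toy tree. Everything here is a stand-in: ToyNode, ToyComposite and PositionalLookup are hypothetical names, and the two methods only borrow the names digupStringNode and findPositionOf from the change log. They are re-implemented from scratch and are not the library's code. The sketch also assumes that start tags, text and end tags sit side by side as children of the table cell, which is what makes the "3 positions away" arithmetic work.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a parsed page. None of these types come from htmlparser.
public class PositionalLookup {

    /** A node that remembers its parent (the "parent association" idea). */
    static class ToyNode {
        final String content;
        final boolean isText;
        ToyComposite parent;

        ToyNode(String content, boolean isText) {
            this.content = content;
            this.isText = isText;
        }
    }

    /** A node with an ordered list of children. */
    static class ToyComposite extends ToyNode {
        final List<ToyNode> children = new ArrayList<>();

        ToyComposite(String name) {
            super(name, false);
        }

        ToyComposite add(ToyNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** Index of the given child (compared by identity), or -1 if absent. */
        int findPositionOf(ToyNode child) {
            for (int i = 0; i < children.size(); i++) {
                if (children.get(i) == child) {
                    return i;
                }
            }
            return -1;
        }

        /** Depth-first search for text nodes whose content contains the needle. */
        List<ToyNode> digupStringNode(String needle) {
            List<ToyNode> found = new ArrayList<>();
            for (ToyNode child : children) {
                if (child instanceof ToyComposite) {
                    found.addAll(((ToyComposite) child).digupStringNode(needle));
                } else if (child.isText && child.content.contains(needle)) {
                    found.add(child);
                }
            }
            return found;
        }
    }

    /** Builds the sample table and returns the text found 3 positions after "Name:". */
    public static String demo() {
        ToyComposite td = new ToyComposite("td")
                .add(new ToyNode("<b>", false))
                .add(new ToyNode("Name:", true))
                .add(new ToyNode("</b>", false))
                .add(new ToyNode("<i>", false))
                .add(new ToyNode("John Doe", true))
                .add(new ToyNode("</i>", false));
        ToyComposite table = new ToyComposite("table")
                .add(new ToyComposite("tr").add(td));

        ToyNode name = table.digupStringNode("Name").get(0);
        ToyComposite parentOfName = name.parent;
        int posOfName = parentOfName.findPositionOf(name);
        return parentOfName.children.get(posOfName + 3).content;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "John Doe"
    }
}
```

The offset is only as stable as the markup around it: if the page gains or loses a tag between the label and the value, the 3 has to change, so a position-based test should fail loudly rather than return whatever happens to sit at that index.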