From: Johannes L. <Joh...@un...> - 2010-05-28 02:15:06
Hi,

I've got two HTML documents (serialized as XML) whose structures differ slightly at the end because of a stylesheet template rule based on attributes (and attribute order isn't defined in XML documents, I think):

    ...
    <h4>CATEGORY I</h4>
    <table>
      <tr> <td>DESC</td> <td>Paperback</td> </tr>
      <tr> <td>CODE</td> <td>P</td> </tr>
    </table>
    <hr/>
    <h4>CATEGORY II</h4>
    <table>
      <tr> <td>DESC</td> <td>Mass-market Paperback</td> </tr>
      <tr> <td>CODE</td> <td>MMP</td> </tr>
    </table>

vs.

    ...
    <h4>CATEGORY I</h4>
    <table>
      <tr> <td>CODE</td> <td>P</td> </tr>
      <tr> <td>DESC</td> <td>Paperback</td> </tr>
    </table>
    <hr/>
    <h4>CATEGORY II</h4>
    <table>
      <tr> <td>CODE</td> <td>MMP</td> </tr>
      <tr> <td>DESC</td> <td>Mass-market Paperback</td> </tr>
    </table>

You can see that only the tr elements are swapped, but assertXMLEqual fails, even if I change the table structure to unordered lists. So I assume it's not doable (maybe the algorithm only treats the documents as similar when the two reordered nodes are directly adjacent).

The strange thing is that it doesn't find them similar even for the following fragment (CODE and DESC swapped in the other document):

    <h4>CATEGORY I</h4>
    <ul>
      <li>CODE</li>
      <li>DESC</li>
    </ul>

vs.

    <h4>CATEGORY I</h4>
    <ul>
      <li>DESC</li>
      <li>CODE</li>
    </ul>

I've simply used assertXMLEqual; maybe I'll try it with Diff and diff.similar(), but according to the documentation it should give the same result.

regards,
Johannes
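
For reference, here is a minimal sketch of the kind of order-insensitive comparison the question is after. It does not use XMLUnit at all: it relies only on the JDK's DOM parser, and the class and method names (UnorderedXmlCompare, canonical, sameIgnoringSiblingOrder) are hypothetical, chosen just for this illustration. The idea is to build a canonical string for each element in which the children are sorted, so that swapped siblings produce the same result. Attributes, comments and CDATA sections are ignored here, which is an assumption that fits the fragments above but not XML in general. In XMLUnit 1.x the analogous hook is, as far as I know, an ElementQualifier such as ElementNameAndTextQualifier passed to Diff.overrideElementQualifier, so that same-named siblings are paired by their text instead of by position.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit: compares two XML strings
// while ignoring the order of sibling nodes.
class UnorderedXmlCompare {

    // Canonical form of a node: the element name wrapped around the
    // sorted canonical forms of its children. Text is trimmed, and
    // whitespace-only text, comments, CDATA and attributes are ignored.
    static String canonical(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().trim();
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return "";
        }
        List<String> parts = new ArrayList<>();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String child = canonical(children.item(i));
            if (!child.isEmpty()) {
                parts.add(child);
            }
        }
        Collections.sort(parts); // sibling order no longer matters
        String name = node.getNodeName();
        return "<" + name + ">" + String.join("", parts) + "</" + name + ">";
    }

    // Parse an XML string into a DOM document; parse errors are
    // rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    static boolean sameIgnoringSiblingOrder(String a, String b) {
        return canonical(parse(a).getDocumentElement())
                .equals(canonical(parse(b).getDocumentElement()));
    }

    public static void main(String[] args) {
        String first = "<ul><li>CODE</li><li>DESC</li></ul>";
        String second = "<ul><li>DESC</li><li>CODE</li></ul>";
        System.out.println(sameIgnoringSiblingOrder(first, second));
    }
}
```

Note that this sketch sorts children at every level, so it would also treat the two td cells inside a row as interchangeable. A stricter version would sort only the children of selected elements (for example table and ul) and keep document order everywhere else.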