From: Mayrgundter, P. <pma...@do...> - 2004-07-26 16:08:17
In org/w3c/tidy/PPrint.java, rename the existing printAttrs method to
"printAttrsReal", and change its body to refer to this new name in the
recursive call:

    if (attr.next != null) {
        // Was printAttrs(...)
        printAttrsReal(fout, indent, node, attr.next);
    }

And then add this method:

    private void printAttrs(Out fout, int indent, Node node, AttVal attr) {
        if (this.configuration.dropDuplicateAttributes) {
            // Key each attribute by its lower-cased name; a later
            // duplicate overwrites an earlier one.
            final java.util.Map attrMap = new java.util.HashMap();
            while (attr != null) {
                attrMap.put(attr.attribute.toLowerCase(), attr);
                attr = attr.next;
            }
            // Rebuild the linked list from the surviving values.
            // HashMap iteration order is unspecified, so the attribute
            // order in the output may change.
            final java.util.Iterator attrItr = attrMap.values().iterator();
            AttVal last = null;
            while (attrItr.hasNext()) {
                attr = (AttVal) attrItr.next();
                attr.next = null;
                if (last != null) attr.next = last;
                last = attr;
            }
        }
        printAttrsReal(fout, indent, node, attr);
    }

This also requires adding the "dropDuplicateAttributes" field to
Configuration.

The idea is that if you want duplicate attributes dropped, you simply put
each attribute in a hashtable, and then build a new linked list from its
values afterwards. This ensures that each attribute occurs only once.

In general, it would be a bit of a speed improvement to store the
attributes in a hash, as the current scheme seems to be O(n^2): in Node
and AttVal, there's a check which does a linear search of all attrs for a
match, for each attr.

This change works for a number of test files I'm working on. Furthermore,
this was the only problem I had with converting wild HTML to XHTML using
JTidy. I'm using the latest Xerces SAXParser to check the output for
correctness.

My configuration settings to get this working are:

    tidy.setUpperCaseTags(true);
    tidy.setDocType("omit");
    tidy.setXHTML(true);
    tidy.setNumEntities(true);
    tidy.setFixComments(true);
    tidy.setShowWarnings(true);

Not sure which of these were strictly needed for conformance.

Thanks for starting the project up again. Hope this helps.

Cheers,
Pablo Mayrgundter
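
The hashtable idea described above can be sketched on its own, outside
JTidy. This is only an illustration under stated assumptions: the Attr
class is a hypothetical stand-in for AttVal, and DedupSketch,
dropDuplicates and render are made-up names, not part of JTidy. It also
swaps in a LinkedHashMap (rather than the HashMap in the patch) so that
the surviving attributes keep the position of their first occurrence.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class DedupSketch {

    /** Hypothetical stand-in for JTidy's AttVal: a singly linked attribute node. */
    static final class Attr {
        final String name;
        final String value;
        Attr next;

        Attr(String name, String value, Attr next) {
            this.name = name;
            this.value = value;
            this.next = next;
        }
    }

    /**
     * Returns the head of a list in which each attribute name occurs once
     * (compared case-insensitively). The last duplicate wins, but it takes
     * the position of the first occurrence. Relinks the nodes in place.
     */
    static Attr dropDuplicates(Attr head) {
        // LinkedHashMap keeps first-insertion order; put() on an existing
        // key replaces the value without moving the entry.
        Map<String, Attr> byName = new LinkedHashMap<>();
        for (Attr a = head; a != null; a = a.next) {
            byName.put(a.name.toLowerCase(Locale.ROOT), a);
        }

        // Rebuild the linked list from the surviving nodes, in map order.
        Attr newHead = null;
        Attr tail = null;
        for (Attr a : byName.values()) {
            a.next = null;
            if (tail == null) {
                newHead = a;
            } else {
                tail.next = a;
            }
            tail = a;
        }
        return newHead;
    }

    /** Formats the list as name="value" pairs separated by spaces. */
    static String render(Attr head) {
        StringBuilder sb = new StringBuilder();
        for (Attr a = head; a != null; a = a.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(a.name).append("=\"").append(a.value).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Attr list = new Attr("class", "a",
                new Attr("id", "x",
                        new Attr("CLASS", "b", null)));
        // Prints: CLASS="b" id="x"
        System.out.println(render(dropDuplicates(list)));
    }
}
```

Building the map is a single pass over the list, so deduplication costs
roughly O(n) instead of a linear search per attribute. Whether the first
or the last duplicate should win is a policy choice; this sketch keeps
the last one, as the patch does.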