[Jtidy-devel] Fix for the duplicate attribute bug

In org/w3c/tidy/PPrint.java

Rename the existing printAttrs method to "printAttrsReal", and
change its body to refer to this new name in the recursive call:

  if (attr.next != null)
  {
      // Was printAttrs(...)
      printAttrsReal(fout, indent, node, attr.next);
  }

And then add this method:

private void printAttrs(Out fout, int indent, Node node, AttVal attr) {
    if (this.configuration.dropDuplicateAttributes) {
        // Key by lower-cased name; put() overwrites, so the last
        // occurrence of a duplicated attribute is the one kept.
        final java.util.Map attrMap = new java.util.HashMap();
        while (attr != null) {
            attrMap.put(attr.attribute.toLowerCase(), attr);
            attr = attr.next;
        }
        // Relink the surviving attributes into a new list.  HashMap
        // iteration order is unspecified, so attribute order may change.
        final java.util.Iterator attrItr = attrMap.values().iterator();
        AttVal last = null;
        while (attrItr.hasNext()) {
            attr = (AttVal) attrItr.next();
            attr.next = null;
            if (last != null)
                attr.next = last;
            last = attr;
        }
    }
    printAttrsReal(fout, indent, node, attr);
}

This requires also adding the "dropDuplicateAttributes" field to
Configuration.

The idea is that if you want duplicate attributes dropped, simply put each
attribute in a hashtable, and then build a new linked list from its values
afterwards.  This will ensure that each attribute occurs only once.
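
As a standalone illustration of that idea, here is a minimal sketch that
does not depend on JTidy.  The Attr class is a hypothetical stand-in for
AttVal (just a name, a value and a next pointer), and DedupSketch is an
invented name.  It uses LinkedHashMap rather than the HashMap in the patch,
so the surviving attributes keep the position of their first occurrence;
that swap is an assumption for illustration, not part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for JTidy's AttVal: a singly linked list node.
class Attr {
    final String attribute;
    final String value;
    Attr next;

    Attr(String attribute, String value, Attr next) {
        this.attribute = attribute;
        this.value = value;
        this.next = next;
    }
}

class DedupSketch {
    // Returns the head of a list in which each attribute name (compared
    // case-insensitively) occurs once.  For a duplicated name the last
    // node wins, because put() overwrites the earlier entry.
    static Attr dropDuplicates(Attr head) {
        Map<String, Attr> byName = new LinkedHashMap<String, Attr>();
        for (Attr a = head; a != null; a = a.next) {
            byName.put(a.attribute.toLowerCase(Locale.ROOT), a);
        }
        // Relink the surviving nodes in the map's iteration order.
        Attr newHead = null;
        Attr tail = null;
        for (Attr a : byName.values()) {
            a.next = null;
            if (tail == null) {
                newHead = a;
            } else {
                tail.next = a;
            }
            tail = a;
        }
        return newHead;
    }

    // Formats the list as name="value" pairs separated by spaces.
    static String render(Attr head) {
        StringBuilder sb = new StringBuilder();
        for (Attr a = head; a != null; a = a.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(a.attribute).append("=\"").append(a.value).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Attr head = new Attr("id", "a",
                new Attr("class", "x",
                        new Attr("ID", "b", null)));
        System.out.println(render(dropDuplicates(head)));
        // prints: ID="b" class="x"
    }
}
```

Note that the list is relinked in place, so the original list is no longer
intact after the call.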

In general, it would be a bit of a speed improvement to store the attributes
in a hash, as the current scheme seems to be O(n^2): in Node and AttVal,
there's a check which does a linear search of all attrs for a match, for
each attr.
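
To make the complexity point concrete, here is a small sketch contrasting
the two approaches on plain attribute names.  Nothing in it is JTidy code:
DuplicateCheckSketch and both method names are made up for illustration,
and it assumes ASCII attribute names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class DuplicateCheckSketch {
    // Quadratic: for each name, linearly scan all earlier names.
    static boolean hasDuplicateLinear(String[] names) {
        for (int i = 0; i < names.length; i++) {
            for (int j = 0; j < i; j++) {
                if (names[i].equalsIgnoreCase(names[j])) {
                    return true;
                }
            }
        }
        return false;
    }

    // Expected linear time: Set.add returns false if the (lower-cased)
    // name was already present, so one pass is enough.
    static boolean hasDuplicateHashed(String[] names) {
        Set<String> seen = new HashSet<String>();
        for (String name : names) {
            if (!seen.add(name.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

For the handful of attributes on a typical element the difference is small;
it would only matter for elements with very many attributes.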

This change works for a number of test files I'm working on.  Furthermore,
this was the only problem I had with converting wild HTML to XHTML using
JTidy. I'm using the latest Xerces SAXParser to check the output for
correctness.

My configuration settings to get this working are:

  tidy.setUpperCaseTags(true);
  tidy.setDocType("omit");
  tidy.setXHTML(true);
  tidy.setNumEntities(true);
  tidy.setFixComments(true);
  tidy.setShowWarnings(true);

Not sure which of these were strictly needed for conformance.

Thanks for starting the project up again.  Hope this helps.

Cheers,
Pablo Mayrgundter