From: Arshan D. <ars...@gm...> - 2009-01-22 20:51:23
Attachments:
test_neko.html
TestNekoProblems.java
diff -u --recursive --exclude='*.xml' --exclude=.classpath nekohtml-1.9.11-original/src/org/cyberneko/html/HTMLConfiguration.java nekohtml-1.9.11/src/org/cyberneko/html/HTMLConfiguration.java
--- nekohtml-1.9.11-original/src/org/cyberneko/html/HTMLConfiguration.java 2008-12-08 18:12:50.000000000 -0700
+++ nekohtml-1.9.11/src/org/cyberneko/html/HTMLConfiguration.java 2009-01-21 12:08:21.000000000 -0700
@@ -101,6 +101,11 @@
     /** Balance tags. */
     protected static final String BALANCE_TAGS = "http://cyberneko.org/html/features/balance-tags";
 
+
+    /** Require XML-strict attribute names. **/
+    protected static final String ENFORCE_VALID_ATTRIBUTE_NAMES =
+        "http://cyberneko.org/html/features/enforce-strict-attribute-names";
+
     // properties
 
     /** Modify HTML element names: { "upper", "lower", "default" }. */
@@ -235,6 +240,7 @@
             REPORT_ERRORS,
             SIMPLE_ERROR_FORMAT,
             BALANCE_TAGS,
+            ENFORCE_VALID_ATTRIBUTE_NAMES
         };
         addRecognizedFeatures(recognizedFeatures);
         setFeature(AUGMENTATIONS, false);
@@ -243,6 +249,7 @@
         setFeature(REPORT_ERRORS, false);
         setFeature(SIMPLE_ERROR_FORMAT, false);
         setFeature(BALANCE_TAGS, true);
+        setFeature(ENFORCE_VALID_ATTRIBUTE_NAMES, false);
 
         // HACK: Xerces 2.0.0
         if (XERCES_2_0_0) {
diff -u --recursive --exclude='*.xml' --exclude=.classpath nekohtml-1.9.11-original/src/org/cyberneko/html/parsers/DOMFragmentParser.java nekohtml-1.9.11/src/org/cyberneko/html/parsers/DOMFragmentParser.java
--- nekohtml-1.9.11-original/src/org/cyberneko/html/parsers/DOMFragmentParser.java 2008-03-06 16:02:00.000000000 -0700
+++ nekohtml-1.9.11/src/org/cyberneko/html/parsers/DOMFragmentParser.java 2009-01-21 13:54:35.000000000 -0700
@@ -77,9 +77,13 @@
     protected static final String DOCUMENT_FRAGMENT =
         "http://cyberneko.org/html/features/document-fragment";
 
+    protected static final String ENFORCE_VALID_ATTRIBUTE_NAMES =
+        "http://cyberneko.org/html/features/enforce-strict-attribute-names";
+
     /** Recognized features. */
     protected static final String[] RECOGNIZED_FEATURES = {
         DOCUMENT_FRAGMENT,
+        ENFORCE_VALID_ATTRIBUTE_NAMES
     };
 
     // properties
@@ -130,6 +134,7 @@
         fParserConfiguration.addRecognizedFeatures(RECOGNIZED_FEATURES);
         fParserConfiguration.addRecognizedProperties(RECOGNIZED_PROPERTIES);
         fParserConfiguration.setFeature(DOCUMENT_FRAGMENT, true);
+        fParserConfiguration.setFeature(ENFORCE_VALID_ATTRIBUTE_NAMES, false);
         fParserConfiguration.setDocumentHandler(this);
     } // <init>()
 
@@ -425,13 +430,47 @@
     /** Start element. */
     public void startElement(QName element, XMLAttributes attrs, Augmentations augs)
         throws XNIException {
-        Element elementNode = fDocument.createElement(element.rawname);
-        int count = attrs != null ? attrs.getLength() : 0;
+
+        Element elementNode = fDocument.createElement(element.rawname);
+
+        int count = attrs != null ? attrs.getLength() : 0;
+
         for (int i = 0; i < count; i++) {
             String aname = attrs.getQName(i);
             String avalue = attrs.getValue(i);
-            elementNode.setAttribute(aname, avalue);
+
+            String allowedChars = "-._";
+
+            if ( ! fParserConfiguration.getFeature(ENFORCE_VALID_ATTRIBUTE_NAMES) ) {
+
+                elementNode.setAttribute(aname, avalue);
+
+            } else {
+
+                // only add the attribute if it has a legal name (built of alphanums+allowed chars)
+                boolean isValidAttributeName = true;
+
+                for (int j = 0;j < aname.length(); j++) {
+
+                    char ch = aname.charAt(j);
+
+                    if ( j == 0 && ! Character.isLetter(ch) ) {
+                        j = aname.length();
+                        isValidAttributeName = false;
+                    } else if ( ! (Character.isLetterOrDigit(ch) || allowedChars.indexOf(ch) != -1) ) {
+                        j = aname.length();
+                        isValidAttributeName = false;
+                    }
+                }
+
+                if ( isValidAttributeName ) {
+                    elementNode.setAttribute(aname, avalue);
+                }
+
+            }
+
         }
+
         fCurrentNode.appendChild(elementNode);
         fCurrentNode = elementNode;
     } // startElement(QName,XMLAttributes,Augmentations)
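For anyone who wants to try the name check from the startElement hunk outside NekoHTML, here is a minimal standalone sketch of the same rule: the first character must be a letter, and the rest may be letters, digits, '-', '.' or '_'. The class and method names (AttributeNameCheck, isValidAttributeName) are made up for illustration and are not part of the patch or of NekoHTML. One deliberate difference, labeled in the code: this sketch rejects an empty name, whereas the loop in the patch would leave the flag set to true for one.

```java
/** Standalone sketch of the patch's attribute-name rule (hypothetical names, not NekoHTML API). */
final class AttributeNameCheck {

    /** Punctuation the patch accepts after the first character. */
    private static final String ALLOWED_CHARS = "-._";

    private AttributeNameCheck() {}

    /**
     * Returns true if the name starts with a letter and every following
     * character is a letter, a digit, '-', '.' or '_'.
     * Note: this is narrower than the XML Name production, which also
     * permits ':' anywhere and '_' as the first character.
     */
    static boolean isValidAttributeName(String name) {
        // Assumption: treat null/empty as invalid. The patch itself does not
        // special-case an empty name.
        if (name == null || name.isEmpty()) {
            return false;
        }
        if (!Character.isLetter(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char ch = name.charAt(i);
            if (!Character.isLetterOrDigit(ch) && ALLOWED_CHARS.indexOf(ch) == -1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "href", "data-id", "-", ".", "1abc", "xml:lang" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValidAttributeName(sample));
        }
    }
}
```

Returning early replaces the patch's trick of setting j = aname.length() to leave the loop; the accepted and rejected names are the same apart from the empty-name case noted above.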
From: Marc G. <mgu...@ya...> - 2009-01-23 08:34:01
Arshan,

can you please open an issue for that and attach your patch there?

Cheers,
Marc.
--
Web: http://www.efficient-webtesting.com
Blog: http://mguillem.wordpress.com

Arshan Dabirsiaghi wrote:
> Marc/list,
>
> Marc and I had a conversation last year about this bug
> (http://sourceforge.net/tracker/index.php?func=detail&aid=1995218&group_id=195122&atid=952178).
> Basically, if you try to parse the following text in a DocumentFragment
> parser, you'll get an error:
>
> <a - href="/">link</a>
> <a . href="/">link</a>
>
> The error is as follows:
>
>> Caused by: org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid
>> or illegal XML character is specified.
>
> This is because the XML specification is very strict about special
> characters in attribute names. Running this through the regular
> DOMParser doesn't fail because the parsing logic is different.
>
> Anyway, I have attached a patch that creates a new feature,
> "http://cyberneko.org/html/features/enforce-strict-attribute-names",
> that does not process XML-illegal attribute names for DocumentFragments.
> It is off by default at Marc's request. I hope that everyone benefits
> from this patch.
>
> I have also attached a non-JUnit test case. You will have to make sure
> the path of the file name in the DOMParser test points to the HTML file
> that I also attached to make it work in your environment. That
> particular test is not very important because all it does is show that
> the DOMParser does not have the same flaw.
>
> Does anyone have any input on the approach? If no one has any
> improvements I lobby that this be included in the next minor release.
>
> Thanks,
> Arshan
>
> test <http://owasp.org>
> test2 <http://owasp.org>
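The INVALID_CHARACTER_ERR quoted above can be reproduced without NekoHTML. The sketch below is only an illustration using the JDK's built-in DOM implementation: it calls Element.setAttribute with the stray "-" and "." tokens from the sample markup as attribute names, which is the same call the fragment parser's startElement makes. The class and method names (InvalidAttributeNameDemo, setAttributeFails) are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Illustration only: shows the DOM rejecting XML-illegal attribute names. */
final class InvalidAttributeNameDemo {

    private InvalidAttributeNameDemo() {}

    /**
     * Returns true if setting an attribute with the given name raises a
     * DOMException with code INVALID_CHARACTER_ERR, false if the call succeeds.
     */
    static boolean setAttributeFails(String attributeName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            // Not expected with the default JDK factory; rethrow unchecked.
            throw new IllegalStateException(e);
        }
        Element anchor = doc.createElement("a");
        try {
            anchor.setAttribute(attributeName, "");
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.INVALID_CHARACTER_ERR;
        }
    }

    public static void main(String[] args) {
        // "-" and "." are the stray tokens from the sample markup in the mail.
        for (String name : new String[] { "-", ".", "href" }) {
            System.out.println("setAttribute(\"" + name + "\") rejected: " + setAttributeFails(name));
        }
    }
}
```

This is why skipping such names before calling setAttribute, as the patch does when the feature is enabled, avoids the exception.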