nekohtml-user Mailing List for CyberNeko HTML Parser (Page 3)


Wow. Thank you so much. So do I understand correctly this is all it takes?

        // Marks every "id" attribute as type ID so that getElementById() can resolve it.
        XMLDocumentFilter idEnhancer = new DefaultFilter() {
            @Override
            public void startElement(QName element, XMLAttributes attributes, Augmentations augs) throws XNIException {
                int idx = attributes.getIndex("id");
                if (idx > -1) {
                    attributes.setType(idx, "ID");
                    // Relies on a Xerces-internal augmentation key; this is the brittle part.
                    Augmentations attrsAugs = attributes.getAugmentations(idx);
                    attrsAugs.putItem(Constants.ATTRIBUTE_DECLARED, Boolean.TRUE);
                }
                super.startElement(element, attributes, augs);
            }
        };

        XMLDocumentFilter[] filters = { idEnhancer };
        fConfiguration.setProperty("http://cyberneko.org/html/properties/filters", filters);
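
For comparison, here is a minimal sketch of an approach that avoids the Xerces augmentation key: walk the finished DOM and call the standard DOM Level 3 method Element.setIdAttribute on every element that carries an "id" attribute. The class name IdMarker and the helpers markIds and parse are made up for illustration, and the JDK's built-in XML parser stands in for NekoHTML so the example runs on its own. It has not been checked against a NekoHTML-produced document, so treat that part as an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of NekoHTML or XMLC.
class IdMarker {

    /** Flags every "id" attribute in the tree as an ID attribute. */
    static void markIds(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                // DOM Level 3: declares the attribute to be of type ID.
                e.setIdAttribute("id", true);
            }
        }
    }

    /** Stand-in parser (assumption): the JDK parser instead of NekoHTML. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><p id=\"intro\">hi</p></body></html>");
        markIds(doc);
        Element found = doc.getElementById("intro");
        System.out.println(found.getTagName());
    }
}
```

One trade-off of this sketch: it costs an extra pass over the document, and elements added to the tree after markIds runs are not flagged, whereas the filter marks attributes as they are parsed.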

p.s. I have not decided yet but will probably cheat and use XPath for now. It seems like that will be a solution
that is less likely to change with the times. It definitely seems like something necessary, and I will keep this in
mind for later work. Thank you! Misha
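
The XPath fallback mentioned in the p.s. could look roughly like the sketch below. It uses only the standard javax.xml.xpath API, so it does not depend on how the parser types its attributes. The class name XPathIdLookup and the helpers findById and parse are hypothetical, and the JDK parser again stands in for NekoHTML so the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of NekoHTML.
class XPathIdLookup {

    /** Returns the first element whose "id" attribute equals id, or null if none matches. */
    static Element findById(Document doc, String id) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the value as an XPath variable so quotes inside it need no escaping.
        xpath.setXPathVariableResolver(
                name -> "id".equals(name.getLocalPart()) ? id : null);
        return (Element) xpath.evaluate("//*[@id=$id]", doc, XPathConstants.NODE);
    }

    /** Stand-in parser (assumption): the JDK parser instead of NekoHTML. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><div id=\"main\">x</div></body></html>");
        Element found = findById(doc, "main");
        System.out.println(found.getTagName());
    }
}
```

Each call scans the whole document, so for many lookups on a large page it will likely be slower than getElementById with properly typed ID attributes.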

Jacob Kjome wrote:
> That's because it's not a validating parser.  You can only define "id" as being of
> type "ID" if it's validated against a DTD or XML Schema.
> 
> However, there is a workaround [1] that I implemented for the XMLC project [2].
> You can use a NekoHTML Filter [3] to automagically mark certain attributes as
> being of type "ID".  Look for the "idEnhancer" filter in the linked code.  The
> only problem with the solution I came up with is that it uses knowledge about
> Xerces internals that could change at any given release.  That said, it's worked
> since at least Xerces 2.8.1 and the Xerces code that it takes advantage of doesn't
> appear to be up for refactoring anytime soon, IMO.
> 
> What would be really nice is to figure out a less brittle implementation; that is,
> one that doesn't depend upon Xerces internals.  If anyone on this list knows of
> one, it would be a great contribution as getElementById() won't work for HTML
> without it.
> 
> 
> [1]
> http://websvn.ow2.org/filedetails.php?repname=xmlc&path=%2Ftrunk%2Fxmlc%2Fxmlc%2Fmodules%2Fxmlc%2Fsrc%2Forg%2Fenhydra%2Fxml%2Fxmlc%2Fparsers%2Fxerces%2FXercesHTMLDOMParser.java
> [2] http://forge.ow2.org/projects/xmlc/
> [3] http://nekohtml.sourceforge.net/filters.html
> 
> 
> Jake
> 
> On 5/24/2010 9:01 PM, Misha Koshelev wrote:
>> Dear Sirs:
>>
>> Again thank you for such a great product!
>>
>> I am undergoing step (ii) of converting my Web Automation Framework (www.mkosh.com - new version to be posted tomorrow)
>> to using NekoHTML.
>>
>> Thank you so much for your prior help with XPath expressions, etc.
>>
>> Specifically, I have now encountered the following issue.
>>
>> I parse the document and am able to correctly use XPath expressions with lowercase element names.
>>
>> The attribute names are also lowercase.
>>
>> However, it seems the id attribute is not marked as being of type "ID", and so document.getElementById always returns null
>> (I checked this by using an XPath that retrieves an Element, getting the "id" attribute, and then immediately doing document.getElementById for that exact attribute).
>>
>> I am using the following code to parse:
>>     DOMParser domParser = new DOMParser(new HTMLConfiguration());
>>     try {
>>         domParser.setFeature("http://cyberneko.org/html/features/augmentations", true);
>>         domParser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
>>     } catch (SAXNotRecognizedException saxnre) {
>>         throw new WebDriverException("Error parsing document", saxnre);
>>     } catch (SAXNotSupportedException saxnse) {
>>         throw new WebDriverException("Error parsing document", saxnse);
>>     }
>>     try {
>>         domParser.parse(new InputSource(new ByteArrayInputStream(pageSource.getBytes())));
>>     } catch (IOException ioe) {
>>         throw new WebDriverException("Error parsing document", ioe);
>>     } catch (SAXException saxe) {
>>         throw new WebDriverException("Error parsing document", saxe);
>>     }
>>     setDocument(domParser.getDocument());
>>
>> Thank you so much
>>
>> Misha
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> nekohtml-user mailing list
>> nek...@li...
>> https://lists.sourceforge.net/lists/listinfo/nekohtml-user
>>
>>
>>
> 
> ------------------------------------------------------------------------------
> 
> _______________________________________________
> nekohtml-user mailing list
> nek...@li...
> https://lists.sourceforge.net/lists/listinfo/nekohtml-user


nekohtml-user — User questions, comments, and general discussions