From: Marc G. <mgu...@ya...> - 2010-02-02 16:10:38
|
Jacob Kjome wrote:
> Thanks for getting the release out, Marc. A couple things, though...
>
> FYI. The "status" is still "Open" for bug 2942363 [1], even though the
> resolution is "Fixed".

I guess that the reporter changed it.

> Also, why is everything a branch and not a tag? Usually the SVN structure
> is....
> ...

This was the setup decided by Andy. I agree with you that tags would be better, but personally I don't really care.

Cheers,
Marc.
--
Blog: http://mguillem.wordpress.com |
From: Jacob K. <ho...@vi...> - 2010-02-02 15:47:41
|
Thanks for getting the release out, Marc. A couple things, though...

FYI: the "status" is still "Open" for bug 2942363 [1], even though the resolution is "Fixed".

Also, why is everything a branch and not a tag? Usually the SVN structure is...

  branches/
  tags/
  trunk/

...and branches are usually not specific versions of code, like "1.9.14", but something like "1.9.xx", where the branch would be used to maintain the 1.9 major.minor collection of code (assuming the trunk didn't represent this already, which it currently does). Tags would be used to define specific versions like "1.9.14", which represent a static, non-changing view of the code from a particular point in time. But NekoHTML seems to have only...

  branches/
  trunk/

To me, this implies that each of NekoHTML's versions is non-static and may change over time, since that's what branches are for. I guess it doesn't matter that much as long as developers understand that branches are to be treated like tags (static, non-changing), but why the unorthodox usage? I'm curious why it's set up that way. Is this just how SourceForge's infrastructure team set it up, or did someone on the NekoHTML team make a conscious decision to do it this way?

[1] http://sourceforge.net/tracker/?func=detail&aid=2942363&group_id=195122&atid=952178

Jake

On Tue, 02 Feb 2010 15:50:22 +0100 Marc Guillemot <mgu...@ya...> wrote:
> Hi all,
>
> release 1.9.14 of NekoHTML is now available.
>
> http://nekohtml.sourceforge.net
>
> This release contains different improvements and bug fixes. Description
> of the changes is available at
> http://nekohtml.sourceforge.net/changes.html
>
> The maven bundle has been uploaded to NekoHTML repository and should
> become available in the main repository within a few hours.
>
> Enjoy!
>
> Marc.
> --
> Blog: http://mguillem.wordpress.com |
From: Marc G. <mgu...@ya...> - 2010-02-02 14:50:28
|
Hi all,

Release 1.9.14 of NekoHTML is now available.

http://nekohtml.sourceforge.net

This release contains various improvements and bug fixes. A description of the changes is available at
http://nekohtml.sourceforge.net/changes.html

The Maven bundle has been uploaded to the NekoHTML repository and should become available in the main repository within a few hours.

Enjoy!

Marc.
--
Blog: http://mguillem.wordpress.com |
From: Thierry L. <tl...@gm...> - 2010-01-22 16:17:45
|
After a couple of first tests it seems to work nicely on my Android device :)
Thanks again.

Thierry.

On Fri, Jan 22, 2010 at 4:37 PM, Marc Guillemot <mgu...@ya...> wrote:
> Hi,
>
> I've committed the fix to generate a correct version of
> xercesMinimal.jar. The unit tests now run with this version too.
>
> If you're only interested in this xercesMinimal.jar:
> http://nekohtml.svn.sourceforge.net/viewvc/nekohtml/trunk/lib/xerces-minimal/xercesMinimal.jar?revision=279
>
> Cheers,
> Marc.

--
Thierry. |
From: Thierry L. <tl...@gm...> - 2010-01-22 15:44:56
|
Oh yes, I will try :)
Many thanks, Marc.

On Fri, Jan 22, 2010 at 4:37 PM, Marc Guillemot <mgu...@ya...> wrote:
> Hi,
>
> I've committed the fix to generate a correct version of
> xercesMinimal.jar. The unit tests now run with this version too.
>
> If you're only interested in this xercesMinimal.jar:
> http://nekohtml.svn.sourceforge.net/viewvc/nekohtml/trunk/lib/xerces-minimal/xercesMinimal.jar?revision=279
>
> Cheers,
> Marc.

--
Thierry. |
From: Marc G. <mgu...@ya...> - 2010-01-22 15:38:09
|
Hi,

I've committed the fix to generate a correct version of xercesMinimal.jar. The unit tests now run with this version too.

If you're only interested in this xercesMinimal.jar:
http://nekohtml.svn.sourceforge.net/viewvc/nekohtml/trunk/lib/xerces-minimal/xercesMinimal.jar?revision=279

Cheers,
Marc.

Thierry Legras wrote:
> Hi,
>
> Any news on that?
> If not, I might give it a try. How do you compile Xerces to generate
> xercesMinimal.jar? I could not find any info on that in the Neko source archive.
>
> Thierry. |
From: Thierry L. <tl...@gm...> - 2010-01-22 15:17:58
|
Hi,

Any news on that?
If not, I might give it a try. How do you compile Xerces to generate xercesMinimal.jar? I could not find any info on that in the Neko source archive.

Thierry.

On Mon, Jan 11, 2010 at 1:52 PM, Marc Guillemot <mgu...@ya...> wrote:
> Hi Stewart,
>
> you're right: more root classes have to be taken to generate the
> xercesMinimal.jar. This will be fixed in next release.
>
> Cheers,
> Marc.
>
> Stewart Cambridge wrote:
> > I tried creating a new xercesMinimal.jar, thinking it was not based on
> > 2.9.1, but I got the same result.
> >
> > Inside the Neko code, in XercesBridge I think, there is some factory
> > code which detects which version of Xerces you are using. Some of this
> > detection is done by attempting to call methods on certain classes.
> > If those methods don't exist, it catches the exception and says "not
> > that version of xerces".
> >
> > Maybe the xercesMinimal code does not have the class it's looking for
> > by virtue of being minimal, and then it defaults back to an early
> > version of Xerces, which is the wrong one.
> >
> > I think the solution is to either override the XercesBridge class (I
> > may have the name slightly wrong) or to include the specific "version
> > detection" classes that XercesBridge is looking for.
> >
> > Stewart
> >
> > 2010/1/8 Thierry Legras <tl...@gm...>:
> >> Yes, I tried in a pure Java project (non-Android) and had the same issue. I
> >> think the minimal Xerces jar is broken. Has anybody been able to use it
> >> successfully?
> >>
> >> And yes, I would rather avoid the full Xerces lib, as the apk size is more
> >> than 600 kByte for a tiny application :(
> >>
> >> Thierry.
> >> http://sites.google.com/site/tlegras
> >>
> >> On Thu, Jan 7, 2010 at 2:18 AM, Stewart Cambridge
> >> <ste...@gm...> wrote:
> >>> I get this problem too when I switch between xercesMinimal and
> >>> xerces-2.9.1 - it's not a problem particular to Android.
> >>>
> >>> But I guess mobile apps need to think about resources more carefully
> >>> than other apps.
> >>>
> >>> Stewart
> >>>
> >>>> Hi,
> >>>>
> >>>> I am trying to use NekoHTML to parse HTML on an Android device. For that
> >>>> I would like to use the minimal Xerces jar, and I started from the code in
> >>>> the sample Minimal.java class, but this does not seem to work. I tried again
> >>>> in a pure Java project and had the same problem in the parse function call.
> >>>> Here is the backtrace:
> >>>>
> >>>> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.xerces.xni.XMLDocumentHandler.startDocument(Lorg/apache/xerces/xni/XMLLocator;Ljava/lang/String;Lorg/apache/xerces/xni/Augmentations;)V
> >>>>     at org.cyberneko.html.xercesbridge.XercesBridge_2_0.XMLDocumentHandler_startDocument(XercesBridge_2_0.java:57)
> >>>>     at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2043)
> >>>>     at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:907)
> >>>>     at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
> >>>>     at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
> >>>>     at main.main(main.java:72)
> >>>>
> >>>> However, in Eclipse, if I replace the external lib xercesMinimal.jar with
> >>>> xercesImpl.jar from Xerces 2.9.1, it seems to work like a charm.
> >>>> Is there anything I missed to make xercesMinimal.jar work?
> >>>>
> >>>> Another question: I just need to access some content in a few specific
> >>>> tags in an HTML page, so the validity checks are of no interest to me. What
> >>>> feature/property should I turn off:
> >>>> probably balance-tags? More?
> >>>>
> >>>> --
> >>>> Thierry.
> >>>> http://sites.google.com/site/tlegras

--
Thierry. |
From: Luis F. G. <lui...@ya...> - 2010-01-13 15:32:21
|
Hi Stewart.

A couple of months ago I needed that same functionality to apply certain XSLT dynamically. At that time I tried to study the code to figure out where I should apply changes to solve my problem, but didn't finish anything. If you want, I would be interested in helping you right now. By the way, I think I've seen one or two emails on the list requesting something similar.

Luis Fernando

----- Original Message ----
From: Stewart Cambridge <ste...@gm...>
To: nek...@li...
Sent: Wed, January 6, 2010 8:22:08 PM
Subject: [nekohtml-user] Serializing empty (self-closing) tags

Hi Neko Users,

Does anyone know if there are any plans for a feature setting for whether empty (self-closing) tags should contain a '/' or not? In other words, I want my serialized HTML to contain <br/> rather than <br>.

At the moment I'm working around it by overriding:

    public void emptyElement(QName element, XMLAttributes attributes, Augmentations augs) {
        element.rawname = element.rawname + "/";
        super.emptyElement(element, attributes, augs);
    }

which seems a little hackish. It certainly doesn't solve <img src=""/> or <input type=""/>.

Anyone else need this?

Regards,
Stewart |
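The behaviour Stewart asks for can be sketched independently of NekoHTML as below. This is a hedged illustration only: `EmptyElementWriter` and its `write` method are hypothetical names, not part of the NekoHTML or Xerces API. It shows the one point the workaround misses, presumably the reason it fails for elements with attributes: the slash has to be written after the attributes rather than appended to the element name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, independent of NekoHTML: writes an element that has no content. */
final class EmptyElementWriter {

    /** Serializes an empty element, optionally in self-closing (XHTML-style) form. */
    static String write(String name, Map<String, String> attributes, boolean selfClose) {
        StringBuilder out = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            out.append(' ').append(attr.getKey())
               .append("=\"").append(escape(attr.getValue())).append('"');
        }
        // The slash goes after the attributes, not into the element name.
        return out.append(selfClose ? "/>" : ">").toString();
    }

    /** Minimal escaping for double-quoted attribute values. */
    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        Map<String, String> none = new LinkedHashMap<>();
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("src", "logo.png");
        System.out.println(write("br", none, true));    // <br/>
        System.out.println(write("img", attrs, true));  // <img src="logo.png"/>
        System.out.println(write("img", attrs, false)); // <img src="logo.png">
    }
}
```

In a real serializer the same decision would sit wherever the closing ">" of an empty element is printed, controlled by a flag such as the feature setting requested above.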
From: Wojtek P. <woj...@ho...> - 2010-01-13 05:38:35
|
Hi, I'm new to neko-html. I'd like to conditionally accept an element's content, depending on its length. My requirement is to accept only <a> content less than 20 characters long. I couldn't figure out how to do this by just extending the ElementRemover filter. After digging into internal variable values, it seems like I might need to customize the HTML scanner. Thanks, Wojtek |
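One possible way to approach the length check without touching the scanner is to buffer the anchor's events in a filter and only forward them once the text length is known. The sketch below illustrates that buffering idea with the JDK's own SAX classes (XMLFilterImpl) rather than NekoHTML's XNI filters, so it is an analogy and not NekoHTML API; the class name ShortAnchorFilter, the helper filter(...), and the choice to drop over-long anchors entirely (text included) are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: forwards an <a> element only if its text is shorter than maxLength. */
public class ShortAnchorFilter extends XMLFilterImpl {
    private final int maxLength;
    private StringBuilder buffer;        // non-null while inside an <a> element
    private Attributes anchorAttributes; // copy of the pending <a> attributes

    public ShortAnchorFilter(XMLReader parent, int maxLength) {
        super(parent);
        this.maxLength = maxLength;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (buffer != null) {
            return; // simplification: markup nested inside <a> is dropped, its text is kept
        }
        if ("a".equalsIgnoreCase(qName)) {
            buffer = new StringBuilder();
            anchorAttributes = new AttributesImpl(atts); // the parser may reuse atts, so copy
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (buffer != null) {
            buffer.append(ch, start, length); // hold the text back until </a>
            return;
        }
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (buffer == null) {
            super.endElement(uri, localName, qName);
            return;
        }
        if (!"a".equalsIgnoreCase(qName)) {
            return; // end of nested markup inside <a>
        }
        String text = buffer.toString();
        buffer = null;
        if (text.length() < maxLength) {
            // Short enough: replay the buffered element downstream.
            super.startElement(uri, localName, qName, anchorAttributes);
            char[] chars = text.toCharArray();
            super.characters(chars, 0, chars.length);
            super.endElement(uri, localName, qName);
        }
    }

    /** Runs the filter over well-formed markup and returns a crude serialization (no attributes). */
    public static String filter(String xhtml, int maxLength) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ShortAnchorFilter filter = new ShortAnchorFilter(reader, maxLength);
        StringBuilder out = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }
        });
        filter.parse(new InputSource(new StringReader(xhtml)));
        return out.toString();
    }
}
```

The same shape should carry over to a NekoHTML filter by holding back the start tag and characters until the end tag arrives, but the exact XNI method signatures would need to be checked against the NekoHTML filter documentation.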
From: Thierry L. <tl...@gm...> - 2010-01-11 13:52:38
|
This is very good news. Thanks, Marc. On Mon, Jan 11, 2010 at 1:52 PM, Marc Guillemot <mgu...@ya...> wrote: > Hi Stewart, > > you're right: more root classes have to be taken to generate the > xercesMinimal.jar. This will be fixed in next release. > > Cheers, > Marc. > > > -- Thierry. http://sites.google.com/site/tlegras/ |
From: Marc G. <mgu...@ya...> - 2010-01-11 13:18:52
|
Hi Stewart, you're right: more root classes have to be included when generating xercesMinimal.jar. This will be fixed in the next release. Cheers, Marc. Stewart Cambridge wrote: > I tried creating a new xercesMinimal.jar, thinking it was not based on > 2.9.1, but I got the same result. > > Inside the Neko code, in XercesBridge I think, there is some factory > code which detects which version of Xerces you are using. Some of this > detection is done by attempting to call methods on certain classes. > If those methods don't exist, it catches the exception and says "not > that version of xerces". > > Maybe the xercesMinimal code does not have the class it's looking for > by virtue of being minimal, and then it defaults back to an early > version of Xerces, which is the wrong one. > > I think the solution is to either override the XercesBridge class (I > may have the name slightly wrong) or to include the specific "version > detection" classes that XercesBridge is looking for. > > Stewart > > > 2010/1/8 Thierry Legras <tl...@gm...>: >> Yes, i tried in a pure java project (non android) and had the same issue. i >> think the minimal xerces jar is broken. does anybody could use it >> successfully? >> >> And yes i would avoid to use the full xerces lib as the apk size is more >> than 600kByte for a tiny application :( >> >> Thierry. >> http://sites.google.com/site/tlegras >> >> On Thu, Jan 7, 2010 at 2:18 AM, Stewart Cambridge >> <ste...@gm...> wrote: >>> I get this problem too when I switch between xercesMinimal and >>> xerces-2.9.1 - it's not a problem particular to android. >>> >>> But I guess mobile apps need to think about resources more carefully >>> than other apps. >>> >>> Stewart >>> >>> >>>> Hi, >>>> >>>> I am trying to use nekohtml to parse a html from an android device. For >>>> that >>>> i would like to use the minimal xerces jar and started from the code in >>>> sample Minimal.java class, but this does not seem to work. 
I tried again >>>> in >>>> a pure java project and had the same problem in parse function call. here >>>> is >>>> the backtrace: >>>> >>>> Exception in thread "main" java.lang.NoSuchMethodError: >>>> >>>>> org.apache.xerces.xni.XMLDocumentHandler.startDocument(Lorg/apache/xerces/xni/XMLLocator;Ljava/lang/String;Lorg/apache/xerces/xni/Augmentations;)V >>>> at >>>> >>>>> org.cyberneko.html.xercesbridge.XercesBridge_2_0.XMLDocumentHandler_startDocument(XercesBridge_2_0.java:57) >>>> at >>>> org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2043) >>>> at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:907) >>>> at >>>> org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) >>>> at >>>> org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) >>>> at main.main(main.java:72) >>>> >>>> However in eclipse if i remove external lib xercesMinimal.jar to >>>> xercesImpl.jar from Xerces 2.9.1, it seems to work like a charm. >>>> Is there anything i missed to make xercesMinimal.jar works? >>>> >>>> Another question: i just need to access a few content in some specific >>>> tag >>>> in an html page, so all the validity check as no interest for me. what >>>> feature/property should i turn off: >>>> probably balance-tags? more? >>>> >>>> -- >>>> Thierry. >>>> http://sites.google.com/site/tlegras >> >> >> -- >> Thierry. |
From: Stewart C. <ste...@gm...> - 2010-01-08 08:48:46
|
I tried creating a new xercesMinimal.jar, thinking it was not based on 2.9.1, but I got the same result. Inside the Neko code, in XercesBridge I think, there is some factory code which detects which version of Xerces you are using. Some of this detection is done by attempting to call methods on certain classes. If those methods don't exist, it catches the exception and says "not that version of Xerces". Maybe the xercesMinimal code does not have the class it's looking for by virtue of being minimal, and then it defaults back to an early version of Xerces, which is the wrong one. I think the solution is to either override the XercesBridge class (I may have the name slightly wrong) or to include the specific "version detection" classes that XercesBridge is looking for. Stewart 2010/1/8 Thierry Legras <tl...@gm...>: > Yes, i tried in a pure java project (non android) and had the same issue. i > think the minimal xerces jar is broken. does anybody could use it > successfully? > > And yes i would avoid to use the full xerces lib as the apk size is more > than 600kByte for a tiny application :( > > Thierry. > http://sites.google.com/site/tlegras > > On Thu, Jan 7, 2010 at 2:18 AM, Stewart Cambridge > <ste...@gm...> wrote: >> >> I get this problem too when I switch between xercesMinimal and >> xerces-2.9.1 - it's not a problem particular to android. >> >> But I guess mobile apps need to think about resources more carefully >> than other apps. >> >> Stewart >> >> >> >Hi, >> > >> >I am trying to use nekohtml to parse a html from an android device. For >> > that >> >i would like to use the minimal xerces jar and started from the code in >> >sample Minimal.java class, but this does not seem to work. I tried again >> > in >> >a pure java project and had the same problem in parse function call. 
here >> > is >> >the backtrace: >> > >> >Exception in thread "main" java.lang.NoSuchMethodError: >> > >> >> > >org.apache.xerces.xni.XMLDocumentHandler.startDocument(Lorg/apache/xerces/xni/XMLLocator;Ljava/lang/String;Lorg/apache/xerces/xni/Augmentations;)V >> >at >> > >> >> > >org.cyberneko.html.xercesbridge.XercesBridge_2_0.XMLDocumentHandler_startDocument(XercesBridge_2_0.java:57) >> >at >> >org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2043) >> >at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:907) >> >at >> >org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) >> >at >> >org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) >> >at main.main(main.java:72) >> > >> >However in eclipse if i remove external lib xercesMinimal.jar to >> >xercesImpl.jar from Xerces 2.9.1, it seems to work like a charm. >> >Is there anything i missed to make xercesMinimal.jar works? >> > >> >Another question: i just need to access a few content in some specific >> > tag >> >in an html page, so all the validity check as no interest for me. what >> >feature/property should i turn off: >> >probably balance-tags? more? >> > >> >-- >> >Thierry. >> >http://sites.google.com/site/tlegras > > > > -- > Thierry. > |
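The detection pattern described in this message (probe for a method, and treat a failure as "not that version") can be sketched in plain Java. This is not the actual XercesBridge code: the class BridgeDetectionSketch, the Bridge interface, and the probe targets are hypothetical and only illustrate why a trimmed jar that lacks the probed class might silently select the oldest bridge.

```java
import java.lang.reflect.Method;

/** Hypothetical illustration of reflection-based version detection with a silent fallback. */
public class BridgeDetectionSketch {

    /** Stand-in for a version-specific bridge implementation. */
    public interface Bridge {
        String describe();
    }

    /** Returns true only if the class can be loaded and declares the public method. */
    public static boolean hasMethod(String className, String methodName, Class<?>... parameterTypes) {
        try {
            Class<?> type = Class.forName(className);
            Method method = type.getMethod(methodName, parameterTypes);
            return method != null;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // A missing class looks exactly like "older version" to the caller.
            return false;
        }
    }

    /** Picks the newer bridge when the probe succeeds, otherwise falls back to the oldest one. */
    public static Bridge selectBridge(String probeClass, String probeMethod, Class<?>... parameterTypes) {
        if (hasMethod(probeClass, probeMethod, parameterTypes)) {
            return () -> "newer bridge";
        }
        return () -> "fallback bridge";
    }

    public static void main(String[] args) {
        // Probe that succeeds: java.lang.String has a public length() method.
        System.out.println(selectBridge("java.lang.String", "length").describe());
        // Probe that fails because the class is absent, as it might be in a minimal jar.
        System.out.println(selectBridge("no.such.ProbeClass", "anyMethod").describe());
    }
}
```

If detection works along these lines, the fallback branch would be taken whenever the probed class is stripped from the jar, which would be consistent with the XercesBridge_2_0 frame in the backtrace above.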
From: Stewart C. <ste...@gm...> - 2010-01-07 01:22:16
|
Hi Neko Users, Does anyone know if there are any plans for a feature setting for whether empty (self-closing) tags should contain a '/' or not? In other words, I want my serialized HTML to contain <br/> rather than <br>. At the moment I'm working around by overriding: public void emptyElement(QName element, XMLAttributes attributes, Augmentations augs) { element.rawname = element.rawname + "/"; super.emptyElement(element, attributes, augs); } which seems a little hackish. It certainly doesn't solve <img src=""/> or <input type=""/> Anyone else need this? Regards, Stewart |
From: Stewart C. <ste...@gm...> - 2010-01-07 01:18:19
|
I get this problem too when I switch between xercesMinimal and xerces-2.9.1 - it's not a problem particular to Android. But I guess mobile apps need to think about resources more carefully than other apps. Stewart >Hi, > >I am trying to use nekohtml to parse a html from an android device. For that >i would like to use the minimal xerces jar and started from the code in >sample Minimal.java class, but this does not seem to work. I tried again in >a pure java project and had the same problem in parse function call. here is >the backtrace: > >Exception in thread "main" java.lang.NoSuchMethodError: > >org.apache.xerces.xni.XMLDocumentHandler.startDocument(Lorg/apache/xerces/xni/XMLLocator;Ljava/lang/String;Lorg/apache/xerces/xni/Augmentations;)V >at > >org.cyberneko.html.xercesbridge.XercesBridge_2_0.XMLDocumentHandler_startDocument(XercesBridge_2_0.java:57) >at >org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2043) >at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:907) >at >org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) >at >org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) >at main.main(main.java:72) > >However in eclipse if i remove external lib xercesMinimal.jar to >xercesImpl.jar from Xerces 2.9.1, it seems to work like a charm. >Is there anything i missed to make xercesMinimal.jar works? > >Another question: i just need to access a few content in some specific tag >in an html page, so all the validity check as no interest for me. what >feature/property should i turn off: >probably balance-tags? more? > >-- >Thierry. >http://sites.google.com/site/tlegras |
From: Thierry L. <tl...@gm...> - 2010-01-03 12:02:49
|
Hi, I am trying to use NekoHTML to parse an HTML page on an Android device. For that I would like to use the minimal Xerces jar, and I started from the code in the sample Minimal.java class, but this does not seem to work. I tried again in a pure Java project and had the same problem in the parse function call. Here is the backtrace: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.xerces.xni.XMLDocumentHandler.startDocument(Lorg/apache/xerces/xni/XMLLocator;Ljava/lang/String;Lorg/apache/xerces/xni/Augmentations;)V at org.cyberneko.html.xercesbridge.XercesBridge_2_0.XMLDocumentHandler_startDocument(XercesBridge_2_0.java:57) at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2043) at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:907) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) at main.main(main.java:72) However, in Eclipse, if I replace the external lib xercesMinimal.jar with xercesImpl.jar from Xerces 2.9.1, it seems to work like a charm. Is there anything I missed to make xercesMinimal.jar work? Another question: I just need to access a little content in some specific tags in an HTML page, so all the validity checks are of no interest to me. Which features/properties should I turn off: probably balance-tags? More? -- Thierry. http://sites.google.com/site/tlegras |
From: Max L. <ih...@gm...> - 2009-12-15 00:37:01
|
Hi, I am stripping tags from an HTML doc using the code below: ElementRemover remover = new ElementRemover(); // set which elements to accept //Accept html so we have data to look at. remover.acceptElement("html", null); remover.acceptElement("table", null); remover.acceptElement("td", null); remover.acceptElement("tr", null); remover.acceptElement("div", null); // completely remove script elements remover.removeElement("script"); // Remove any head remover.removeElement("head"); //remover.removeElement("a"); // create writer filter org.cyberneko.html.filters.Writer writer = new org.cyberneko.html.filters.Writer(); // setup filter chain XMLDocumentFilter[] filters = { remover, writer, }; // create HTML parser //XMLParserConfiguration parser = new HTMLConfiguration(); DOMParser parser = new DOMParser(); parser.setProperty("http://cyberneko.org/html/properties/filters", filters); InputSource source = new InputSource(new StringReader(html)); parser.parse(source); return parser.getDocument(); However, when I call parser.parse, I get the parsed document written to stdout without me explicitly requesting it to be. It's not a big deal, but I'm curious if there is a way to turn this off, or modify the code so it doesn't happen? I tried digging into neko but I couldn't find where it might be written out. I do my logging through Log4j so it does conflict a bit and fills my cron output. Any ideas? Thanks. |
From: Len T. <lta...@jo...> - 2009-11-03 19:29:52
|
Sorry about the newbie question. Html comments were already being removed. On the page I was parsing there happened to be html comments inside other elements which were making it through. Len -----Original Message----- From: Len Takeuchi [mailto:lta...@jo...] Sent: October-28-09 4:17 PM To: 'nek...@li...' Subject: removing html comments Hello, I'm using filters as in the provided sample to remove <script> elements as below. I would also like to remove html comments, i.e. < ! - - blah blah - - > How does one do that? Regards, Len ElementRemover remover = new ElementRemover(); // set which elements to accept remover.acceptElement("b", null); remover.acceptElement("i", null); remover.acceptElement("u", null); remover.acceptElement("a", new String[] { "href" }); // completely remove script elements remover.removeElement("script"); |
From: Len T. <lta...@jo...> - 2009-10-28 23:56:07
|
Hello, I'm using filters as in the provided sample to remove <script> elements as below. I would also like to remove html comments, i.e. < ! - - blah blah - - > How does one do that? Regards, Len ElementRemover remover = new ElementRemover(); // set which elements to accept remover.acceptElement("b", null); remover.acceptElement("i", null); remover.acceptElement("u", null); remover.acceptElement("a", new String[] { "href" }); // completely remove script elements remover.removeElement("script"); |
From: Stephen M. <ste...@gm...> - 2009-09-23 21:22:30
|
The first example on http://nekohtml.sourceforge.net/usage.html has "sax.Counter" Where can I find the source code to this ? -Thanks Stephen More |
From: Marc G. <mgu...@ya...> - 2009-09-02 08:49:27
|
Hi all, release 1.9.13 of NekoHTML is now available. http://nekohtml.sourceforge.net This release contains different improvements and bug fixes. Description of the changes is available at http://nekohtml.sourceforge.net/changes.html The maven bundle has been uploaded to NekoHTML repository and should become available in the main repository within a few hours. Enjoy! Marc. -- Web: http://www.efficient-webtesting.com Blog: http://mguillem.wordpress.com |
From: Filippo De L. <dl....@fi...> - 2009-08-28 11:08:29
|
Hi Guys, Does anybody know how I can resolve the declared prefix for a known namespace URI? My namespace URI is http://www.filosganga.it/xyz, but I don't know which prefix is associated with this namespace. Once the document has been obtained from the markup, is there a way to find out the prefix of my namespace? Thanks -- Filippo De Luca -------------------------- Email: dl....@fi... Web: http://www.filosganga.it LinkedIn: http://www.linkedin.com/in/filippodeluca mobile: +393395822588 |
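For a namespace-aware DOM Level 3 document, one candidate is Node.lookupPrefix(namespaceURI). The sketch below demonstrates it with the JDK's own XML parser on a small well-formed input; whether a document produced by NekoHTML carries the namespace declarations needed for this lookup depends on how it was parsed, so treat that part as an assumption to verify. The class name PrefixLookupSketch and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: finds the prefix bound to a namespace URI via DOM Level 3. */
public class PrefixLookupSketch {

    /** Parses the markup with namespace awareness and asks the root element for the prefix. */
    public static String prefixFor(String xml, String namespaceUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, xmlns attributes are not interpreted
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        // lookupPrefix returns null when no prefix is bound to the URI in scope.
        return document.getDocumentElement().lookupPrefix(namespaceUri);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root xmlns:f=\"http://www.filosganga.it/xyz\"><f:item/></root>";
        System.out.println(prefixFor(xml, "http://www.filosganga.it/xyz")); // prints "f"
    }
}
```

Note that the lookup is scoped: calling lookupPrefix on a node only sees declarations on that node and its ancestors, so a prefix declared deeper in the tree has to be looked up from a node inside that subtree.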
From: Filippo De L. <dl....@fi...> - 2009-08-28 11:05:19
|
Hi Fabrice, I am using this code: InputSource inputSource = new InputSource(new StringReader(markup)); Transformer transformer = TransformerFactory.newInstance().newTransformer(); SAXParser reader = new SAXParser(); reader.setFeature("http://xml.org/sax/features/namespaces", true); reader.setFeature("http://xml.org/sax/features/namespace-prefixes", true); reader.setProperty("http://cyberneko.org/html/properties/default-encoding", "UTF-8"); DOMResult result = new DOMResult(); transformer.transform(new SAXSource(reader, inputSource), result); Document document = (Document)result.getNode(); I hope this works for you. 2009/8/13 Fabrice Estiévenart <fab...@ce...> > Hello, > > How can I use NekoHtml as a SAX Parser ? Could you please give me some > examples ? > > Thank you, > > Fabrice -- Filippo De Luca -------------------------- Email: dl....@fi... Web: http://www.filosganga.it LinkedIn: http://www.linkedin.com/in/filippodeluca mobile: +393395822588 |
From: Fabrice E. <fab...@ce...> - 2009-08-13 13:22:08
|
Hello, How can I use NekoHtml as a SAX Parser ? Could you please give me some examples ? Thank you, Fabrice |
From: Marc G. <mgu...@ya...> - 2009-08-11 13:58:43
|
Hi, Neko starts with the encoding that you provide. If it encounters a meta tag indicating another (compatible) encoding, it uses that one. I think that NekoHTML should use the following algorithm: http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#determining-the-character-encoding We already use it in HtmlUnit, but it hasn't been backported to NekoHTML. Cheers, Marc. Tarjei Huse wrote: > Hi, > > I'm trying to understand how Neko deals with encodings and how I should > work around them. > > When a page is fetched, the request headers often provide some encoding > information that may or may not be overridden by the encoding set in the > meta directive of the page. > > Now, what I am wondering about is how I should expect Neko to handle > this. I can see that the Writer filter changes the file's encoding based > on the encoding of the outputstream and the content of the <meta > http-equiv..> element. > > Is there a way to know if this has happened? > > I'm also planning to put in place a charset detection algorithm on top > of Neko to make sure that badly encoded pages are handled correctly. > Does anyone have a tip on when this should be done? Before or after > running the Neko purifier and writer filters? > > Kind regards, > Tarjei > |
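To know in advance whether a meta tag would override the provided encoding, one option is to pre-scan the raw bytes before handing them to the parser. The sketch below is a deliberately simplified pre-scan, not NekoHTML's implementation and not the full WHATWG algorithm linked above: the class MetaCharsetSketch, the 1024-byte window, and the regular expression are assumptions chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-scan: start from the provided charset, switch if a meta tag names another. */
public class MetaCharsetSketch {

    // Matches charset=... inside a <meta ...> tag, with or without quotes around the value.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:+-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset named by a meta tag in the first bytes, else the provided one. */
    public static Charset resolve(byte[] page, Charset provided) {
        int limit = Math.min(page.length, 1024);
        // ISO-8859-1 maps every byte to a char, so the ASCII markup survives the sniffing pass.
        String head = new String(page, 0, limit, StandardCharsets.ISO_8859_1);
        Matcher matcher = META_CHARSET.matcher(head);
        if (matcher.find()) {
            String name = matcher.group(1);
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name in the page: ignore it and keep the provided charset.
            }
        }
        return provided;
    }

    public static void main(String[] args) {
        byte[] page = ("<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=UTF-8\"></head></html>")
                .getBytes(StandardCharsets.ISO_8859_1);
        Charset resolved = resolve(page, StandardCharsets.ISO_8859_1);
        // Comparing the result with the provided charset tells the caller whether a switch applies.
        System.out.println(resolved);
    }
}
```

Running a byte-level check like this before parsing also suggests an answer to the ordering question: any statistical charset detection would have to operate on the raw bytes, so it fits before the filters rather than after them.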
From: Tarjei H. <ta...@sc...> - 2009-08-06 08:29:11
|
Hi, I'm trying to understand how Neko deals with encodings and how I should work around them. When a page is fetched, the request headers often provide some encoding information that may or may not be overridden by the encoding set in the meta directive of the page. Now, what I am wondering about is how I should expect Neko to handle this. I can see that the Writer filter changes the file's encoding based on the encoding of the output stream and the content of the <meta http-equiv..> element. Is there a way to know if this has happened? I'm also planning to put in place a charset detection algorithm on top of Neko to make sure that badly encoded pages are handled correctly. Does anyone have a tip on when this should be done? Before or after running the Neko purifier and writer filters? Kind regards, Tarjei -- Tarjei Huse Mobil: 920 63 413 |