Hi,
I have been using nekohtml successfully in my Android application for several
years to parse a web page (not on my own web server).
However, for the past couple of days, a large number of users have had trouble
parsing the web page. I discovered that at some point the characters()
method of my DefaultFilter implementation returns half of the page in a
single call, where I was expecting an endElement() call.
The page is far from W3C compliant (two <body> tags (!), several errors, ...)
but so far nekohtml has parsed it without trouble.
An example of the page is here:
https://adsl.free.fr/magneto.pl?id=382730&idt=1c9493302659d686&liste=1
Parsing is fine up to line 279:
> <!--Javascripts-->
> <script type="text/javascript" src="/js/jquery.min.js">
>
which correctly results in endElement() being called.
But after that, where I was expecting endElement() to be called for the
"script" tag, I saw that characters() is called once (*for some users only*)
with text containing:
>
> </script>
>
>
>
> <body>
> <!--
> <div id="body">
> <div id="content-container" class="content">
> <h1>ENREGISTREMENTS TV</h1>
> <ul class="tab">
> <li class="selected"><a href="?id=1682005&idt=c77a844e32281c74&liste=1"
> >Liste des enregistrements</a> </li>
> ...
and so on, up to the end of the HTML page.
Has anybody ever seen similar behavior?
Any idea how I can track down this issue? I cannot debug it (the error does
not occur on my device), but at least I can add logs.
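For example, here is a minimal sketch in Java of the kind of log line I could add. ChunkLog and its methods are hypothetical names of mine, not part of nekohtml. The idea is only to record the size and the beginning of each text chunk, and whether it contains a closing script tag (which would mean markup was delivered as text):

```java
// Hypothetical logging helper, not part of nekohtml.
final class ChunkLog {
    private ChunkLog() {}

    /** True when the chunk contains a closing script tag, i.e. markup delivered as text. */
    static boolean containsScriptEndTag(String text) {
        return text.toLowerCase(java.util.Locale.ROOT).contains("</script");
    }

    /** Builds a one-line summary of a text chunk; maxPreview is assumed to be >= 0. */
    static String describe(String text, int maxPreview) {
        String preview = text.length() <= maxPreview ? text : text.substring(0, maxPreview);
        // Escape line breaks so that each chunk stays on a single log line.
        preview = preview.replace("\r", "\\r").replace("\n", "\\n");
        return "characters(): length=" + text.length()
                + " hasScriptEndTag=" + containsScriptEndTag(text)
                + " preview=\"" + preview + "\"";
    }
}
```

I would call ChunkLog.describe(text.toString(), 80) from my characters() override and send the result to the Android log, probably together with the device model and Android version, to see whether the affected users have something in common.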
Thanks for any help!
--
Thierry.