
#126 Infinite loop on HTML parsing

Group: v2.13
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-05-18
Created: 2014-09-16
Creator: rasifiel
Private: No

When parsing the document http://avtogsm.ru/beltronics-sti-r-plus-p4559.html, HtmlCleaner loops forever.
Problematic fragment:
<table><rt><td>
<rt> must have a <ruby> parent, so a <ruby> is inserted into the queue: <table><ruby><rt><td>
Then <ruby> is processed; since it can't be inside <table>, it is put in the <table>'s itemsToMove. The queue is <table><rt><td> again, and the process repeats.
I haven't found a simple solution, but I think a created parent tag must be moved together with its children (currently we have no such information at the token phase).
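The cycle described above can be reproduced with a toy model. This is a minimal sketch, not HtmlCleaner's code: the class RubyLoopSketch, the REQUIRED_PARENT map and the FORBIDDEN_IN_TABLE set are hypothetical, and real tag handling (tbody, closing tags, nesting) is ignored. Because each step depends only on the queue contents, seeing the same queue twice means the processing can never finish.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the loop, not HtmlCleaner's actual implementation.
final class RubyLoopSketch {
    // Toy rules: <rt> needs a <ruby> parent, and <table> rejects <ruby> (and <rt>) as children.
    static final Map<String, String> REQUIRED_PARENT = Map.of("rt", "ruby");
    static final Set<String> FORBIDDEN_IN_TABLE = Set.of("ruby", "rt");

    /**
     * Processes tokens inside an already-open <table>, applying requiredParent before the
     * misplaced-tag rule. Returns the step at which a queue state repeats, or -1 if the
     * queue drains normally.
     */
    static int findRepeat(List<String> tokens) {
        Deque<String> queue = new ArrayDeque<>(tokens);
        List<String> itemsToMove = new ArrayList<>();
        Set<List<String>> seen = new HashSet<>();
        for (int step = 0; !queue.isEmpty(); step++) {
            // The next action depends only on the queue, so a repeated state means an endless cycle.
            if (!seen.add(new ArrayList<>(queue))) {
                return step;
            }
            String tok = queue.peekFirst();
            String parent = REQUIRED_PARENT.get(tok);
            if (parent != null) {
                // requiredParent rule: the context is always <table>, never the required
                // parent, so the parent is synthesized in front of the token.
                queue.addFirst(parent);
            } else if (FORBIDDEN_IN_TABLE.contains(tok)) {
                // Misplaced-tag rule: set the token aside, removing it from the queue.
                itemsToMove.add(queue.pollFirst());
            } else {
                // Any other token is simply consumed in this toy model.
                queue.pollFirst();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findRepeat(List.of("rt", "td"))); // repeats: <table><rt><td> again
        System.out.println(findRepeat(List.of("td")));       // drains normally
    }
}
```

With the fragment from the report, the queue goes [rt, td] → [ruby, rt, td] → [rt, td], and the last state has already been seen.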

Discussion

  • Scott Wilson - 2014-09-16
    • status: open --> open-accepted
     
  • Scott Wilson - 2014-09-16

    Hmm, that does look a tricky one to solve.

     
  • Scott Wilson - 2014-09-26

    It also seems related to this problem:

    https://sourceforge.net/p/htmlcleaner/bugs/129/

     
  • Scott Wilson - 2015-05-12
    • Group: v 2.10 --> v2.12
     
  • Scott Wilson - 2015-05-15
    • Group: v2.12 --> v2.13
     
  • Scott Wilson - 2015-05-16

    One workaround is to switch the order in which the rules are processed, i.e. apply requiredParent after checking whether tags need moving. That certainly handles this problem, though it does have other side effects.

     
  • Scott Wilson - 2015-05-18
    • status: open-accepted --> closed-fixed
     
  • Scott Wilson - 2015-05-18

    OK, I've gone with the solution of processing required parent tags after applying the rule for misplaced tags. This resolves the issue (and a number of other loops); however, it does mean that we no longer handle the simple case of adding missing TRs around TDs in tables. That may require a special exception, as it's a common case.

     

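The effect of reordering the rules, and the side effect on TDs mentioned in the last comment, can be illustrated with the same kind of toy model. This is a minimal sketch under stated assumptions: RuleOrderSketch, REQUIRED_PARENT and ALLOWED_CHILDREN are a hypothetical, heavily simplified content model rather than HtmlCleaner's real tag definitions, and maxSteps is an added guard so the old ordering fails fast instead of hanging.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model comparing the two rule orders; not HtmlCleaner's real code.
final class RuleOrderSketch {
    // Toy content model: required parents, and the children each context accepts.
    static final Map<String, String> REQUIRED_PARENT = Map.of("rt", "ruby", "td", "tr");
    static final Map<String, Set<String>> ALLOWED_CHILDREN = Map.of(
            "table", Set.of("tr"),
            "tr", Set.of("td"),
            "ruby", Set.of("rt"));

    // Tokens accepted into the tree, and tokens set aside as misplaced.
    record Result(List<String> accepted, List<String> itemsToMove) {}

    static Result process(List<String> tokens, boolean misplacedFirst, int maxSteps) {
        Deque<String> queue = new ArrayDeque<>(tokens);
        List<String> accepted = new ArrayList<>();
        List<String> itemsToMove = new ArrayList<>();
        String context = "table"; // the fragment starts inside an open <table>
        for (int step = 0; !queue.isEmpty(); step++) {
            if (step >= maxSteps) {
                throw new IllegalStateException("no progress after " + maxSteps + " steps");
            }
            String tok = queue.peekFirst();
            boolean allowed = ALLOWED_CHILDREN.getOrDefault(context, Set.of()).contains(tok);
            String parent = REQUIRED_PARENT.get(tok);
            boolean needsParent = parent != null && !parent.equals(context);
            if (misplacedFirst && !allowed) {
                // New order: a misplaced token is set aside before any parent is synthesized.
                itemsToMove.add(queue.pollFirst());
            } else if (needsParent) {
                // requiredParent rule: insert the missing parent in front of the token.
                queue.addFirst(parent);
            } else if (!allowed) {
                // Reached only in the old order, where the misplaced check runs second.
                itemsToMove.add(queue.pollFirst());
            } else {
                // Accept the token; it becomes the context for the following tokens.
                accepted.add(queue.pollFirst());
                context = tok;
            }
        }
        return new Result(accepted, itemsToMove);
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("td"), false, 100));      // old order wraps <td> in <tr>
        System.out.println(process(List.of("td"), true, 100));       // new order moves <td> aside
        System.out.println(process(List.of("rt", "td"), true, 100)); // new order terminates
        try {
            process(List.of("rt", "td"), false, 100);
        } catch (IllegalStateException e) {
            System.out.println("old order: " + e.getMessage());
        }
    }
}
```

In this model the old order cycles on <rt> but correctly wraps a bare <td> in a <tr>, while the new order terminates on both inputs but sets the bare <td> aside instead of adding the <tr>, which is why a special exception for that case may be needed.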