
#29 Parser should decode HTML entities

closed
None
2019-04-19
2009-07-16
No

Example:

include 'simple_html_dom.php';

$dom = new simple_html_dom('<p>&amp;</p>');
echo $dom->find('*', 0)->plaintext; // Got "&amp;", but expected "&"

Discussion

  • Francesc Rosàs - 2009-07-16

    I've solved it by adding an htmlspecialchars_decode() call to every output function, but I suppose it should be fixed in the parser itself.
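A minimal sketch of that workaround, assuming the text has already been pulled out of the parsed DOM (for example via `plaintext`). `decode_plaintext()` is a hypothetical helper, not part of simple_html_dom. The sketch also shows why `html_entity_decode()` may be a better fit than `htmlspecialchars_decode()`: the latter only converts the special-character entities, so named entities such as `&eacute;` are left untouched.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): decode every HTML
// entity in text that was extracted from a document.
function decode_plaintext(string $text, string $charset = 'UTF-8'): string
{
    // ENT_QUOTES also decodes quote entities; ENT_HTML5 enables the full
    // HTML5 named-entity table.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, $charset);
}

$raw = 'Fish &amp; Chips &lt;3 caf&eacute;';

// htmlspecialchars_decode() only converts &amp;, &lt;, &gt; and (depending
// on flags) the quote entities, so &eacute; survives.
echo htmlspecialchars_decode($raw), PHP_EOL; // Fish & Chips <3 caf&eacute;

// html_entity_decode() converts named and numeric entities as well.
echo decode_plaintext($raw), PHP_EOL;        // Fish & Chips <3 café
```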

     
  • Alex - 2010-05-05

    I don't think it's a good idea to make this permanent. Say you have "&gt;" (greater-than sign) and/or "&lt;" (less-than sign) anywhere in your text: if you decode them, your HTML basically becomes invalid.
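The concern can be illustrated with PHP's own functions (plain PHP, no simple_html_dom calls): decoding is fine for text that is only displayed or processed, but the decoded string has to be re-escaped with `htmlspecialchars()` before it is written back into markup.

```php
<?php
// Text as it appears inside the HTML source (entities encoded).
$entityText = 'a &lt; b &amp;&amp; c &gt; d';

// Decoded form: suitable for display or string processing only.
$plain = html_entity_decode($entityText, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $plain, PHP_EOL; // a < b && c > d

// Pasting $plain straight into markup would inject a literal "<", so
// re-escape it first; this restores the original encoded text.
$reEscaped = htmlspecialchars($plain, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo '<p>' . $reEscaped . '</p>', PHP_EOL; // <p>a &lt; b &amp;&amp; c &gt; d</p>
```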

     
  • Francesc Rosàs - 2010-05-06

    Please check the patch I submitted: it only modifies the text() and __get() functions. These functions don't return HTML, so keeping HTML entities in their output makes no sense.
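A sketch of the idea behind the patch (the patch itself is not reproduced here), using a hypothetical, heavily simplified node class that is not simple_html_dom's real node class and that requires PHP 8 for constructor property promotion: the markup accessor keeps entities, while the text accessor strips tags and then decodes entities exactly once.

```php
<?php
// Hypothetical, simplified node class for illustration only.
final class SketchNode
{
    public function __construct(private string $innerHtml) {}

    // Returns markup, so entities must stay encoded to keep it valid HTML.
    public function innertext(): string
    {
        return $this->innerHtml;
    }

    // Returns plain text: remove tags first, then decode entities once.
    // Decoding before strip_tags() would turn "&lt;b&gt;" into a literal
    // "<b>" that strip_tags() would then remove.
    public function text(): string
    {
        return html_entity_decode(strip_tags($this->innerHtml), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
}

$node = new SketchNode('<b>Tom &amp; Jerry</b> &lt;b&gt;');
echo $node->innertext(), PHP_EOL; // <b>Tom &amp; Jerry</b> &lt;b&gt;
echo $node->text(), PHP_EOL;      // Tom & Jerry <b>
```

Ordering matters here: stripping tags before decoding keeps escaped angle brackets in the text as visible characters rather than treating them as markup.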

    By the way, there is probably a bug in the text() implementation, since the same text can be decoded multiple times.
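Why decoding the same text more than once is a bug: each extra pass consumes one more level of escaping, so text that legitimately displays an entity name gets altered. A minimal demonstration with `html_entity_decode()` (not the actual text() code):

```php
<?php
// Source HTML that displays the literal string "&lt;" to the reader.
$source = 'Write &amp;lt; to get a less-than sign';

$once  = html_entity_decode($source, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$twice = html_entity_decode($once, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $once, PHP_EOL;  // Write &lt; to get a less-than sign   (correct)
echo $twice, PHP_EOL; // Write < to get a less-than sign      (over-decoded)
```

Decoding once, at the point where text leaves the parser, avoids this.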

     
  • LogMANOriginal - 2019-04-19
    • status: open --> closed
    • assigned_to: LogMANOriginal
    • discussion: enabled --> disabled
     
  • LogMANOriginal - 2019-04-19

    Thanks for including a patch.
    Closing this in favor of https://sourceforge.net/p/simplehtmldom/feature-requests/52/ - please continue discussion on that ticket.

     