From: David H. <hic...@op...> - 2010-11-20 02:21:46
I am attempting to switch the master branch of MantisBT to use XHTML strict output (instead of HTML transitional) and came across an issue with the feature that renders certain HTML elements contained inside bug titles, descriptions, etc.

At the moment we allow the administrator to set $g_html_valid_tags and $g_html_valid_tags_single_line to contain a list of "safe" HTML elements such as h1, h2, h3, h4, h5, h6, p, br, hr, pre, em, strong, small, code, b, i, u, ul, li, ol, dl, dt, dd. When these elements are detected in a bug title, description, note, etc., we render them as-is in the browser. Hence a description containing <strong>Something important</strong> will show up in browsers as "Something important" in a bold typeface.

If a user were to enter "<hr>", it would be rendered as a horizontal line in HTML. With XHTML, "<hr>" stops the XML processing of the document (i.e. browsers will just show an error) because "<hr />" is expected (well-formed XML).

Inside string_restore_valid_html_tags() (core/string_api.php) the following 3 lines of code are responsible for re-inserting safe HTML elements into the raw HTML output:

    $p_string = preg_replace( '/&lt;(' . $tags . ')&gt;/ui', '<\\1>', $p_string );
    $p_string = preg_replace( '/&lt;\/(' . $tags . ')&gt;/ui', '</\\1>', $p_string );
    $p_string = preg_replace( '/&lt;(' . $tags . ')\s?\/&gt;/ui', '<\\1 />', $p_string );

Note that these lines pay no attention at all to whether the resulting HTML output is valid or well formed. A user could enter

    <h1><li><ul><li>Test</li></ul></li></h1>

which string_restore_valid_html_tags() would consider OK to render as-is in the browser. However, in reality this HTML output is completely wrong and will break XHTML rendering.

I don't see any way of fixing this issue (not even by using PHP's DOMDocument class). Further to the point, I think allowing HTML tags is inherently wrong in the first place (rendering issues aside).
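To make the problem concrete, here is a minimal sketch. It assumes the three patterns match the escaped forms (&lt; and &gt;) of the tags, and the function names restore_tags_sketch() and is_well_formed_sketch() are hypothetical, not part of MantisBT. It only checks XML well-formedness with DOMDocument and does not validate against the XHTML DTD.

```php
<?php
// Hypothetical stand-in for the three preg_replace() calls in
// string_restore_valid_html_tags(). Assumes the input was already escaped
// with htmlspecialchars(), so tags arrive as &lt;tag&gt;.
function restore_tags_sketch( $p_string, array $p_tags ) {
	$t_tags = implode( '|', array_map( 'preg_quote', $p_tags ) );
	$p_string = preg_replace( '/&lt;(' . $t_tags . ')&gt;/ui', '<\\1>', $p_string );
	$p_string = preg_replace( '/&lt;\/(' . $t_tags . ')&gt;/ui', '</\\1>', $p_string );
	$p_string = preg_replace( '/&lt;(' . $t_tags . ')\s?\/&gt;/ui', '<\\1 />', $p_string );
	return $p_string;
}

// Returns true if the fragment parses as well-formed XML once wrapped in a
// root element. This says nothing about validity against the XHTML DTD.
function is_well_formed_sketch( $p_fragment ) {
	$t_previous = libxml_use_internal_errors( true ); // collect errors quietly
	$t_doc = new DOMDocument();
	$t_ok = $t_doc->loadXML( '<div>' . $p_fragment . '</div>' );
	libxml_clear_errors();
	libxml_use_internal_errors( $t_previous );
	return $t_ok;
}

$t_tags = array( 'h1', 'hr', 'li', 'ul', 'strong' );

$t_hr = restore_tags_sketch( htmlspecialchars( 'Above<hr>Below' ), $t_tags );
$t_nested = restore_tags_sketch(
	htmlspecialchars( '<h1><li><ul><li>Test</li></ul></li></h1>' ), $t_tags );

var_dump( is_well_formed_sketch( $t_hr ) );     // the unclosed <hr> fails to parse as XML
var_dump( is_well_formed_sketch( $t_nested ) ); // parses as XML despite the bogus nesting
```

The second case is the awkward one: the nesting is not valid XHTML, yet a plain well-formedness check accepts it, so catching it would need something stricter than this sketch.
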
We should be thinking of MantisBT as an output-layer-agnostic database (with a set of core API functions) whereby HTML may never be used. Perhaps a user is only accessing MantisBT via SOAP, email, etc. The use of HTML elements in bug titles, descriptions, etc. in these mediums makes no sense at all.

Therefore I propose that we drop support for rendering of HTML tags in bug titles, descriptions, etc. In its place I propose that we work towards implementing a Markdown[1] (or similar) processor in the XHTML output layer. Suppose a description is entered as:

    =Section=
    This is a list:
    * Item 1
    * Item 2

This description is stored directly in the database without any reformatting. It is also provided via the SOAP and email interfaces as-is, without reformatting (because it's not possible to convey formatting across these mediums). For XHTML output we retrieve the description from the database, escape special XHTML characters, run it through a Markdown processor and output the result directly to the browser.

Why is this approach better than what we currently do?

1. Formatting is contained entirely within the output layer, making the code much simpler and less prone to errors. Mediums such as email and SOAP are not polluted with meaningless HTML tags either.

2. We can ensure that MantisBT is generating valid strict XHTML output. This is a great debugging capability that ensures MantisBT core and plugins don't have markup errors (non-closed tags, unescaped special characters, deprecated elements and attributes, etc.).

3. Markdown (or a similar approach) is readable in plaintext across all mediums. HTML tags aren't easy to read in SOAP/email mediums and are hard to write as well (more typing, knowledge of HTML required, etc.).

I'm interested in thoughts on this proposal.

Regards,
David

[1] http://en.wikipedia.org/wiki/Markdown
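The proposed XHTML output path (store the raw text, escape it, then format it) could be sketched roughly as follows. This is a toy formatter, not a real Markdown implementation: format_for_xhtml_sketch() is a hypothetical name, and it handles only the "=Heading=" and "* item" syntax from the example description, mapping headings to h2 as an arbitrary choice.

```php
<?php
// Toy output-layer formatter, NOT a real Markdown processor.
// Handles only "=Heading=" lines and "* item" bullets; everything else
// becomes a paragraph. All names here are hypothetical.
function format_for_xhtml_sketch( $p_text ) {
	$t_out = array();
	$t_in_list = false;
	foreach( preg_split( '/\r\n|\r|\n/', $p_text ) as $t_line ) {
		// Escape first, so nothing the user typed can become markup.
		$t_line = htmlspecialchars( trim( $t_line ), ENT_QUOTES, 'UTF-8' );
		$t_is_item = ( strncmp( $t_line, '* ', 2 ) === 0 );
		// Close an open list as soon as a non-item line appears.
		if( $t_in_list && !$t_is_item ) {
			$t_out[] = '</ul>';
			$t_in_list = false;
		}
		if( $t_line === '' ) {
			continue;
		}
		if( $t_is_item ) {
			if( !$t_in_list ) {
				$t_out[] = '<ul>';
				$t_in_list = true;
			}
			$t_out[] = '<li>' . substr( $t_line, 2 ) . '</li>';
		} elseif( preg_match( '/^=(.+)=$/', $t_line, $t_match ) ) {
			$t_out[] = '<h2>' . $t_match[1] . '</h2>';
		} else {
			$t_out[] = '<p>' . $t_line . '</p>';
		}
	}
	if( $t_in_list ) {
		$t_out[] = '</ul>';
	}
	return implode( "\n", $t_out );
}

// The stored text is never modified; formatting happens only on output.
$t_stored = "=Section=\nThis is a list:\n* Item 1\n* Item 2 <hr>";
$t_html = format_for_xhtml_sketch( $t_stored );
echo $t_html, "\n";
```

Because every tag in the output is emitted by the formatter itself, and user input is escaped before any markup is added, a stray "<hr>" typed by a user ends up as literal text rather than as an unclosed element.
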