From: Yuri T. <qar...@gm...> - 2008-07-08 16:40:23
> Is this still true if you have inline not-necessarily-legal-XML blocks?
> (i.e. will it still be easy to convert:
> **Foo**
> <br>
> blah blah blah
> 'bar'
> ?)

I meant a simple RE-based substitution. Correct me if I am wrong, but converting XHTML into HTML largely involves changing <$x/> and <$x></$x> to <$x> for certain values of $x.

> What if we went with the BOM character (0xFEFF) as the replacement? It's
> legal unicode, and _extremely_ unlikely to occur in the middle of text. The
> only thing to watch out for is having it occur at the start of the file.

First, my original intention was to use not \u0001 and \u0002 but rather \u0002 and \u0003 - "start of text" (STX) and "end of text" (ETX). The nice thing about them is that they come as a pair - start and end. Also, if we use BOM we'll have to worry about HTML, etc. occurring at the beginning of the text. But this is an option to keep in mind.

Alternatively, we can look into the private-use ranges, though then we have to make sure that our use does not conflict with possible private uses by the caller.

- yuri

--
http://sputnik.freewisdom.org/
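
To make the two ideas above concrete, here is a rough Python sketch, not a definitive implementation. The first half does the RE-based XHTML-to-HTML substitution; the second half shows how a paired STX/ETX placeholder might wrap an index into a stash of raw HTML blocks. All names (VOID_TAGS, xhtml_to_html, Stash) are hypothetical, and the tag list is an assumption, since the message only says "certain values of $x".

```python
import re

# Assumption: which tags count as "certain values of $x" (illustrative list).
VOID_TAGS = ("br", "hr", "img", "input", "link", "meta")
_TAGS = "|".join(VOID_TAGS)

# Optional attribute text: whitespace, then something that is not "/" or ">".
_ATTRS = r"(\s+[^<>/\s][^<>]*?)?"

# Matches <x/> and <x attr="..." />
SELF_CLOSED = re.compile(r"<(%s)%s\s*/>" % (_TAGS, _ATTRS), re.IGNORECASE)
# Matches <x></x> and <x attr="..."></x>
EMPTY_PAIR = re.compile(r"<(%s)%s\s*>\s*</\1\s*>" % (_TAGS, _ATTRS), re.IGNORECASE)


def _open_tag(match):
    """Rebuild the tag as a plain HTML opening tag, keeping any attributes."""
    return "<%s%s>" % (match.group(1), match.group(2) or "")


def xhtml_to_html(text):
    """Rewrite <x/> and <x></x> as <x> for the tags listed in VOID_TAGS."""
    text = SELF_CLOSED.sub(_open_tag, text)
    return EMPTY_PAIR.sub(_open_tag, text)


# Paired control characters used as placeholder delimiters.
STX = "\u0002"  # "start of text"
ETX = "\u0003"  # "end of text"
PLACEHOLDER = re.compile(STX + r"(\d+)" + ETX)


class Stash:
    """Hypothetical helper: swaps raw HTML for STX<index>ETX tokens and back."""

    def __init__(self):
        self.items = []

    def store(self, raw):
        # Remember the raw block and return the token that stands in for it.
        self.items.append(raw)
        return "%s%d%s" % (STX, len(self.items) - 1, ETX)

    def restore(self, text):
        # Replace every token with the raw block it refers to.
        return PLACEHOLDER.sub(lambda m: self.items[int(m.group(1))], text)
```

Because the delimiters come as a pair, the restore step can find a token with a single unambiguous pattern, which is the property that a lone BOM character would not give us.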