From: Alex R. <al...@se...> - 2002-07-09 17:13:33
vertigo wrote:
> Yes, but it is also one of the more complicated regions (a dark, shadowy
> corner) of the project. Regular expressions are not, as mentioned, the
> best way to parse HTML on a large scale. It can get way out of control.
> An actual parser is better for a number of reasons.

I'm not sure that I follow this rationale, given your understanding of the
significant costs and marginal benefits this provides (as demonstrated below).

> Put into the project perspective, we have to write HTML parsers for each
> implementation, and this can be much more complicated than it first
> appears. We might not want to have limited support in the first release,
> and then improve it later.

What for? Why would we _ever_ need to do such a thing? You trust some input,
you don't trust some other input. If something is tainted, then strip out all
semblance of <script> tags. We don't have to handle badly nested tag sets,
etc... we just have to canonicalize the data then clobber the beginning tag,
end of story.

> Cross-site scripting is a huge issue, and
> deserves to be handled in great detail.

Agreed, I'm just not quite so sure it's as hard a problem as you're making it
out to be.

--
Alex Russell
al...@Se...
al...@ne...
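
For illustration only, the "canonicalize the data then clobber the beginning tag" idea from the reply above might be sketched roughly as follows. This is not code from the project: the function names are hypothetical, and "canonicalize" is assumed here to mean nothing more than stripping control characters that could be used to hide a tag name. A real canonicalization step would likely need to cover more encodings.

```python
import re

# Hypothetical helpers, assumed for this sketch only.
# Control characters (NUL etc.) that could be used to split up a tag name.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
# The '<' that opens <script or </script, in any letter case,
# with optional whitespace around the slash.
_SCRIPT_OPENER = re.compile(r"<(\s*/?\s*script)", re.IGNORECASE)


def canonicalize(text: str) -> str:
    """Reduce tainted input to one predictable form before filtering."""
    return _CONTROL_CHARS.sub("", text)


def clobber_script_tags(text: str) -> str:
    """Escape the opening '<' of every script tag in canonicalized input."""
    return _SCRIPT_OPENER.sub(r"&lt;\1", canonicalize(text))


if __name__ == "__main__":
    print(clobber_script_tags("<ScRiPt>alert(1)</script>"))
```

Only the leading "<" is rewritten, so the rest of the markup passes through untouched and no nesting has to be tracked. The sketch deals with script tags alone; other script-carrying constructs (event-handler attributes, for example) are outside what it attempts.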