[Phpslash-devel] Re: [Back-end-development] Annoying Problem with stripBadHTML
From: Mike G. <mi...@op...> - 2003-06-20 14:30:23
Hello Sam,

I've cc'd the slash development list.

On Thu, 2003-06-19 at 23:14, Sam Williams wrote:
> On Fri, 2003-06-20 at 02:38, Mike Gifford wrote:
> > In cutting/pasting it in sections it seemed to get hung up on:
> > <a
> > name=Baird></a><br>
> > After removing all of these hanging <a's I was able to paste it in
> > without difficult. I just had to remove a number of them.
>
> I seem to remember having problems with this function in phpSlash a
> while back.
> At a guess, I would say that the newline, LF or LF/CR character pair
> (which it is depends on whether the text came from Windows, Unix or Mac)
> after the <a are messing up the regex matching.

Yup.

> Looking at (an old version of) the function, I don't think the lines
> designed to 'standardise' html tags will match a hanging tag:
>
> $str = eregi_replace("<[[:space:]]*([^>]*)[[:space:]]*>","<\\1>",$str);
> $str = eregi_replace("<a([^>]*)href=\"?([^\"]*)\"?([^>]*)>",
>                      "<a href=\\2>", $str);

This is from the phpSlash CVS version:

$str = eregi_replace("<[[:space:]]*([^>]*)[[:space:]]*>","<\\1>",$str);
$str = eregi_replace("<a([^>]*)href=\"?([^\"]*)\"?([^>]*)>",
                     "<a href=\\2>", $str);

> and I don't think this will either:
> while (eregi("<([^> ]*)([^>]*)>",$str,$reg)) {

I'm not a regex expert, but I can't see why this would be interrupted by
a line break...

> A quick fix would be another eregi_replace that strips all newline, LF
> and CR characters from within tags before anything else is done with the
> string.

Yes, but that would make it very difficult to edit afterwards, right?

> Perhaps it's time to update the stripBadHTML so that it uses the native
> PHP function that was introduced in 3.0.8:

I'd be in favour of moving this way. I would think it would speed up the
code to have the processing done by native php functions.

> string strip_tags ( string str [, string allowable_tags])

What it doesn't allow is the degree of control that stripBadHTML
presently offers. The ability to allow tags or tags and definitions is
nice. Mind you I don't know how many folks use that.

> I'm not sure how well this works compared to the phpslash code, but it
> is much more readable :-). This is probably something for the
> phpSlash-devel list...

It doesn't seem to be a problem in phpSlash. the line break doesn't seem
to interfere with the code.

Mike
--
Mike Gifford, OpenConcept Consulting
Free Software for Social Change -> http://www.openconcept.ca
Featured Client: CUPE National -> http://www.cupe.ca
Whoever controls the media-the images-controls the culture - A. Ginsberg
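
For anyone wanting to test the two questions raised in this thread (does a line break inside a tag stop the patterns from matching, and how much control does strip_tags() give), here is a minimal sketch. It rests on assumptions: eregi() and eregi_replace() were removed in PHP 7, so the PCRE functions preg_match() and preg_replace() stand in for them, and $sample and $html are made-up inputs, not data from phpSlash.

```php
<?php
// Sketch only: eregi()/eregi_replace() were removed in PHP 7, so the PCRE
// functions stand in for them here. $sample is a hypothetical input.
$sample = "<a\nname=Baird></a><br>";

// PCRE version of the first "standardise" line quoted in the thread.
$normalised = preg_replace('/<[[:space:]]*([^>]*)[[:space:]]*>/i', '<$1>', $sample);

// PCRE version of the loop condition. [^> ] excludes only '>' and a literal
// space, so the line break ends up inside the first capture group.
$matched = preg_match('/<([^> ]*)([^>]*)>/i', $sample, $reg);

// strip_tags() with an allow-list keeps <a> together with all its attributes.
$html = '<a href="/x" onclick="y()">link</a><script>z()</script>';
$stripped = strip_tags($html, '<a>');

var_dump($normalised === $sample, $matched, $reg[1], $stripped);
```

Under PCRE the hanging tag does match, but the first capture group holds "a", the line break and "name=Baird" together rather than just "a", which may be where a later tag-name comparison trips up. Whether the old POSIX functions behaved identically is not verified here. The strip_tags() call keeps the onclick attribute on the allowed tag, which illustrates the loss of per-attribute control mentioned above.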