In the HTTP indexer, in the file
class.lib_indexer_universal_phpcms.php, remove the spaces from
the replacement tags in lines 517 and 521.
If only a single letter within a word is highlighted, for example
in bold, the word is split and a space appears in the search text
that is output.
Logged In: YES
user_id=504414
I don't think this can be solved easily. The problem is that if
no space is inserted there, you would get wrong results again,
for example:
Hello<br>World
would be saved as "HelloWorld" if you don't replace HTML tags
with spaces.
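To illustrate the trade-off, here is a minimal sketch in Python. It is not the phpCMS code itself; the regular expression and the function name strip_tags are assumptions made for this example only.

```python
import re

# Simplified tag pattern (assumption): anything between "<" and ">".
TAG = re.compile(r"<[^>]*>")

def strip_tags(html, replacement):
    """Replace every tag with `replacement`, then collapse whitespace."""
    text = TAG.sub(replacement, html)
    return " ".join(text.split())

# Removing tags without a space glues separate words together:
print(strip_tags("Hello<br>World", ""))   # HelloWorld
# Replacing tags with a space keeps them apart ...
print(strip_tags("Hello<br>World", " "))  # Hello World
# ... but splits a word that contains an inline tag:
print(strip_tags("w<b>o</b>rd", " "))     # w o rd
```

Neither fixed replacement gives the right text for both inputs.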
So maybe the correct behaviour would be to handle different
HTML tags with different replacements.
The HTTP_Indexer would have to distinguish between HTML tags
that produce some space and those that don't.
Just another example:
In
foo<img src="blahblubb.jpg">bar
foo<hr>bar
foo<table><tr><td>bar</td><td>blah</td></tr></table>blubb
the tags must be replaced by a space to get the correct
result, in
foo<b>bar</b>
foo<a href="blah.html">bar</a>
the tags should not be replaced by a space to get the correct
result.
Maybe a small list of tags that should not be replaced by a
space could solve the problem, but it won't be 100% perfect
in all cases.
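Such a list could work roughly like the following Python sketch. It is only an illustration, not the phpCMS implementation: the names INLINE_TAGS and tags_to_text, the choice of tags in the list, and the simplified tag pattern are all assumptions. HTML comments and a ">" inside attribute values are not handled.

```python
import re

# Hypothetical list of tags that produce no visible space and are
# therefore removed without inserting one.
INLINE_TAGS = {"a", "b", "i", "u", "em", "strong", "span", "font"}

# Simplified pattern (assumption): opening or closing tag, capturing its name.
TAG_PATTERN = re.compile(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def tags_to_text(html):
    """Drop listed inline tags, replace all other tags with a space."""
    def replace(match):
        return "" if match.group(1).lower() in INLINE_TAGS else " "
    text = TAG_PATTERN.sub(replace, html)
    # Collapse the runs of spaces left behind by adjacent tags.
    return " ".join(text.split())

print(tags_to_text('foo<img src="blahblubb.jpg">bar'))   # foo bar
print(tags_to_text("foo<hr>bar"))                        # foo bar
print(tags_to_text("foo<b>bar</b>"))                     # foobar
print(tags_to_text('foo<a href="blah.html">bar</a>'))    # foobar
```

With the list above, the table example comes out as "foo bar blah blubb". A case it still gets wrong is an inline tag that is styled to take up space, which is why a list like this can't be perfect.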
But the normal behaviour when replacing HTML tags is to replace
them with a space. Commercial search engines (at least most of
them) probably replace HTML tags with spaces as well, so
if you have the string
<b>php</b>CMS
on your page and it is indexed by one of the common
search engines, I suppose the page wouldn't be found when
searching for "phpCMS".
As this is not a bug, I'm moving it to the feature request
tracker.
ciao.. Iggi