#13 http indexer filter html tags

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2003-04-23
Created: 2003-04-23
Creator: Anonymous
Private: No

In the HTTP indexer, in the file
class.lib_indexer_universal_phpcms.php, remove the spaces
from the replacement tags on lines 517 and 521.

If only a single letter within a word is emphasized, for
example in bold, the word is split and a space appears in
the search text that is output.

Discussion

  • Thilo Wagner

    Thilo Wagner - 2003-04-23
    • labels: --> 474405
    • priority: 5 --> 7
     
  • Thilo Wagner

    Thilo Wagner - 2003-04-23
    • labels: 474405 -->
    • priority: 7 --> 8
     
  • Thilo Wagner

    Thilo Wagner - 2003-04-23

    Logged In: YES
    user_id=504414

    I don't think this can be solved easily. The problem is that if
    no space is inserted there, you would again get wrong
    results, for example:

    Hello<br>World

    would be saved as "HelloWorld" if you don't replace HTML
    tags with spaces.

    So maybe the correct behaviour would be to handle different
    HTML tags with different replacements.

    The HTTP_Indexer would have to distinguish between
    HTML tags that produce some space and those that
    don't.

    Just another example:

    In

    foo<img src="blahblubb.jpg">bar
    foo<hr>bar
    foo<table><tr><td>bar</td><td>blah</td></tr></table>blubb

    the tags must be replaced by a space to get the correct
    result, in

    foo<b>bar</b>
    foo<a href="blah.html">bar</a>

    the tags should not be replaced by a space to get the correct
    result.

    Maybe a small list of tags that should not be replaced by a
    space could solve the problem, but it won't be 100% perfect
    in all cases.
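
The tag-list idea can be sketched roughly as follows. This is a minimal illustration in Python, not the code from class.lib_indexer_universal_phpcms.php; the function name `strip_tags_for_index`, the constant `INLINE_TAGS`, and the choice of tags in it are assumptions made for the example. Tags on the list are removed without a replacement, every other tag becomes a space, and runs of whitespace are collapsed afterwards.

```python
import re

# Hypothetical list of inline tags that should disappear without
# leaving a space behind. Which tags belong here is a judgment call.
INLINE_TAGS = {"a", "b", "i", "u", "em", "strong", "span", "font"}

# Matches an opening or closing tag and captures the tag name.
# Simplification: attribute values containing ">" are not handled.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def strip_tags_for_index(html, inline_tags=INLINE_TAGS):
    """Remove HTML tags, inserting a space only for non-inline tags."""

    def replace(match):
        name = match.group(1).lower()
        # Inline tags vanish; all other tags separate words.
        return "" if name in inline_tags else " "

    text = TAG_RE.sub(replace, html)
    # Collapse the runs of spaces left by adjacent block-level tags.
    return re.sub(r"\s+", " ", text).strip()


print(strip_tags_for_index("Hello<br>World"))
print(strip_tags_for_index("foo<b>bar</b>"))
print(strip_tags_for_index('foo<img src="blahblubb.jpg">bar'))
```

With this list, the `<br>` and `<img>` examples keep their word boundary while the `<b>` example stays a single word. Cases such as an inline tag styled to behave like a block element would still be indexed incorrectly, which is the "not 100% perfect" part.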

    But the normal behaviour when replacing HTML tags is to replace
    them with a space. Commercial search engines (at least most
    of them) probably also replace HTML tags with spaces, so
    if you have the string

    <b>php</b>CMS

    on your page and it is indexed by one of the common
    search engines, I suppose the page wouldn't be found when
    searching for "phpCMS".

    As this is not a bug I'm moving it to the feature request
    tracker.

    ciao.. Iggi

     
  • Thilo Wagner

    Thilo Wagner - 2003-04-23
    • priority: 8 --> 5
     
