
#66 Webhelp Search crops the search term

Milestone: 1.0
Status: open
Owner: nobody
Labels: None
Updated: 2018-08-03
Created: 2018-04-09
Private: No

We have problems with the search. When our users search for a term, it often gets cropped to just 8 or 9 letters, and (because of that?) the search does not find the term in the WebHelp, although it is definitely there! What information do you need to investigate this issue?

1 Attachments

Discussion

  • Manfred P.

    Manfred P. - 2018-05-02

    When the export language is German, I found the following issue: "Rangreihe" is not found, but "Rangreih" is found. With the export language "English" this problem does not appear. I will have to investigate this further. However, I cannot completely reproduce the situation in your screenshot, where the search term is cropped in the result panel.

     

    Last edit: Manfred P. 2018-05-02
    • Danny Buddenberg

      Yes, the export language is German. In our case the search found nothing, neither "Rangreihe" nor "Rangreih". Many search terms get cropped, but not all of them: "Zeitklassen" became "zeitklass", "Tabellierung" became "tabellier" and "Vorfilter" became "vorfilt". So the term is not always cut after 8 letters. Many other terms have no problem at all: "Medienanalyse", "Grundgesamtheit" and "Planeingabe" work fine.

      You can access the documentation at http://docs.comsulting.net/topmodular/prod/content/ch01.html

       
  • Manfred P.

    Manfred P. - 2018-05-19

    This is a bug that affects German and French as output languages. The WebHelp full-text search includes a so-called "stemmer" for each of English, French and German. The stemmer reduces words that have the same meaning and differ only in their suffix to the same base word. Unfortunately, the generated HTML output always references the English stemmer file, even if the output language is German or French.

    This will be fixed in the next Docmenta version. Until then, as a workaround, you can fix the generated WebHelp V2 output by renaming the file de_stemmer.js (which is located in the output path search/stemmers) to en_stemmer.js.
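
To illustrate why a stemmer mismatch makes lookups fail, here is a minimal sketch. The two stemmers are toy suffix strippers invented for this example, not the real stemmers shipped in search/stemmers, and `buildIndex` and `search` are hypothetical helpers. The point is only that a word is found when the query is stemmed the same way as the index.

```javascript
// Toy suffix strippers for illustration only.
// They are NOT the stemming rules used by the real WebHelp stemmer files.
function toyGermanStem(word) {
  return word.toLowerCase().replace(/(en|er|e)$/, "");
}

function toyEnglishStem(word) {
  return word.toLowerCase().replace(/(ing|s)$/, "");
}

// Hypothetical index: a set of stemmed words.
function buildIndex(words, stem) {
  const index = new Set();
  for (const w of words) {
    index.add(stem(w));
  }
  return index;
}

// A lookup succeeds only if the query stem equals a stem stored in the index.
function search(index, term, stem) {
  return index.has(stem(term));
}

const index = buildIndex(["Rangreihe", "Zeitklassen"], toyGermanStem);

console.log(search(index, "Rangreihe", toyGermanStem));  // true: same stemmer on both sides
console.log(search(index, "Rangreihe", toyEnglishStem)); // false: the stems differ
```

With the toy German rules, "Zeitklassen" is stored as "zeitklass", which matches the kind of cropping reported above.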

     

    Last edit: Manfred P. 2018-05-19
    • Danny Buddenberg

      Yes, that workaround solves the "cropping": "Rangreihe" remains "rangreihe". Unfortunately, Docmenta still cannot find the search term, although the word certainly occurs in the WebHelp.

       

      Last edit: Danny Buddenberg 2018-05-23
  • Manfred P.

    Manfred P. - 2018-05-31

    I found another issue. It seems that somewhere in your content there is a Unicode line break U+2028 (probably in the content node with alias "Zeitklassen"). The indexer does not recognize this Unicode line break as whitespace, which causes a JavaScript error in the generated index file (index-3.js).

    This can be fixed as follows:
    In your Docmenta installation, open the file

    apache-tomcat/webapps/docmenta/docbook-xsl/webhelp/template/content/search/punctuation.props

    in a text editor and add the following line at the end:

    Punct29=\\u2028

    Then export the publication again. With this line added, the indexer treats the Unicode character U+2028 as punctuation and ignores it.
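
The effect of the extra punctuation entry can be sketched as follows. This is not the Docmenta indexer. It is a simplified tokenizer written for illustration, assuming the indexer splits words on ASCII whitespace only and replaces every configured punctuation character with a space first. `BASE_PUNCTUATION` and `tokenize` are hypothetical names.

```javascript
// Assumption: the indexer treats only these ASCII characters as whitespace.
const ASCII_WHITESPACE = /[ \t\r\n]+/;

// Hypothetical stand-in for the entries loaded from punctuation.props.
const BASE_PUNCTUATION = [".", ",", ";", ":", "!", "?"];

// Replace each punctuation character with a space, then split into lowercase words.
function tokenize(text, punctuation) {
  let cleaned = text;
  for (const ch of punctuation) {
    cleaned = cleaned.split(ch).join(" ");
  }
  return cleaned
    .split(ASCII_WHITESPACE)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

const text = "Zeitklassen\u2028Tabellierung, Vorfilter.";

// Without the extra entry, U+2028 stays inside one long "word".
console.log(tokenize(text, BASE_PUNCTUATION).length); // 2

// With U+2028 listed as punctuation, the words are separated cleanly.
console.log(tokenize(text, BASE_PUNCTUATION.concat(["\u2028"])).length); // 3
```

In the first call the U+2028 character survives inside an indexed word, which is the kind of entry that ends up in the generated index file.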

    If you do not want to re-export the publication, you can also fix the already exported publication. To do this, open the file

    search/index-3.js

    of the exported publication and remove the lines that contain the Unicode line break U+2028. These might be hard to spot. In your example, just delete all lines after the line w["zwischendurch"]="0";
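
Because U+2028 is invisible in most editors, a small script can help locate the affected lines. The following is a minimal sketch, not part of Docmenta. `findLineSeparatorLines` is a hypothetical helper, and the sample text is made up, modeled on the index line quoted above. It splits on "\n" only, so that U+2028 is not itself treated as a line ending.

```javascript
// Return the 1-based numbers of all lines that contain U+2028 or U+2029.
function findLineSeparatorLines(source) {
  const hits = [];
  source.split("\n").forEach((line, i) => {
    if (line.includes("\u2028") || line.includes("\u2029")) {
      hits.push(i + 1);
    }
  });
  return hits;
}

// Made-up sample modeled on the index format; the second line is broken.
const sample = 'w["zwischendurch"]="0";\nw["zeit\u2028klassen"]="1";\n';

console.log(findLineSeparatorLines(sample)); // [ 2 ]
```

To check a real export with Node.js, the file content could be read with `require("fs").readFileSync("search/index-3.js", "utf8")` and passed to the function.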

     

    Last edit: Manfred P. 2018-05-31
    • Danny Buddenberg

      Thank you very much. Fixing the already exported publication worked fine. Next we will try to edit the punctuation.props.

       
  • Manfred P.

    Manfred P. - 2018-08-03

    Both issues should be fixed in version 1.9.3 (the Unicode line breaks have also been added to punctuation.props).

     
