Webhelp Search crops the search term
We've got problems with the search. When our users search for a term, it often gets cropped to just 8 or 9 letters and (perhaps because of that?) the search does not find the term in the WebHelp, although it is definitely there! What information do you need to investigate this issue?
When the export language is German, I found the following issue: "Rangreihe" is not found, but "Rangreih" is found. For the export language English this problem does not appear. I will have to investigate this further. However, I cannot completely reproduce the situation in your screenshot, where the search term is cropped in the result panel.
Last edit: Manfred P. 2018-05-02
Yes, the export language is German. In our case the search found nothing, neither "Rangreihe" nor "Rangreih". Many search terms get cropped, but not every term. "Zeitklassen" became "zeitklass", "Tabellierung" became "tabellier" and "Vorfilter" became "vorfilt". So terms are not always cut at 8 letters. Many other terms have no problem at all: "Medienanalyse", "Grundgesamtheit" and "Planeingabe" work fine.
You can access the documentation at http://docs.comsulting.net/topmodular/prod/content/ch01.html
This is a bug that affects the German and French output languages. The WebHelp full-text search includes a so-called "stemmer" for each of the English, French and German languages. A stemmer reduces words that have the same meaning and differ only in their suffix to the same base word. Unfortunately, the generated HTML output always references the English stemmer file, even if the output language is German or French.
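To illustrate why a stemmer mismatch can hide a word, here is a minimal sketch. It assumes the index was built with one stemmer while the search term is stemmed with another. The functions `stripSuffix`, `toyGermanStem`, `toyEnglishStem`, `buildIndex` and `isFound` are hypothetical toy helpers, not the stemmers shipped with WebHelp, and the suffix lists are made up for the example.

```javascript
// Toy suffix strippers for illustration only. They are NOT the stemmers
// found in search/stemmers; the suffix lists below are assumptions.
function stripSuffix(word, suffixes) {
  const w = word.toLowerCase();
  for (const suffix of suffixes) {
    // Only strip when a stem of at least 4 letters remains.
    if (w.endsWith(suffix) && w.length - suffix.length >= 4) {
      return w.slice(0, w.length - suffix.length);
    }
  }
  return w;
}

const toyGermanStem = (word) => stripSuffix(word, ["ung", "en", "er", "e"]);
const toyEnglishStem = (word) => stripSuffix(word, ["ing", "ed", "s"]);

// The index stores stemmed words only.
function buildIndex(words, stem) {
  return new Set(words.map(stem));
}

// A search term is stemmed before it is looked up in the index.
function isFound(index, term, stem) {
  return index.has(stem(term));
}

// Assumption: the index side uses the German rules.
const index = buildIndex(["Rangreihe", "Zeitklassen"], toyGermanStem);

// Query stemmed with the wrong (English) rules: "rangreihe" is not in the index.
const foundWithWrongStemmer = isFound(index, "Rangreihe", toyEnglishStem);
// Query stemmed with the same (German) rules: "rangreih" is in the index.
const foundWithSameStemmer = isFound(index, "Rangreihe", toyGermanStem);

console.log(foundWithWrongStemmer, foundWithSameStemmer);
```

The point of the sketch is only that the index and the query have to go through the same stemmer. The real stemming rules are more elaborate.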
This will be fixed in the next Docmenta version. Until then, as a workaround, you can fix the generated WebHelp V2 output by renaming the file de_stemmer.js (which is located in the output path search/stemmers) to en_stemmer.js.
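If the workaround has to be applied after every export, it could be scripted. The following is a minimal Node.js sketch under assumptions: `applyStemmerWorkaround` and its `outputDir` parameter (the root folder of the exported WebHelp V2 output) are hypothetical names, and keeping a `.bak` copy of the original en_stemmer.js is an extra precaution that is not part of the workaround itself.

```javascript
const fs = require("fs");
const path = require("path");

// outputDir: root folder of the exported WebHelp V2 output (hypothetical parameter).
function applyStemmerWorkaround(outputDir) {
  const stemmerDir = path.join(outputDir, "search", "stemmers");
  const german = path.join(stemmerDir, "de_stemmer.js");
  const english = path.join(stemmerDir, "en_stemmer.js");

  // Precaution (our own addition): keep the original English stemmer as a backup.
  if (fs.existsSync(english)) {
    fs.renameSync(english, english + ".bak");
  }

  // The actual workaround: the German stemmer takes the place of en_stemmer.js.
  fs.renameSync(german, english);
}
```

It would be called once per export, for example `applyStemmerWorkaround("/path/to/exported/webhelp")`, where the path is a placeholder.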
Last edit: Manfred P. 2018-05-19
Yes. That workaround solves the "cropping": "Rangreihe" remains "rangreihe". But unfortunately Docmenta still cannot find the search term, although the word certainly occurs in the WebHelp.
Last edit: Danny Buddenberg 2018-05-23
I found another issue. It seems that somewhere in your content there is a Unicode line break U+2028 (probably in the content node with alias "Zeitklassen"). The indexer does not recognize this Unicode line break as whitespace, which causes a JavaScript error in the generated index file (index-3.js).
This can be fixed as follows:
In your Docmenta installation, open the file
apache-tomcat/webapps/docmenta/docbook-xsl/webhelp/template/content/search/punctuation.props
in a text-editor and add the following line at the end:
Punct29=\\u2028
Then export the publication again. By adding this line, the Unicode character U+2028 is treated as punctuation and will be ignored by the indexer.
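The effect of the extra punctuation entry can be pictured with a small sketch. It is not the Docmenta indexer; `tokenize` is a hypothetical helper that replaces every character from a punctuation set with a space and then splits on a deliberately narrow set of whitespace characters. The sample text is made up.

```javascript
// Hypothetical tokenizer: punctuation characters are replaced by spaces,
// then the text is split on space, tab, CR and LF only.
function tokenize(text, punctuation) {
  let cleaned = "";
  for (const ch of text) {
    cleaned += punctuation.has(ch) ? " " : ch;
  }
  return cleaned.split(/[ \t\r\n]+/).filter((token) => token.length > 0);
}

const basePunctuation = new Set([".", ",", ";"]);
// Same set plus the Unicode line break, mirroring the added props line.
const withLineSeparator = new Set([...basePunctuation, "\u2028"]);

// Made-up sample: two words joined only by U+2028.
const text = "Zeitklassen\u2028zwischendurch";

// Without the entry, U+2028 stays inside a single token.
const tokensBefore = tokenize(text, basePunctuation);
// With the entry, the character is dropped and the words are separated.
const tokensAfter = tokenize(text, withLineSeparator);

console.log(tokensBefore.length, tokensAfter);
```

In the first case the token still contains U+2028, which is the kind of entry that ends up in the generated index file.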
If you do not want to re-export the publication, you can also fix the already exported publication. To do this, open the file
search/index-3.js
of the exported publication and remove the lines that contain the Unicode line break U+2028. These lines might be hard to recognize. In your example, just delete all lines after the line w["zwischendurch"]="0";
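Because the character is hard to spot in an editor, the affected lines could also be filtered out with a script. This is a minimal Node.js sketch, assuming the index file has one entry per line separated by "\n". `removeLinesWithLineSeparator` and `cleanIndexFile` are hypothetical helpers, and the `.bak` copy is our own precaution. The sketch removes only the lines that contain U+2028, so it may not cover every broken entry.

```javascript
const fs = require("fs");

// Drops every "\n"-separated line that contains the Unicode line break U+2028.
function removeLinesWithLineSeparator(source) {
  return source
    .split("\n")
    .filter((line) => !line.includes("\u2028"))
    .join("\n");
}

// indexPath: path to search/index-3.js of the exported publication (hypothetical parameter).
function cleanIndexFile(indexPath) {
  const original = fs.readFileSync(indexPath, "utf8");
  // Keep a backup of the untouched file before overwriting it.
  fs.writeFileSync(indexPath + ".bak", original, "utf8");
  fs.writeFileSync(indexPath, removeLinesWithLineSeparator(original), "utf8");
}
```

After running it, the search should be checked in the browser console for remaining JavaScript errors.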
Last edit: Manfred P. 2018-05-31
Thank you very much. Fixing the already exported publication worked fine. Next we will try to edit the punctuation.props.
Both issues should be fixed in version 1.9.3 (the Unicode line breaks have also been added to punctuation.props).