Webhelp Search crops the search term

Single Source Publishing Web-Application

Brought to you by: manfredp

#66 Webhelp Search crops the search term

Milestone: 1.0

Status: open

Owner: nobody

Labels: None

Updated: 2018-08-03

Created: 2018-04-09

Creator: Danny Buddenberg

Private: No

We've got problems with the search. When our users search for a term, it often gets cropped to just 8 or 9 letters and (because of that?) the search does not find the term in the WebHelp, although it is definitely there! What information do you need to investigate this issue?

1 Attachments

rangreih.jpg

Discussion

Manfred P. - 2018-05-02

When the export language is German, I found the following issue: "Rangreihe" is not found, but "Rangreih" is found. For the export language "English" this problem does not appear. I will have to investigate this further. However, I cannot completely reproduce the situation in your screenshot, where the search term is cropped in the result panel.

Last edit: Manfred P. 2018-05-02

- Danny Buddenberg - 2018-05-11
  
  Yes, the export language is German. In our case the search found nothing, neither "Rangreihe" nor "Rangreih". Many search terms get cropped, but not every term. "Zeitklassen" became "zeitklass", "Tabellierung" became "tabellier" and "Vorfilter" became "vorfilt". So the cut does not always happen after 8 letters. Many other terms have no problem at all: "Medienanalyse", "Grundgesamtheit" and "Planeingabe" work fine.
  
  You can access the documentation at http://docs.comsulting.net/topmodular/prod/content/ch01.html
  

Manfred P. - 2018-05-19

This is a bug that affects German and French output. The WebHelp full-text search includes a so-called "stemmer" for each of English, French and German. The stemmer reduces words that have the same meaning and differ only in their suffix to the same base word. Unfortunately, the generated HTML output always references the English stemmer file, even if the output language is German or French.
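
To illustrate why a stemmer mismatch can make an existing word unfindable, here is a minimal sketch in Python. The two suffix-stripping functions are toy stand-ins invented for this example and are not the stemmers shipped with the WebHelp search. The sketch assumes that the index keys were produced by one stemmer while the query is reduced by another.

```python
def strip_suffix(word, suffixes):
    """Lowercase the word and remove the first matching suffix, if any."""
    word = word.lower()
    for suffix in suffixes:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_german_stem(word):
    # Hypothetical, heavily simplified German suffix list.
    return strip_suffix(word, ("ung", "en", "er", "e"))


def toy_english_stem(word):
    # Hypothetical, heavily simplified English suffix list.
    return strip_suffix(word, ("ing", "ed", "s"))


def build_index(words, stem):
    """Map each stemmed word to the positions where it occurs."""
    index = {}
    for position, word in enumerate(words):
        index.setdefault(stem(word), []).append(position)
    return index


def search(index, term, stem):
    """Stem the query term and look it up in the index."""
    return index.get(stem(term), [])


words = ["Rangreihe", "Zeitklassen", "Tabellierung", "Vorfilter"]
index = build_index(words, toy_german_stem)

print(search(index, "Rangreihe", toy_german_stem))   # [0]  same stemmer on both sides
print(search(index, "Rangreihe", toy_english_stem))  # []   query keeps the trailing "e"
print(search(index, "Rangreih", toy_english_stem))   # [0]  query happens to equal the stem
```

In this toy setup the index only knows the key "rangreih", so a query that is not reduced in the same way misses it, while the already shortened query matches.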

This will be fixed in the next Docmenta version. Until then, as a workaround, you can fix the generated WebHelp V2 output by renaming the file de_stemmer.js (which is located in the output path search/stemmers) to en_stemmer.js.
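
If the workaround has to be applied after every export, it could be scripted. The following is a minimal sketch, assuming the exported publication contains the folder search/stemmers as described above. The function name and the backup file name en_stemmer.js.bak are made up for this example. It copies instead of renaming, so that the original files are kept.

```python
import shutil
from pathlib import Path


def apply_stemmer_workaround(output_dir, language="de"):
    """Put the <language>_stemmer.js file in place of en_stemmer.js.

    output_dir is the root folder of the exported WebHelp publication.
    """
    stemmers = Path(output_dir) / "search" / "stemmers"
    source = stemmers / f"{language}_stemmer.js"
    target = stemmers / "en_stemmer.js"
    backup = stemmers / "en_stemmer.js.bak"  # hypothetical backup name

    if not source.is_file():
        raise FileNotFoundError(source)

    # Keep the original English stemmer once, so the change can be undone
    # and a second run does not overwrite the backup.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)

    shutil.copy2(source, target)
    return target


# Example call (the path is a placeholder):
# apply_stemmer_workaround("/path/to/exported/webhelp", language="de")
```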

Last edit: Manfred P. 2018-05-19

- Danny Buddenberg - 2018-05-23
  
  Yes, that workaround solves the "cropping": "Rangreihe" remains "rangreihe". Unfortunately, Docmenta still cannot find the search term, although the word certainly occurs in the WebHelp.
  
  Last edit: Danny Buddenberg 2018-05-23
  

Manfred P. - 2018-05-31

I found another issue. It seems that somewhere in your content there is a Unicode line break U+2028 (probably in the content node with alias "Zeitklassen"). The indexer does not recognize this Unicode line break as whitespace, which causes a JavaScript error in the generated index file (index-3.js).

This can be fixed as follows:
In your Docmenta installation, open the file

apache-tomcat/webapps/docmenta/docbook-xsl/webhelp/template/content/search/punctuation.props

in a text editor and add the following line at the end:

Punct29=\\u2028

Then export the publication again. By adding this line, the Unicode character U+2028 is treated as punctuation and will be ignored by the indexer.
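
For repeated installations, appending the line could be scripted. The following is a minimal sketch in Python. The function name is made up for this example, and it assumes the file can be read as ISO-8859-1 (the usual encoding of Java properties files). The entry text is taken literally from the line above, including both backslashes.

```python
from pathlib import Path

# Literal entry from the comment above; the raw string keeps both backslashes.
ENTRY = r"Punct29=\\u2028"


def add_punctuation_entry(props_path, entry=ENTRY):
    """Append the entry to punctuation.props unless it is already there.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(props_path)
    text = path.read_text(encoding="iso-8859-1")

    if entry in text.split("\n"):
        return False

    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + separator + entry + "\n", encoding="iso-8859-1")
    return True


# Example call (path relative to the Docmenta installation, see above):
# add_punctuation_entry(
#     "apache-tomcat/webapps/docmenta/docbook-xsl/webhelp/template/"
#     "content/search/punctuation.props"
# )
```

Because the function checks for the entry first, running it twice does not add a duplicate line.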

If you do not want to re-export the publication, you can also fix the already exported publication. To do this, open the file

search/index-3.js

of the exported publication and remove the lines that contain the Unicode linebreak U+2028. This might be hard to recognize. In your example just delete all lines after the line w["zwischendurch"]="0";
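
Because U+2028 is invisible in most editors, a small script can help to locate the affected lines. The following is a minimal sketch in Python, assuming that index-3.js is UTF-8 encoded and holds one entry per line. The function names and the sample entries other than "zwischendurch" are made up for this example. The text is split on "\n" only, because Python's str.splitlines() would itself treat U+2028 as a line boundary and hide the problem.

```python
LINE_SEPARATOR = "\u2028"  # Unicode LINE SEPARATOR


def find_separator_lines(text):
    """Return the 1-based numbers of lines that contain U+2028."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if LINE_SEPARATOR in line
    ]


def drop_separator_lines(text):
    """Return the text without the lines that contain U+2028."""
    return "\n".join(
        line for line in text.split("\n") if LINE_SEPARATOR not in line
    )


def report(path):
    """Read a file (assumed UTF-8) and list its affected line numbers."""
    with open(path, encoding="utf-8", newline="") as handle:
        return find_separator_lines(handle.read())


# Made-up sample in the style of the index file.
sample = (
    'w["zwischendurch"]="0";\n'
    'w["zeit\u2028klassen"]="1";\n'
    'w["ende"]="2";'
)

print(find_separator_lines(sample))  # [2]
print(drop_separator_lines(sample) == 'w["zwischendurch"]="0";\nw["ende"]="2";')  # True
```

It is worth keeping a copy of index-3.js before writing a cleaned version back.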

Last edit: Manfred P. 2018-05-31

- Danny Buddenberg - 2018-06-04
  
  Thank you very much. Fixing the already exported publication worked fine. Next we will try to edit the punctuation.props.
  

Manfred P. - 2018-08-03

Both issues should be fixed in version 1.9.3 (the Unicode line breaks have also been added to punctuation.props).

