I wonder how to modify the parameter file so that the Tokenizer Operation in the Morphological Processor concatenates multi-word terms and looks them up, but does not return the single terms as results when 'lookupAllBaseForms' (or 'lookupIndexWord') is called.
To illustrate: if I have a multi-word term, let's say 'european union', the Morphological Processor returns 'european' and 'union' as base forms in addition to 'european union', while I would like only 'european union' to be returned. This is a problem when there is no index word for the multi-word term itself. For some mysterious 'fun organisation', for example, I get 'fun' and 'organisation', while I would rather get an empty set.

Does anyone know how to do this? Of course, I could patch this myself by checking whether the query and the result contain the same number of terms, but that would also break nice behaviour such as getting the desirable result 'cellphone' for the query 'cell phone'.
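A minimal sketch of that patch idea, with one addition: besides the "same number of terms" check, it also keeps a candidate whose tokens, joined together, equal the query's tokens joined together, so 'cellphone' survives for 'cell phone'. This is a hypothetical post-filter, not part of JWNL or its parameter file. It assumes the base forms from 'lookupAllBaseForms' have already been collected as plain strings, and it assumes spaces, underscores and hyphens are the only term separators. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-filter (not a JWNL class): applied to base forms that
// have already been looked up and converted to plain strings.
final class MultiWordFilter {

    // Assumed separators: whitespace, underscore and hyphen.
    static List<String> tokens(String term) {
        List<String> out = new ArrayList<>();
        for (String t : term.trim().toLowerCase().split("[\\s_-]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Keeps a candidate if it has as many tokens as the query, or if its
    // tokens joined together equal the query's tokens joined together
    // ("cell phone" -> "cellphone"). Everything else is dropped.
    static List<String> filter(String query, List<String> candidates) {
        List<String> queryTokens = tokens(query);
        String joinedQuery = String.join("", queryTokens);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            List<String> candidateTokens = tokens(candidate);
            boolean sameLength = candidateTokens.size() == queryTokens.size();
            boolean concatenation = String.join("", candidateTokens).equals(joinedQuery);
            if (sameLength || concatenation) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter("european union",
                Arrays.asList("european union", "european", "union")));
        System.out.println(filter("fun organisation",
                Arrays.asList("fun", "organisation")));
        System.out.println(filter("cell phone",
                Arrays.asList("cellphone", "cell", "phone")));
    }
}
```

With these inputs the filter keeps only 'european union', returns an empty list for 'fun organisation', and keeps 'cellphone' for 'cell phone'. The concatenation test is an exact string match, so an inflected query such as 'cell phones' would not match the base form 'cellphone' without further normalisation.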
Thanks in advance.