From: Alan R. <ala...@mn...> - 2012-05-14 18:32:50
Hello Demian,

I have the same settings that you posted. This is something I have not fooled with. I'm trying to stay as close to the trunk as I can. Although this problem is from a library where we are hosting for them and they asked to have the title display enhanced.

I changed the setting to:

200 for the GapFragmenter
140 for the RegexFragmenter

That gets me:

"woody allen zelig" search with 6 results:

[_highlighting] => Array
    [title_full] => Array
        (
            [0] => {{{{START_HILITE}}}}Zelig{{{{END_HILITE}}}} [videorecording] / Metro Goldwyn Mayer ; Orion Pictures and Warner Bros. present ; written and directed by {{{{START_HILITE}}}}Woody{{{{END_HILITE}}}} {{{{START_HILITE}}}}Allen{{{{END_HILITE}}}}.
        )

"woody allen zelig metro mayer" search with 1 result:

[_highlighting] => Array
    [title_full] => Array
        (
            [0] => {{{{START_HILITE}}}}Zelig{{{{END_HILITE}}}} [videorecording] / {{{{START_HILITE}}}}Metro{{{{END_HILITE}}}} Goldwyn {{{{START_HILITE}}}}Mayer{{{{END_HILITE}}}} ; Orion Pictures and Warner Bros. present ; written and directed by {{{{START_HILITE}}}}Woody{{{{END_HILITE}}}} {{{{START_HILITE}}}}Allen{{{{END_HILITE}}}}.
        )

So I guess I need to up these parameters, as the [_highlighting] => [title_full] is being used for the title display in this implementation.

thanks for the help -- al

On Mon, 2012-05-14 at 17:44 +0000, Demian Katz wrote:
> What exactly are all of your highlighting settings?
>
> Here are the defaults in the current trunk:
>
>   <searchComponent class="solr.HighlightComponent" name="highlight">
>     <highlighting>
>       <!-- Configure the standard fragmenter -->
>       <!-- This could most likely be commented out in the "default" case -->
>       <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
>         <lst name="defaults">
>           <int name="hl.fragsize">100</int>
>         </lst>
>       </fragmenter>
>
>       <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
>       <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
>         <lst name="defaults">
>           <!-- slightly smaller fragsizes work better because of slop -->
>           <int name="hl.fragsize">70</int>
>           <!-- allow 50% slop on fragment sizes -->
>           <float name="hl.regex.slop">0.5</float>
>           <!-- a basic sentence pattern -->
>           <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
>         </lst>
>       </fragmenter>
>
>       <!-- Configure the standard formatter -->
>       <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
>         <lst name="defaults">
>           <str name="hl.simple.pre"><![CDATA[<em>]]></str>
>           <str name="hl.simple.post"><![CDATA[</em>]]></str>
>         </lst>
>       </formatter>
>     </highlighting>
>   </searchComponent>
>
> Note that the default hl.fragsize is 70 rather than the 100 you mentioned -- not sure if this means you have changed this or were just referring to the Solr default if no other value is configured.
>
> In any case, right now, we have a 50% slop on fragment size, which means that highlighted fragments may fall between 35 and 105 characters in length. The example you show still seems a bit on the short side, but I'd be interested to see what happened if you raised your hl.fragsize and lowered your hl.regex.slop.
>
> - Demian
>
>
> > -----Original Message-----
> > From: Alan Rykhus [mailto:ala...@mn...]
> > Sent: Monday, May 14, 2012 1:10 PM
> > To: vuf...@li...
> > Subject: [VuFind-Tech] highlighting question
> >
> > Hello,
> >
> > I was working on an issue when I came across some unusual results. I'm
> > seeing an issue with the fields returned from Solr and was wondering if
> > someone can explain them. It has to do with the _highlighting fields in
> > the record returned from Solr.
> >
> > You can see the interface at: https://bridgedev.mnpals.net/vufind/
> >
> > The original record has the following title field:
> >
> > 245 0 0 |a Zelig |h [videorecording] / |c Metro Goldwyn Mayer ; Orion
> > Pictures and Warner Bros. present ; written and directed by Woody
> > Allen.
> >
> > If I search on "Woody Allen Zelig" I get 6 records back. The first
> > record is the one I'm seeing the problem.
> >
> > The Solr record has the following fields in it:
> >
> > [title_full] => Zelig [videorecording] / Metro Goldwyn Mayer ; Orion
> > Pictures and Warner Bros. present ; written and directed by Woody Allen.
> > [_highlighting] => Array
> >     [title_full] => Array
> >         (
> >             [0] => and directed by
> > {{{{START_HILITE}}}}Woody{{{{END_HILITE}}}}
> > {{{{START_HILITE}}}}Allen{{{{END_HILITE}}}}.
> >         )
> >
> >
> > If I search on "Woody Allen Zelig Metro Mayer" I get 1 record back. It
> > has the following fields in it:
> >
> > [title_full] => Zelig [videorecording] / Metro Goldwyn Mayer ; Orion
> > Pictures and Warner Bros. present ; written and directed by Woody Allen.
> > [_highlighting] => Array
> >     [title_full] => Array
> >         (
> >             [0] => {{{{START_HILITE}}}}Zelig{{{{END_HILITE}}}}
> > [videorecording] / {{{{START_HILITE}}}}Metro{{{{END_HILITE}}}} Goldwyn
> > {{{{START_HILITE}}}}Mayer{{{{END_HILITE}}}} ; Orion Pictures and Warner
> > Bros. present ; written
> >         )
> >
> > Now I can understand the second record. It was trimmed at 100 characters
> > because of the:
> >
> > <int name="hl.fragsize">100</int>
> >
> > setting in solrconfig.xml.
> >
> > I cannot understand why the first search result only returns part of the
> > title field. Does it have something to do with the semi-colons? But the
> > regex part of highlighter is:
> >
> > <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
> >
> > no semi-colon there.
> >
> > Any insight to this from anyone? I did some Googling without any luck.
> >
> > thanks -- al
> > --
> > Alan Rykhus
> > PALS, A Program of the Minnesota State Colleges and Universities
> > (507)389-1975
> > ala...@mn...
> > "It's hard to lead a cavalry charge if you think you look funny on a
> > horse" ~ Adlai Stevenson
> >
> > _______________________________________________
> > Vufind-tech mailing list
> > Vuf...@li...
> > https://lists.sourceforge.net/lists/listinfo/vufind-tech

--
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities
(507)389-1975
ala...@mn...
"It's hard to lead a cavalry charge if you think you look funny on a horse" ~ Adlai Stevenson
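
As a rough illustration of two points from this thread (the slop arithmetic, and the fact that hl.regex.pattern has no semi-colon in it), here is a small Python sketch. It is not how Solr's RegexFragmenter is implemented. It only computes the size range that a given hl.fragsize and hl.regex.slop imply, and lists the stretches of the title that the quoted pattern can match on its own. The function names slop_bounds and pattern_runs are made up for this example, and it assumes Python's re module treats this pattern the same way Solr's Java regex engine would for this input.

```python
import re

# Values copied from the solrconfig.xml quoted in this thread.
FRAGSIZE = 70
SLOP = 0.5
PATTERN = r"""[-\w ,/\n\"']{20,200}"""

# The title_full value from the record discussed in this thread.
TITLE = (
    "Zelig [videorecording] / Metro Goldwyn Mayer ; Orion Pictures and "
    "Warner Bros. present ; written and directed by Woody Allen."
)


def slop_bounds(fragsize, slop):
    """Hypothetical helper: (min, max) fragment length implied by the slop."""
    return round(fragsize * (1 - slop)), round(fragsize * (1 + slop))


def pattern_runs(pattern, text):
    """Hypothetical helper: every non-overlapping stretch the pattern matches."""
    return [m.group(0) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    low, high = slop_bounds(FRAGSIZE, SLOP)
    print("fragment length range:", low, "to", high)
    for run in pattern_runs(PATTERN, TITLE):
        print(len(run), repr(run))
```

With the trunk defaults this gives the 35 to 105 range Demian mentions. None of the stretches matched in the title contains a semi-colon, bracket or period, because those characters are not in the pattern's character class. That suggests such characters can act as break points, though how Solr combines these matches with hl.fragsize and hl.regex.slop to pick the final fragment is not shown here.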