From: stack <st...@ar...> - 2005-11-03 19:14:25
Kristinn Sigurdsson wrote:
>Looks like you managed to fix the problem on your end.
>
>The issue of 0/0 versions is tied to the incorrect coding of characters in the XML (namely & being double-escaped to &amp;amp; rather than escaped once to &amp;). This causes any URIs that contain & (or other special characters like < and >) to show up as 0/0 versions.
>
>Any idea what fixed the problem?
>

Looks like an old version of the nutchwax WAR file, with an earlier version of OpenSearchServlet, fixed the problem. Trying to get to the bottom of it. Will update the list when I learn more (looks like we need a point release of nutchwax).

St.Ack

>- Kris
>
>>-----Original Message-----
>>From: arc...@li...
>>[mailto:arc...@li...]
>>On Behalf Of Lukáš Matějka
>>Sent: 3 November 2005 08:30
>>To: Sve...@nb...
>>Cc: arc...@li...
>>Subject: RE: [Archive-access-discuss] wera results
>>
>>______________________________________________________________
>>
>>>From: Sve...@nb...
>>>To: mat...@ce...
>>>CC:
>>>Date: 02.11.2005 19:41
>>>Subject: RE: [Archive-access-discuss] wera results
>>>
>>>I tried the latest OpenSearch servlet myself. It messed up my Wera: lots of 0/0 ...
>>>
>>>;-)
>>>
>>Now I'm using what you sent me... and everything seems fine. I can't find any 0/0 :)
>>
>>I will test it more :)
>>
>>-lm
>>
>>>Sverre
>>>
>>>-----Original Message-----
>>>From: Lukás Matejka [mailto:mat...@ce...]
>>>Sent: Wed 11/2/2005 4:43 PM
>>>To: Sverre Bang
>>>Cc: arc...@li...
>>>Subject: RE: [Archive-access-discuss] wera results
>>>
>>>______________________________________________________________
>>>
>>>>From: sve...@nb...
>>>>To: arc...@li...
>>>>CC:
>>>>Date: 02.11.2005 14:33
>>>>Subject: RE: [Archive-access-discuss] wera results
>>>>
>>>>Hi there,
>>>>Definitely something wrong in NutchWax.
>>>>If I execute
>>>>http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
>>>>and click the timeline link of the first hit showing 0/0 hits, I get
>>>
>>>Where did you find a hit showing 0/0?
>>>It works fine for me (I've just explored 150 URLs and found no 0/0 hits).
>>>Did you remember the total number of hits? (If it's the same: I was experimenting with the previous version of nutchwax, starting Tomcat on various instances.)
>>>
>>>For the word "kniha" I had:
>>>Total number of versions found : 49087. Displaying URL's 1-10
>>>
>>>-lm
>>>
>>>>'Sorry, no documents with the given uri were found'. The URL displayed seems fine, but if you look in the source of the uppermost frame, you will see that the URL sent to the script was
>>>>http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V
>>>>The & separating the parameters irj and start has been replaced by its HTML character entity reference.
>>>>
>>>>If I press the Go button now, the URL submitted to the script is OK.
>>>>
>>>>If I look at the NutchWax result set of the initial search (add &debug=1 to the search URL to bring out the NutchWax search URLs), I see that the URL (link element) returned is already wrong there.
>>>>
>>>>Conclusion: NutchWax mangles the returned URL by introducing HTML entities instead of keeping the URL in its original form.
>>>>
>>>>What version of NutchWax are you using?
>>>>
>>>>Sverre
>>>>
>>>>On Wed, 2005-11-02 at 12:41 +0000, Kristinn Sigurdsson wrote:
>>>>>This looks like the same (or a very similar) problem to the one I've got. I've been discussing it (off-list) with Stack and Sverre Bang, so I know it is being looked into.
>>>>>
>>>>>I notice in your search results (as in mine) that URIs with & in them are showing up as 0/0 versions.
>>>>>I believe that both problems are due to the escaping (or unescaping) of HTML characters in the NutchWAX XML that is used to pass the results to WERA.
>>>>>
>>>>>Possibly this is a misconfiguration of either Tomcat or Apache...?
>>>>>
>>>>>- Kris
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: arc...@li...
>>>>>>[mailto:arc...@li...]
>>>>>>On Behalf Of Lukáš Matějka
>>>>>>Sent: 2 November 2005 11:21
>>>>>>To: arc...@li...
>>>>>>Subject: [Archive-access-discuss] wera results
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>for example, at
>>>>>>http://war.mzk.cz/~nwa/wera/wera/index.php?query=kniha&year_from=&year_to=
>>>>>>the description of each record is not displayed correctly:
>>>>>>
>>>>>>1. SKIP, Moje kniha (http://skip.nkp.cz/akcMojekn.htm)
>>>>>>(<b> ... </b>prístupu k internetu v knihovnách propagovat vyuzití internetu pri zjistování názoru obyvatel 2. Anketa Pomocí krátké ankety bude zjistována nejoblíbenejsí <b>kniha</b> obyvatel Ceské republiky. Pojem nejoblíbenejsí <b>kniha</b> je specifikován dalsími výklady, jako "<b>kniha</b>, která me nejvíce ovlivnila", "<b>kniha</b>, ke které se casto vracím", "<b>kniha</b>, kterou bych doporucil/a dobrým prátelum", "<b>kniha</b>, která zmenila muj zivot", "<b>kniha</b> na kterou nemohu zapomenout", "<b>kniha</b>, která mne uvedla do jiného sveta", "<b>kniha</b>, kterou bych si s sebou vzal/a jako jedinou<b> ... </b>)
>>>>>>Versions (matching query/total) 3/3
>>>>>>Timeline | Overview
>>>>>>
>>>>>>"prístupu" should be "přístupu" (without diacritics: "pristupu").
>>>>>>
>>>>>>Does anybody have the same problem?
>>>>>>
>>>>>>-lm
>>>>>>
>>>>>>-------------------------------------------------------
>>>>>>SF.Net email is sponsored by:
>>>>>>Tame your development challenges with Apache's Geronimo App Server.
>>>>>>Download it for free - -and be entered to win a 42" plasma tv or your very own Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
>>>>>>_______________________________________________
>>>>>>Archive-access-discuss mailing list
>>>>>>Arc...@li...
>>>>>>https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
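The double-escaping Kris describes, and the mangled link element Sverre found in the NutchWax result set, can be illustrated with a minimal Python sketch using only the standard library. This is not NutchWAX or OpenSearchServlet code (those are Java, and the thread does not show where the extra escaping happens). The <item>/<link> layout, the helper names build_item and parse_link, and the "URL is HTML-escaped before being serialized" cause are all hypothetical, chosen only to reproduce the symptom.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

# URI quoted in the thread; any URI containing '&' behaves the same way.
ORIGINAL_URL = "http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&start=V"


def build_item(link_text: str) -> bytes:
    """Serialize a hypothetical OpenSearch-style <item><link>...</link></item>.

    ElementTree escapes '&', '<' and '>' in text content on its own,
    so the caller should pass the raw, unescaped URL.
    """
    item = ET.Element("item")
    ET.SubElement(item, "link").text = link_text
    return ET.tostring(item)


def parse_link(xml_bytes: bytes) -> str:
    """Return the <link> text as a consumer (e.g. WERA) sees it after one parse."""
    return ET.fromstring(xml_bytes).findtext("link")


# Correct: the URL is escaped exactly once, by the serializer.
single = build_item(ORIGINAL_URL)

# Assumed failure mode (hypothetical): the URL is HTML-escaped before
# serialization, so the serializer escapes the '&' of '&amp;' a second time.
double = build_item(escape(ORIGINAL_URL))

print(single.decode())
# <item><link>http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V</link></item>
print(double.decode())
# <item><link>http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;amp;start=V</link></item>

print(parse_link(single) == ORIGINAL_URL)  # True
print(parse_link(double))
# http://full.nkp.cz/nkdb/rejstriky/rejstrik.asp?irj=12&amp;start=V
# A lookup keyed on this string will not match the archived URI, which is
# consistent with the "0/0 versions" symptom reported in the thread.
```

Under these assumptions, the remedy is to escape exactly once, at XML serialization time, and let the consumer use the parsed text as-is. The thread itself only establishes that an older OpenSearchServlet build avoided the problem.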