From: James G. <jg...@si...> - 2006-11-17 23:07:40
> 1) Images *usually* don't seem to be displayed.

Aha, the image thing was my fault; seems the images I was missing were on a different domain (and I had restricted my crawl to the single domain). I'll have to do another test, but I think that's explained.

jamesG

James Grahn wrote:
> A few quick comments:
>
> 1) Images *usually* don't seem to be displayed. Though I saved images
> in one of the ARC files I'm using, they do not appear on the page in
> WERA. I've also noticed this occurring on the WERA test site
> http://nwa.nb.no/wera/ when I search for "library" and examine the front
> page of the library of congress.
>
> Behavior: The image will appear, only to be replaced by the image's
> "alt" tag as the page has its links remapped.
>
> Expected behavior: The image should reappear after the links are
> remapped (because the image should be in the ARC).
>
> 2) There are some webpages that throw off the formatting of WERA. They
> seem to be primarily textareas with html embedded.
>
> When indexed, they sometimes throw off the table formatting of WERA and
> sometimes cause input boxes and submit buttons to appear on the search page.
>
> Always-valid examples:
> http://cl.cnn.com/ctxtlink/jsp/cnn/cl/1.5/cnn-story-cl.jsp
> http://sportsillustrated.cnn.com/.element/ssi/misc/2.0/contextual/story.html
> http://www.cnn.com/.element/ssi/sect/1.3/WEATHER/weatherPageBox.html
> http://www.cnn.com/WEATHER/
> http://cnn.dyn.cnn.com/intlWeatherBox.html
>
> Perhaps-not-always-valid examples:
> http://www.cnn.com/.element/ssi/www/breaking_news/1.1/banner.exclude.html
>
> An example of such offending html:
> <textarea name="breakingNews"><!--breaking news banner-->
> <div id="cnnBNBBreakingNews">
> <table cellpadding="0" cellspacing="0" border="0">
> <tr valign="middle">
> <td width="181" valign="top"><img
> src="http://i.a.cnn.net/cnn/.element/img/1.5/ceiling/bnb/breaking_news.gif"
> alt="" width="181" height="47" hspace="0" vspace="0" border="0"></td>
> <td class="right"><div id="cnnNarrowBulletinText">Britney Spears
> files for divorce from her husband Kevin Federline, citing
> irreconcilable differences. </div></td>
> </tr>
> </table>
> </div>
> <!--/breaking news banner-->
> </textarea>
>
> 3) This xml file resulted in an abrupt end of a table in WERA:
> http://edition.cnn.com/.element/img/1.3/swf/pipeline_mainpage/config_intl.xml
>
> The source for this was a crawl of CNN at a depth of 2 links.
> Hopefully the examples are revealing.
>
> jamesG
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
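
For anyone hitting the same missing-image symptom from point 1, here is a rough way to check whether a page's images sit on a different host than the page itself, and so would fall outside a crawl restricted to a single domain. This is only a sketch in Python using the standard library; it is not part of WERA or the crawler. The names `ImageCollector` and `off_domain_images` are made up for illustration, as is the `/local.gif` image in the sample. Comparing exact hostnames is an assumption; the crawler's real scope rules may be looser or stricter than that.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def off_domain_images(page_url, html):
    """Return absolute image URLs whose host differs from the page's host.

    Assumption: "same domain" means "same hostname"; adjust to match
    whatever scope the crawl was actually configured with.
    """
    page_host = urlparse(page_url).hostname
    collector = ImageCollector()
    collector.feed(html)
    # Resolve relative src values against the page URL before comparing hosts.
    absolute = (urljoin(page_url, src) for src in collector.sources)
    return [url for url in absolute if urlparse(url).hostname != page_host]


# The first <img> is taken from the banner markup quoted above;
# the second (/local.gif) is invented to show a same-host image.
sample = (
    '<td><img src="http://i.a.cnn.net/cnn/.element/img/1.5/ceiling/bnb/'
    'breaking_news.gif" alt=""></td><td><img src="/local.gif" alt=""></td>'
)

for url in off_domain_images("http://www.cnn.com/", sample):
    print(url)
```

Run against the banner image quoted above, with http://www.cnn.com/ as the page, this reports the i.a.cnn.net URL as off-host, which matches the explanation that the missing images lived on a different domain.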