From: Emma J. H. <emm...@xt...> - 2002-03-08 04:28:43
Thanks for answering, Geoff!!

>[-vvv output is] intended to be self-explanatory.

Which is unfortunate, because there aren't any errors that I can find in the output, and the page itself is "fine" ... I've even tried setting this URL as the only start_url (I posted earlier about potential problems with line wrapping being the reason for missing content on pages). But even with this questionable URL as the only start_url, none of the content on the page gets indexed and none of the links get followed.

>>Authorization: Basic xxxxxxx
>
>You probably didn't want to post that to a mailing list. It's encrypted,
>but not particularly rigorously.

You're right, I probably shouldn't have posted it. It's a fairly easy u:p to guess, as the site doesn't need to be super secret, just a little bit secret.

...

>>title: College Apprenticeship Programs: CareerMATTERS
>><snipped a bunch of images>
>
>OK, but what exactly is on the page? It certainly didn't find anything
>significant to index or links other than the images you pointed out.

The page has four bread crumb items, a bunch of image navigation buttons, eight left nav text links, and over 20 text links (in a list). None of the words on the page are getting put into the word db. For example, the page has a list of colleges, and none of the college names show up when I do a search.

>Either the HTML parser is missing a lot, or there isn't much on the page
>to index.

I think it's the first option, which scares me. :(

emma
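
A side note on the Authorization header quoted above: the value after "Basic" is the base64 encoding of user:password, so it can be reversed without any key. Here is a minimal Python sketch of that round trip. The function names are made up for illustration, and the credentials are hypothetical placeholders, not the ones from this thread.

```python
import base64
from typing import Tuple


def encode_basic(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic(header_value: str) -> Tuple[str, str]:
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates user from password.
    user, _, password = decoded.partition(":")
    return user, password


if __name__ == "__main__":
    header = encode_basic("demo", "secret")  # hypothetical credentials
    print(header)
    print(decode_basic(header))
```

Anyone who sees the header can run the decode step, so a password that has appeared in a public archive is probably worth changing even on a "little bit secret" site.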