From: Tod T. <tt...@ch...> - 2002-01-28 15:37:29
Can someone please set the record straight on this? Everything I have come across says that web spiders cannot index JavaScript since they cannot parse and interpret it. My impression is that this is particularly true for sites that use JavaScript almost exclusively for things like navigation, drop-down menus, and the like.

Is this true, or is there a solution in place that I don't know about? Can anybody point me to documentation that discusses JavaScript and how search indexing is affected? Are there commercial solutions available?

Thanks - Tod
From: Neal R. <ne...@ri...> - 2002-01-29 02:40:55
On Mon, 28 Jan 2002, Tod Thomas wrote:

> Can someone please set the record straight on this? Everything I have
> come across says that web spiders cannot index javascript since they
> cannot parse and interpret it. My impression is this is particularly
> true for sites that use JavaScript almost exclusively for things like
> navigation, drop down menus, and the like.
>
> Is this true, or is a solution in place that I don't know about? Can
> anybody point me to documentation that discusses Javascript and how
> search indexing is affected? Are there commercial solutions available?

That would depend on what exactly you are doing.

As far as the JavaScript code itself goes, that won't get indexed, since JavaScript code is encapsulated in HTML comment tags... assuming the web spider's HTML parsing code isn't brain-dead.

If you are talking about a fully dynamic page with lots of parameters and dynamic functionality, it becomes very hard. The spider would need to have lots of web-browser functionality incorporated into it, so it could load up a JavaScript environment and treat the page as a program-to-run rather than a file-to-parse. The spider would also have to understand HTTP GET/POST parameters and how those parameters influence the page. This is the DHTML analog of spidering classical CGI apps; most spiders can't 'go through' the forms. There was a company recently that got some momentary notoriety by announcing it had developed spidering technology that would fill in web forms and submit them in order to spider the pages 'behind' the forms.

JavaScript-dependent navigation would likewise need a browser-like spider that treats the page as a program-to-run.

Dynamic PHP pages are a bit easier, especially if all the parameters are on the URL and no forms are used. A spider sees them as standard URLs and follows them, assuming the spider's URL parser doesn't kill off the parameters before it fetches the page. Of course, nothing stops you from using PHP to supply JavaScript menus...
using PHP won't help spidering in that situation.

Summary:

CGI with forms: NO (for the most part).

DHTML with JavaScript: depends on the level of usage. The more dynamic the page is (dependent on parameters and JavaScript code output), the less successful spidering will be.

PHP (or another server-side scripting language) without hidden parameters: spiders can do pretty well here.

The more user-side interactive the page is, generally the worse off you will be. Of course, a lot of these drawbacks can be somewhat compensated for by designing a web site/web app with spidering in mind. That involves lots of testing and good default behavior... a lot like making your pages visible/usable on now-ancient web browsers like Mozilla 1.0, early versions of Netscape, etc.

After reading the Retriever & HTML parsing code, htdig pretty much treats web pages as documents-to-parse, not programs-to-run. So without good default behavior, it may not be too successful on pages that use dynamic content for navigation.

Feel free to correct me if I missed something. ;-) Thanks.

-- 
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
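The parsing point above can be sketched in a few lines. This is a minimal illustration in Python, not htdig's actual code (htdig is C++); the class name `TextOnlyIndexer` is made up for the example. It shows why a spider that treats a page as a document-to-parse never indexes JavaScript: script content, comment-wrapped or not, is simply skipped while visible text is collected.

```python
# Minimal sketch of a document-to-parse spider (hypothetical, not htdig code):
# text outside <script> is indexed, everything inside it is dropped.
from html.parser import HTMLParser

class TextOnlyIndexer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_script = False  # are we inside a <script> element?
        self.words = []         # the "index" we build

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies (including old-style <!-- ... --> wrappers) arrive
        # here as raw data while in_script is set, so they are ignored.
        if not self.in_script:
            self.words.extend(data.split())

page = """<html><body>
<script><!-- function nav(){ location.href='products.html'; } //--></script>
<p>Welcome to our store</p>
</body></html>"""

p = TextOnlyIndexer()
p.feed(page)
print(p.words)  # ['Welcome', 'to', 'our', 'store'] -- the nav() code never appears
```

Note that the JavaScript-driven link to products.html is invisible to this kind of spider: it lives only inside the skipped script body, which is exactly the navigation problem described above.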
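The PHP/URL-parameter point can also be illustrated. The sketch below is hypothetical (the function name `resolve_link` and the `strip_params` flag are invented for the example); it shows why dynamic PHP pages spider well only if the spider's URL parser keeps the query string intact, and what "killing off the parameters" does to the fetch.

```python
# Sketch: how a spider resolves a link found on a dynamic PHP page.
# strip_params=True models a broken URL parser that discards the
# query string before fetching.
from urllib.parse import urljoin, urlsplit, urlunsplit

def resolve_link(base, href, strip_params=False):
    url = urljoin(base, href)  # resolve relative link against the page URL
    if strip_params:
        parts = urlsplit(url)
        # Drop query and fragment: the spider would fetch the bare script,
        # getting a default page instead of the intended dynamic one.
        url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return url

base = "http://example.com/catalog.php?cat=1"
link = "item.php?cat=1&id=42"

print(resolve_link(base, link))
# http://example.com/item.php?cat=1&id=42  -> the distinct dynamic page
print(resolve_link(base, link, strip_params=True))
# http://example.com/item.php              -> parameters killed off
```

With the parameters preserved, each `id` value is just another standard URL to follow; with them stripped, every product link collapses to the same bare script and the dynamic content is never reached.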