Re: [htmltmpl] RFC: Template Tag Attributes

At 09:38 AM 6/3/04 +1000, Mathew Robertson wrote:
> > > > Inevitably, there will be certain pages using TMPL_INCLUDE tags.  I imagine
> > > > that most of these will contain data that will not want to be searched for,
> > > > such as footers, and therefore my filter program can simply ignore
> > > > them.  However, I don't feel safe in making the blanket assumption that
> > > > /all/ included files don't need to be searchable.
> > >
> > >Now you've lost me.  There's lots of stuff in an HTML page that
> > >shouldn't be searched for.  Stuff like headers and footers in includes
> > >is just the tip of the iceberg.  Why obsess over this?
> >
> > I want to give content authors more control over what portions of a
> > document are searched for.
>
>This is a common mistake that information creators think 'is a good 
>thing'...  The web got popular for a number of reasons - one of them being 
>"full text indexing of all content" (including headers/footers/etc).

Why?  Headers and footers carry no useful information.  By the nature of 
a templating system, they are the same on every page in a given 
section.  Including them in search results only adds noise and increases 
the amount of data that has to be indexed.
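
To make the filter idea concrete, here is a rough sketch in Python of what 
such a pre-indexing pass might look like.  It is only an illustration: the 
SEARCH attribute and both function names are hypothetical and are not part 
of HTML::Template, and it only handles the quoted NAME="..." form of the tag.

```python
import re

# Matches include tags such as <TMPL_INCLUDE NAME="footer.tmpl">.
# Assumption: attributes are written as KEY="value"; unquoted values and the
# <!-- TMPL_INCLUDE ... --> comment form are not handled in this sketch.
INCLUDE_TAG = re.compile(r'<TMPL_INCLUDE\b([^>]*)>', re.IGNORECASE)
ATTRIBUTE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def searchable_includes(template_text):
    """Return the NAME of each include flagged with the hypothetical SEARCH="1"."""
    names = []
    for match in INCLUDE_TAG.finditer(template_text):
        attrs = {key.upper(): value
                 for key, value in ATTRIBUTE.findall(match.group(1))}
        if attrs.get("SEARCH") == "1" and "NAME" in attrs:
            names.append(attrs["NAME"])
    return names


def strip_includes(template_text):
    """Remove every include tag, leaving only the page's own text to index."""
    return INCLUDE_TAG.sub("", template_text)
```

With something like this, the indexer would default to ignoring included 
files and only pull in the ones an author has explicitly marked.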

>The point is that, it is the user of the system that wants to find the 
>information - not the author telling you what you can and can't search 
>for.   Classic example -> books used to have (and still do) an index in 
>the last couple of pages of the book, yet the user could never find what 
>they were looking for; until the book made it onto CDROM at which point 
>full-text-searching was possible.

Almost every piece of a book is useful to search.  But what good would it 
be to search for a chapter heading?  That information is already given to 
you in the table of contents.

>-> Full text searching is a _much better_ solution to search problems than 
>indexing on what YOU think is the information they want.
>
>Mathew
>
>PS.  This means, use a spider.. or even better use google via a "site:..." 
>search.

Google PageRank is very good at ranking results across a broad sample of 
sites.  It's not so good for searching within an individual site.