html-template-users Mailing List for HTML::Template (Page 47)
Brought to you by: samtregar
From: Sam T. <sa...@tr...> - 2004-06-17 19:18:20

Hello all. I'm sure it will come as a great shock, but I'm planning to make a new release of HTML::Template sometime next week. Here is what I have so far:

- Bug Fix: Improved cache keying to be sensitive to options which alter the compilation of templates (path, search_path, loop_context_vars and global_vars). Calls to new() with different settings for any of these options will no longer pull incorrect cached objects.

- Bug Fix: Added code to detect broken Perl 5.8.0 utf-8 during installation (i.e. Redhat 8 and 9).

- Bug Fix: Fixed parsing of ESCAPE='URL' (Paul Baker)

It's that first one that's stimulating this release. However, I know a number of you have pending bugs and (shudder) new features that you'd like to see addressed in the next release. Now is the time to get them in for 2.7.

If you're submitting a bug report please include a complete test case that demonstrates the problem. For extra points make it a patch to test.pl in CVS.

If you're submitting a patch, please make it against CVS if you can. If not then please make sure it was made against the 2.6 release.

Thanks!
-sam

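A minimal sketch of the situation that first fix addresses (file and directory names here are hypothetical): two new() calls that share a filename but differ in compilation-affecting options should no longer collide in the cache.

    use HTML::Template;

    my $t1 = HTML::Template->new(
        filename          => 'page.tmpl',
        cache             => 1,
        path              => ['/templates/site-a'],
        loop_context_vars => 1,
    );

    my $t2 = HTML::Template->new(
        filename => 'page.tmpl',
        cache    => 1,
        path     => ['/templates/site-b'],   # different path => different cache entry
    );
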
From: Sam T. <sa...@tr...> - 2004-06-09 19:04:39

I do see one potential problem:

> for (my $x = 0; $x < $rows; $x++) {
>     push(@return_array, $data_record[$offset + $x] );

That will push on 'undef' values on your last page. Maybe change that to:

    push(@return_array, $data_record[$offset + $x])
        if exists $data_record[$offset + $x];

Another idea would be to add this just before you return from your callback:

    use Data::Dumper;
    print STDERR Dumper(\@return_array);

If everything is working you should see an array-ref containing a single hash in your error logs when you view the last page.

-sam

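Pulling that suggestion into the full callback, a minimal sketch (illustrative only; variable names follow Greg's script below):

    my $get_data_sub = sub {
        my ($offset, $rows) = @_;
        my @return_array;
        for my $x (0 .. $rows - 1) {
            # stop once we run past the end of @data_record so the final,
            # partial page never receives undef rows
            last unless exists $data_record[$offset + $x];
            push @return_array, $data_record[$offset + $x];
        }
        return \@return_array;
    };
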
From: Greg J. <gr...@al...> - 2004-06-09 18:54:34

On Wednesday June 9 2004 10:12 am, you wrote:
> On Wed, 9 Jun 2004, Greg Jetter wrote:
> > get_data_callback returned something that isn't an array ref! You must
> > return from get_data_callback in the format [ [ $col1, $col2], [ $col1,
> > $col2] ] or [ { NAME => value ... }, { NAME => value ...} ]. at
> > /usr/lib/perl5/site_perl/5.8.0/HTML/Pager.pm line 421.
> >
> > any thoughts as to why ?
>
> My guess would be that get_data_callback returned something that isn't
> an array ref. ;)
>
> > could it have to do with the last set of data not containing full
> > set of rows ?
>
> If that triggers a bug in your code, sure!
>
> > I can post code if it would make things clearer ..
>
> Please do. But keep it short!
>
> -sam

Here's the code I'm using to call the paging stuff. The array has 81 items; the first 80 display correctly. When page 9, which should contain the last item, is called, the error is generated. @data_record contains an 81-member array of hash refs.

    # snip
    my $rows = @data_record;    # get the number of rows

    my @array = ("Class", "Price", "Street_Name", "Bedrooms", "Bath", "Size",
                 "Garage_Size", "Lot_Size", "Special_Features", "View",
                 "city", "state", "county");
    my $array = \@array;

    my $get_data_sub = sub {
        my ($offset, $rows) = @_;
        my @return_array;
        for (my $x = 0; $x < $rows; $x++) {
            push(@return_array, $data_record[$offset + $x]);
        }
        return \@return_array;
    };

    # create a Pager object
    my $pager = HTML::Pager->new(
        # required parameters
        query             => $q,
        template          => $tmpl,
        get_data_callback => $get_data_sub,
        rows              => $rows,
        page_size         => 10,
        persist_vars      => [@$array],
    );

    # make it go - send the results to the browser.
    print $q->header();
    print $pager->output;

Any help would be nice. What puzzles me is why it would work for the first 80 records and then throw an error. I also tried it on several different record sets, one with fewer than 50 records and one with more than 300; in each case the last page acts the same. The paging works fine until the last page is called, then it fails with the "get_data_callback returned something that isn't an array ref!" message.

Help
Greg

From: Sam T. <sa...@tr...> - 2004-06-09 18:12:32

On Wed, 9 Jun 2004, Greg Jetter wrote:
> get_data_callback returned something that isn't an array ref! You must return
> from get_data_callback in the format [ [ $col1, $col2], [ $col1, $col2] ] or
> [ { NAME => value ... }, { NAME => value ...} ]. at
> /usr/lib/perl5/site_perl/5.8.0/HTML/Pager.pm line 421.
>
> any thoughts as to why ?

My guess would be that get_data_callback returned something that isn't an array ref. ;)

> could it have to do with the last set of data not containing full
> set of rows ?

If that triggers a bug in your code, sure!

> I can post code if it would make things clearer ..

Please do. But keep it short!

-sam

From: Greg J. <gr...@al...> - 2004-06-09 17:50:47

On Tuesday June 8 2004 8:27 pm, Sam Tregar wrote:
> On Tue, 8 Jun 2004, Greg Jetter wrote:
> > Could someone enlighten me on the proper way to pass an array ref to
> > this part of the new() call ?
> >
> > I've tried persist_vars => [@$array ]
> >
> > WHERE :
> >
> > @array = ("PostalCode", $searchZip);
>
> That should be just:
>
> @array = ("PostalCode");
>
> HTML::Pager uses the value from $query->param() for the vars you ask
> to be persisted.
>
> -sam

Thanks, that did the trick. I was not aware that it acted in that fashion; I should have guessed that it had to get the values from some place, and the logical place was from the query params.

It's working now up to the last page of data. In my case the script makes up 9 pages; pages 1 through 8 display and act correctly, but page 9 produces the following error:

    get_data_callback returned something that isn't an array ref! You must return
    from get_data_callback in the format [ [ $col1, $col2], [ $col1, $col2] ] or
    [ { NAME => value ... }, { NAME => value ...} ]. at
    /usr/lib/perl5/site_perl/5.8.0/HTML/Pager.pm line 421.

Any thoughts as to why? Could it have to do with the last set of data not containing a full set of rows? In this particular case I know there are only 81 hashes in my array of hashes for the loop, so page nine would have only 1, and the page size is set to 10 ...

I can post code if it would make things clearer ..

thanks
Greg

From: Alex K. <ka...@ra...> - 2004-06-09 12:20:06

There's a new test for the case too.

--- JIT/Base.pm     Wed Jun  9 15:31:15 2004
+++ JIT/Base.pm     Wed Jun  9 15:34:32 2004
@@ -75,7 +75,7 @@
   foreach my $row (@$array) {
     croak("Bad param settings - found non hash-ref for loop row in loop $loop_name!")
-      unless ref $row eq 'HASH';
+      unless ref $row && UNIVERSAL::isa($row, 'HASH');
     my $lc_name;
     foreach my $name (keys %$row) {

--- t/03loops.t     Wed Jun  9 15:57:13 2004
+++ t/03loops.t     Wed Jun  9 16:00:15 2004
@@ -1,4 +1,4 @@
-use Test::More tests => 8;
+use Test::More tests => 9;
 use HTML::Template::JIT;
 my $debug = 0;
@@ -69,3 +69,13 @@
 like($output, qr/Apples, Oranges, Brains, Toes, and Kiwi./);
 like($output, qr/pingpongpingpongpingpong/);
+$template = HTML::Template::JIT->new(filename => 'loop.tmpl',
+                                     path => ['t/templates'],
+                                     jit_path => 't/jit_path',
+                                     jit_debug => $debug,
+                                    );
+$template->param(foo => "FOO");
+$template->param(bar => [ bless({ val => 'foo' }, 'barfoo'),
+                          bless({ val => 'bar' }, 'barbar') ]);
+$output = $template->output();
+like($output, qr/bar: foo,bar,/);

--
Alex Kapranoff.

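In practical terms, the patch lets loop rows be blessed hash references (for example, row objects from an ORM) rather than plain hashes. A hedged sketch of the usage it enables, mirroring the new test above (the template path and package names come from that test):

    use HTML::Template::JIT;

    my $template = HTML::Template::JIT->new(
        filename => 'loop.tmpl',
        path     => ['t/templates'],
        jit_path => 't/jit_path',
    );

    # blessed rows no longer trip the "found non hash-ref" croak
    $template->param(bar => [ bless({ val => 'foo' }, 'barfoo'),
                              bless({ val => 'bar' }, 'barbar') ]);
    print $template->output();
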
From: Sam T. <sa...@tr...> - 2004-06-09 04:27:33

On Tue, 8 Jun 2004, Greg Jetter wrote:
> Could someone enlighten me on the proper way to pass an array ref to this
> part of the new() call ?
>
> I've tried persist_vars => [@$array ]
>
> WHERE :
>
> @array = ("PostalCode", $searchZip);

That should be just:

    @array = ("PostalCode");

HTML::Pager uses the value from $query->param() for the vars you ask to be persisted.

-sam

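In other words, persist_vars lists names only, and HTML::Pager pulls the values from the query object itself. A minimal sketch (the template, callback and row count are assumed to be set up as in Greg's script):

    use CGI;
    use HTML::Pager;

    my $q = CGI->new;
    $q->param(PostalCode => '99654');        # the value lives in the query object

    my $pager = HTML::Pager->new(
        query             => $q,
        template          => $tmpl,          # as in Greg's script
        get_data_callback => $get_data_sub,  # as in Greg's script
        rows              => $rows,
        page_size         => 10,
        persist_vars      => ['PostalCode'], # names only - no values
    );
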
From: Greg J. <gr...@al...> - 2004-06-09 03:35:26

Could someone enlighten me on the proper way to pass an array ref to this part of the new() call ?

I've tried:

    persist_vars => [@$array ]

WHERE:

    @array = ("PostalCode", $searchZip);
    $array = \@array;
    $SearchZip = 99654;

It works, but I get the following printed to the template:

    <input type="hidden" name="PostalCode" value="99654" />
    <input type="hidden" name="99654" value="" />

As you can see, the second part of the array is repeated. I've tried variations; each time an additional line is produced duplicating the second item in the array, or it just produces a string value, or it fails. I've tried:

    persist_vars => [@array ]
    persist_vars => [$array ]

and even passing the value directly, like:

    persist_vars => [ 'PostalCode', '$searchZip' ]

Hey, I'm not claiming to be a Perl expert; I don't work with references as much as I should, so please be easy on me :) I would like to be able to pass up to 14 different name/value pairs to produce 14 different hidden input fields. I'm sure I must be passing the array ref wrong, but everything I try produces bad results. Can someone give me a clue ?

thanks for any help
Greg

From: Taylor B. <tba...@he...> - 2004-06-08 18:02:45

Howdy,

I'm new to HTML::Template and didn't find anything about this in the docs. Perhaps I overlooked it, but I'm pretty sure that the functionality isn't there. I'd like to be able to nest Template objects so it's a little easier to manage the dynamics of the templates used. I wrote a quick module that would tie a HTML::Template object to my class so that when it is FETCH'ed, it calls the output() method. This could be more easily done with the following alteration to HTML::Template:

    # current lines 2608-2610
    elsif ($type eq 'HTML::Template::VAR') {
        defined($$line) and $result .= $$line;
    }

    # suggested change for line 2609
    # this will check to see if the current line is a HTML::Template object
    if (defined $$line) {
        $result .= (ref $$line eq __PACKAGE__) ? $$line->output() : $$line;
        # UNIVERSAL::isa($$line, __PACKAGE__) could be used, too
    }

Is it possible to get this change in the next version of HTML::Template? :)

--
Taylor Basilio

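For comparison, nesting can already be approximated without patching HTML::Template by rendering the inner template first and passing its output as an ordinary variable. A hedged sketch with hypothetical file and parameter names:

    use HTML::Template;

    my $inner = HTML::Template->new(filename => 'widget.tmpl');
    $inner->param(title => 'Hello');

    my $outer = HTML::Template->new(filename => 'page.tmpl');
    $outer->param(widget_html => $inner->output);   # page.tmpl: <TMPL_VAR NAME="widget_html">

    print $outer->output;
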
From: Sam T. <sa...@tr...> - 2004-06-07 17:49:48

Krang v1.019 is now available. Notable changes in this release:

- Upgrades can now be re-run even if they failed part-way through previously. Several bugs are now fixed which should make failure less likely than before.

- Krang::DataSet, and as a result krang_import, is now much faster when loading large KDS files.

- Krang::Site no longer requires a site's preview URL and publishing paths to be unique. Only the publish URL is required to be unique now.

- Many bugs are now fixed. Detailed change-log here:
  http://krang.sf.net/docs/changelog.html

Krang is an Open Source web-publisher / content-management system designed for large-scale magazine-style websites. It is a 100% Perl application using Apache/mod_perl and MySQL, as well as numerous CPAN modules. Krang provides a powerful and easy to use story and media editing environment for magazine editors, as well as a complete template development environment for web designers. On the back-end, Perl programmers can customize Krang to control the data entered in the story editor and add code to drive the templates to build output. Krang can be enhanced with add-ons containing new skins and other new features. Krang easily handles large data sets and can manage multiple websites in a single installation.

For more information about Krang, visit the Krang website:
http://krang.sourceforge.net/

There you can download Krang, view screenshots, read documentation, join our mailing-lists and access the CVS tree.

- the Krang team

From: Mathew R. <mat...@re...> - 2004-06-07 00:24:19

> > > Google PageRank is very good at searching a broad sample of sites. It's
> > > not so good for individual sites.
> >
> > You are kidding right? The algorithms the Google use, already take into
> > account common content on multiple pages. OpenOffice.org use Google as
> > their own site specific search engine. As do a number of sites. The only
> > real problem with using Google is that they only spider the web every few
> > weeks, thus if you update more frequently than that, you may have a problem.
>
> No, I'm not kidding. It's not common content that Google has a problem
> which makes it less than perfect for individual sites. The whole point
> behind PageRank is that you get a lot of sites linking to each other and
> are given rank based on the link count. On a smaller sampling, there
> simply isn't enough data to be fed into PageRank to make a good rank. Is
> it still useful? Sure it is, but it's also not as good as other solutions
> for this particular problem.

PageRank is not the only ranking algorithm in use during Google searches -> this is particularly the case when a 'site:...' search is done.

As with most other search engines, you can submit your site/url to Google so that it is spidered the next time Google updates its database -> you don't _need_ to be linked to, to get ranked (although it helps).

Mathew

From: Timm M. <tm...@ag...> - 2004-06-04 12:31:45

At 09:02 AM 6/4/04 +1000, Mathew Robertson wrote:
<>
> > Google PageRank is very good at searching a broad sample of sites. It's
> > not so good for individual sites.
>
> You are kidding right? The algorithms the Google use, already take into
> account common content on multiple pages. OpenOffice.org use Google as
> their own site specific search engine. As do a number of sites. The only
> real problem with using Google is that they only spider the web every few
> weeks, thus if you update more frequently than that, you may have a problem.

No, I'm not kidding. It's not common content that Google has a problem which makes it less than perfect for individual sites. The whole point behind PageRank is that you get a lot of sites linking to each other and are given rank based on the link count. On a smaller sampling, there simply isn't enough data to be fed into PageRank to make a good rank. Is it still useful? Sure it is, but it's also not as good as other solutions for this particular problem.

From: Pete P. <pet...@cy...> - 2004-06-04 12:30:21

Bill Moseley wrote:
> On Thu, Jun 03, 2004 at 12:33:18PM -0500, Pete Prodoehl wrote:
>
> > I've seen sites where I could read a word on a page, input it into the
> > 'site search' box, and get no results. This tells me that the word does
> > not exist on the site (even though I can see it) or more likely that
> > this is not really a 'site search' but perhaps a 'content search' or
> > 'article search' or whatever...
>
> Of course, if the word is on every page on the site (a global header or
> footer) the word is useless as a search term for helping someone find
> something on the site. Unless they are looking for everything. ;)

Well, with proper weighting (titles, <h1>, etc. being of more importance), this shouldn't be the case...

> There's also the rare chance that indexing the common text makes it
> harder to find content about something specific -- when that word might
> also be in the header/footer. Ranking should take care of that, though.

Yup... ;)

> Excluding menus is also slightly helpful when common menu items might end up
> as search terms.

Yup...

> But, I would agree that in most cases it's better to index the content,
> and let the user worry about using good queries. Anyway, how much
> common content do you want on every page on the site?

How much do *I* want on every page, or how much do the "powers that be" want on every page? ;)

Pete

From: Mathew R. <mat...@re...> - 2004-06-03 23:03:17

> > This is a common mistake that information creators think 'is a good
> > thing'... The web got popular for a number of reasons - one of them being
> > "full text indexing of all content" (including headers/footers/etc).
>
> Why? There is no useful information in headers/footers. By nature of
> using a templating system, they are the same on every page in a given
> section. Including them in search results only increases the noise and the
> amount of information that needs to be indexed.

Only says you - a user of your site may find the headers and footers to be very useful.

> > The point is that, it is the user of the system that wants to find the
> > information - not the author telling you what you can and can't search
> > for. Classic example -> books used to have (and still do) an index in
> > the last couple of pages of the book, yet the user could never find what
> > they were looking for; until the book made it onto CDROM at which point
> > full-text-searching was possible.
>
> Almost every piece of a book is useful to search. But what good would it
> be to search for a chapter heading? That information is already given to
> you in the table of contents.

An index is a whole different beast to a table of contents. And as you say, every piece of a book is useful to search -> thus why bother only indexing content that the author considers valid? Why not just index _everything_, then let the user decide...

> > -> Full text searching is a _much better_ solution to search problems than
> > indexing on what YOU think is the information they want.
> >
> > Mathew
> >
> > PS. This means, use a spider.. or even better use google via a "site:..."
> > search.
>
> Google PageRank is very good at searching a broad sample of sites. It's
> not so good for individual sites.

You are kidding right? The algorithms the Google use, already take into account common content on multiple pages. OpenOffice.org use Google as their own site specific search engine. As do a number of sites. The only real problem with using Google is that they only spider the web every few weeks, thus if you update more frequently than that, you may have a problem.

Mathew

From: Bill M. <mo...@ha...> - 2004-06-03 21:23:14

On Thu, Jun 03, 2004 at 12:33:18PM -0500, Pete Prodoehl wrote:
> I've seen sites where I could read a word on a page, input it into the
> 'site search' box, and get no results. This tells me that the word does
> not exist on the site (even though I can see it) or more likely that
> this is not really a 'site search' but perhaps a 'content search' or
> 'article search' or whatever...

Of course, if the word is on every page on the site (a global header or footer) the word is useless as a search term for helping someone find something on the site. Unless they are looking for everything. ;)

There's also the rare chance that indexing the common text makes it harder to find content about something specific -- when that word might also be in the header/footer. Ranking should take care of that, though.

Excluding menus is also slightly helpful when common menu items might end up as search terms.

But, I would agree that in most cases it's better to index the content, and let the user worry about using good queries. Anyway, how much common content do you want on every page on the site?

--
Bill Moseley
mo...@ha...

From: Timm M. <tm...@ag...> - 2004-06-03 19:43:46

At 11:30 AM 6/3/04 -0700, Mark Fuller wrote:
<>
> A good indexer would add weight to words that appear in headers/footers
> which also appear in title, meta keyword/description, headings, and
> expository content (paragraphs, list items, data terms and definitions).
> Conversely, reduce weight for words appearing in headers/footers (or any
> part of the document) that don't appear in outline/structural elements.
> What I really hate is all words presented to me without any weighting of
> relevance to the topic of the page (determined from its structural elements).
<>

We're pretty set on using htdig://, since we already use it for a small subsection of our site. We discussed using Google site:, but (as I mentioned in another e-mail) Google's algorithm isn't very good for searching a specific site.

What I had planned on doing with Apache::QuickCMS is specify in our site coding standards that the template must provide certain variables for the content. Specifically: title, meta-description, meta-keyword, and content. Each would go in the obvious place in the HTML document. This is specifically to help the search indexer do the Right Thing.

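A hedged sketch of what a template under that coding standard might look like (the exact parameter names are an assumption; Timm only lists title, meta-description, meta-keyword, and content):

    <html>
      <head>
        <title><TMPL_VAR NAME="title"></title>
        <meta name="description" content="<TMPL_VAR NAME="meta_description">">
        <meta name="keywords" content="<TMPL_VAR NAME="meta_keywords">">
      </head>
      <body>
        <TMPL_VAR NAME="content">
      </body>
    </html>
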
From: Mark F. <mar...@ea...> - 2004-06-03 18:30:12

From: "Pete Prodoehl" <pet...@cy...>
> I've seen sites where I could read a word on a page, input it into the
> 'site search' box, and get no results.

Some search/indexing tools exclude common and site-defined words. I used a really nice local indexing tool called WebGlimpse[1] and it had this ability (as well as fuzzy word matches).

> While you say that there is no useful information in headers/footers,
> that may be by your definition and decision of what goes in a header and
> footer. This can vary from person to person.

A good indexer would add weight to words that appear in headers/footers which also appear in title, meta keyword/description, headings, and expository content (paragraphs, list items, data terms and definitions). Conversely, reduce weight for words appearing in headers/footers (or any part of the document) that don't appear in outline/structural elements. What I really hate is all words presented to me without any weighting of relevance to the topic of the page (determined from its structural elements).

If it were me, I would index the entire site and work on the relevancy algorithm. For example, perhaps the footer should be enclosed in a <div class="footer"> and the indexer programmed to omit such classes of divisions.

FWIW: I don't think Tim's attempt to get more flexible/enabling features will go very far. I can't even get an attribute indicating that linefeeds used only for the readability of <tmpl> tags should not remain when the tags are removed by H::T. It's like a festering wound for me. :)

[1] http://webglimpse.net/

Mark

From: Pete P. <pet...@cy...> - 2004-06-03 17:33:28

Timm Murray wrote:
> At 09:38 AM 6/3/04 +1000, Mathew Robertson wrote:
> > This is a common mistake that information creators think 'is a good
> > thing'... The web got popular for a number of reasons - one of them
> > being "full text indexing of all content" (including
> > headers/footers/etc).
>
> Why? There is no useful information in headers/footers. By nature of
> using a templating system, they are the same on every page in a given
> section. Including them in search results only increases the noise and
> the amount of information that needs to be indexed.

I've seen sites where I could read a word on a page, input it into the 'site search' box, and get no results. This tells me that the word does not exist on the site (even though I can see it) or more likely that this is not really a 'site search' but perhaps a 'content search' or 'article search' or whatever...

I think people are used to using Google, Yahoo, etc. and getting results that come from *every* piece of text on the page, whether that is a good thing or bad thing can depend... it can depend on user expectations, and other things.

While you say that there is no useful information in headers/footers, that may be by your definition and decision of what goes in a header and footer. This can vary from person to person.

Pete

From: Timm M. <tm...@ag...> - 2004-06-03 13:30:31

At 09:38 AM 6/3/04 +1000, Mathew Robertson wrote:
> > > > Inevitably, there will be certain pages using TMPL_INCLUDE tags. I imagine
> > > > that most of these will contain data that will not want to be searched for,
> > > > such as footers, and therefore my filter program can simply ignore
> > > > them. However, I don't feel safe in making the blanket assumption that
> > > > /all/ included files don't need to be searchable.
> > >
> > > Now you've lost me. There's lots of stuff in an HTML page that
> > > shouldn't be searched for. Stuff like headers and footers in includes
> > > is just the tip of the ice-berg. Why obsess over this?
> >
> > I want to give content authors more control over what portions of a
> > document are searched for.
>
> This is a common mistake that information creators think 'is a good
> thing'... The web got popular for a number of reasons - one of them being
> "full text indexing of all content" (including headers/footers/etc).

Why? There is no useful information in headers/footers. By nature of using a templating system, they are the same on every page in a given section. Including them in search results only increases the noise and the amount of information that needs to be indexed.

> The point is that, it is the user of the system that wants to find the
> information - not the author telling you what you can and cant search
> for. Classic example -> books used to have (and still do) an index in
> the last couple of pages of the book, yet the user could never find what
> they were looking for; until the book made it onto CDROM at which point
> full-text-searching was possible.

Almost every piece of a book is useful to search. But what good would it be to search for a chapter heading? That information is already given to you in the table of contents.

> -> Full text searching is a _much better_ solution to search problems than
> indexing on what YOU think is the information they want.
>
> Mathew
>
> PS. This means, use a spider.. or even better use google via a "site:..."
> search.

Google PageRank is very good at searching a broad sample of sites. It's not so good for individual sites.

From: Mathew R. <mat...@re...> - 2004-06-02 23:38:56

> > > Inevitably, there will be certain pages using TMPL_INCLUDE tags. I imagine
> > > that most of these will contain data that will not want to be searched for,
> > > such as footers, and therefore my filter program can simply ignore
> > > them. However, I don't feel safe in making the blanket assumption that
> > > /all/ included files don't need to be searchable.
> >
> > Now you've lost me. There's lots of stuff in an HTML page that
> > shouldn't be searched for. Stuff like headers and footers in includes
> > is just the tip of the ice-berg. Why obsess over this?
>
> I want to give content authors more control over what portions of a
> document are searched for.

This is a common mistake that information creators think 'is a good thing'... The web got popular for a number of reasons - one of them being "full text indexing of all content" (including headers/footers/etc).

The point is that, it is the user of the system that wants to find the information - not the author telling you what you can and cant search for. Classic example -> books used to have (and still do) an index in the last couple of pages of the book, yet the user could never find what they were looking for; until the book made it onto CDROM at which point full-text-searching was possible.

-> Full text searching is a _much better_ solution to search problems than indexing on what YOU think is the information they want.

Mathew

PS. This means, use a spider.. or even better use google via a "site:..." search.

From: Pete P. <pet...@cy...> - 2004-06-02 21:32:54

Sam Tregar wrote:
> On Wed, 2 Jun 2004, Timm Murray wrote:
>
> > I have a project which involves changing our current pages using SSI
> > #includes into using an HTML::Template-based solution instead. The
> > question of how to search these templates came up, and I've suggested using
> > htdig:// with a filter program that will convert the templates into text
> > for indexing.
>
> I suggest you use a web-spider to index your site, which would of
> course run HTML::Template to produce output. That way you index
> everything the end-user sees, not just the static parts of the page.
> I don't know for sure but I bet htdig:// comes with a web spider. If
> not 'wget -r' is handy in a pinch!

ht://Dig is good, as is mnoGoSearch...

http://www.htdig.org/
http://search.mnogo.ru/

They both spider your site and create an index, and I think they both have options to specify parts of your page that should not be indexed...

Pete

From: Timm M. <tm...@ag...> - 2004-06-02 19:19:11

At 11:34 AM 6/2/04 -0700, Bill Moseley wrote:
> On Wed, Jun 02, 2004 at 12:43:00PM -0500, Timm Murray wrote:
> > At 09:45 AM 6/2/04 -0700, Bill Moseley wrote:
> > <>
> > > > I don't think either solution is particularly difficult to implement,
> > > > but scanning the content files directly also lets us have an easier
> > > > time analyzing the structure of the document.
> > >
> > > All the server does is supply the content. Analyzing the content
> > > happens after that, regardless of using the server or the file system.
> > > Spidering lets you index the content as people see it on their browser.
> >
> > Take a look at the system being
> > used: http://www.perlmonks.org/index.pl?node_id=357248, particularly the
> > 'Documents' subsection.
>
> Seems like you could outgrow that one. Also seems like something
> that's been done already in many forms. You request the .inc file and
> it gets transformed by the .tmpl file. XML + XSLT? SSI? I think you can do
> better.

XML/XSLT is a mess, as is SSI. In fact, replacing SSI was exactly the goal with this system. I implemented a small site with SSI and then did the same site with Apache::QuickCMS. The result was a sharp reduction in space: the SSI site was 112k and 409 lines, while the Apache::QuickCMS version was 32k and 190 lines. (I can provide the tarball of the sites as implemented off-list if anyone wants to take a look.)

I actually started with the content files being in an XML format. However, I ran into problems coercing the XML parser into treating the embedded HTML as a string (so it can be put into the template parameter) rather than as more XML to be processed. While looking at how to solve this, I thought up the POD-like solution. I recoded it without any of the problems XML gave me, and it's probably faster, too.

I wasn't happy with having to allow <TMPL_INCLUDE> tags inside the content, as I fear it could be easily abused in naive ways. It also slows it down quite a bit, since it requires two passes through HTML::Template (at least, that's how it's currently implemented). However, for some of our data, I found we simply didn't have another choice.

> You probably already did this, but you might want to review other CMSes if
> redesigning the site from scratch. Here's a few lists:
>
>   http://www.cmsmatrix.org/
>   http://www.oscom.org/matrix/

I've looked at a lot of CMS systems--it's one of those massively over-implemented genres, much like templating systems :) I suspect the reason is that people look at other CMSes, decide they do almost but not quite exactly what they want, and end up implementing their own. So I decided I would add to the mix :) Yes, it's simple; that's intentional. I hope (possibly in vain) that I can avoid the feeping creaturism that tends to plague other CMSes.

> There's also PHP.

PHP has other problems.

> > Now, the system allows TMPL_INCLUDE tags in the content files (actually,
> > it's implemented by passing it through HTML::Template a second time, so any
> > TMPL_* tag will be processed, but this might change). Included files
> > occasionally need to be part of the search, but most likely won't. But I
> > don't feel I can make that assumption in all cases. So I need some way of
> > saying which ones should be searched on if we should ever need that
> > functionality (but default to not searching).
>
> And you also need a system to process your template files like
> Apache::QuickCMS does so you can index. Give spidering a try, you may
> find it's not as inefficient as you think. libxml2 is damn fast.

The processing stage isn't hard (remember, simplicity is a goal of Apache::QuickCMS), and in any case I think I can modify Apache::QuickCMS quite easily so that the processing stage could be directly used by another program. So I wouldn't need to write a separate processor to run before the indexer.

I'm not really concerned with spidering being inefficient. The worst case I can imagine is that I set it to run before I leave one day and it's done when I come in the next morning. It just seems to me that it's a more clumsy solution to this problem.

(In fact, I do have a spider which runs through our entire site and jots down what pages link to what other pages. It takes 5-10 minutes to run. The resulting report is dumped in YAML format to be processed by other programs to generate various reports, such as what pages link to documents that don't exist. That saved us a lot of time, because the boss wanted our current site mapped by hand before we got to the redesign. Now we have a report which is useful for stuff beyond redesigns, not least the printed version of the report (1500 pages, double-sided), which is handy for my boss to take into meetings to justify why we need a redesign :)

> --
> Bill Moseley
> mo...@ha...

From: Bill M. <mo...@ha...> - 2004-06-02 18:34:34

On Wed, Jun 02, 2004 at 12:43:00PM -0500, Timm Murray wrote:
> At 09:45 AM 6/2/04 -0700, Bill Moseley wrote:
> <>
> > > I don't think either solution is particularly difficult to implement,
> > > but scanning the content files directly also lets us have an easier
> > > time analyzing the structure of the document.
> >
> > All the server does is supply the content. Analyzing the content
> > happens after that, regardless of using the server or the file system.
> > Spidering lets you index the content as people see it on their browser.
>
> Take a look at the system being
> used: http://www.perlmonks.org/index.pl?node_id=357248, particularly the
> 'Documents' subsection.

Seems like you could outgrow that one. Also seems like something that's been done already in many forms. You request the .inc file and it gets transformed by the .tmpl file. XML + XSLT? SSI? I think you can do better.

You probably already did this, but you might want to review other CMSes if redesigning the site from scratch. Here's a few lists:

  http://www.cmsmatrix.org/
  http://www.oscom.org/matrix/

There's also PHP.

> Now, the system allows TMPL_INCLUDE tags in the content files (actually,
> it's implemented by passing it through HTML::Template a second time, so any
> TMPL_* tag will be processed, but this might change). Included files
> occasionally need to be part of the search, but most likely won't. But I
> don't feel I can make that assumption in all cases. So I need some way of
> saying which ones should be searched on if we should ever need that
> functionality (but default to not searching).

And you also need a system to process your template files like Apache::QuickCMS does so you can index. Give spidering a try, you may find it's not as inefficient as you think. libxml2 is damn fast.

--
Bill Moseley
mo...@ha...

From: Luke V. <lu...@ms...> - 2004-06-02 18:20:42

I wasn't planning on doing that when I started :/

It used to just be that the script would grab all of the tags in a template file and fill them in with the appropriate data. I decided it would be easiest to just write a subroutine for key tags that would automagically fill in the information. I did this to avoid having a different function for each page; it's a lot more modular this way.

Then I wanted to be able to hold some information inside of a template so that the header could be filled with the title, so I needed to have a way to pass some text back to that include, and this did the trick. That way the includes for the footer and header are organized in the same way they are for the new website, which means that I can practically copy the header and footer straight from the web include dir to the template global directory. It's all about centralization.

Maybe something else would have worked better, but I rarely pass arguments to the script from the templates, and when I do, it's specifically for simple appearance adjustments. It actually works very well and allows for clear and concise code. It does abuse your module a little, though, but I like it.

On Wed, Jun 02, 2004 at 01:12:43PM -0400, Sam Tregar wrote:
> On Wed, 2 Jun 2004, Luke Valenty wrote:
>
> > I use attributes in my template tags without using filters and without
> > messing around with HTML::Template's internals. Here is the format I
> > use:
> >
> > <TMPL_VAR NAME="name.arg1=val1.arg2=val2">
>
> My eyes! They burn! AArrrrgh.
>
> Seriously, that's a terrible idea. Why are you using HTML::Template
> if you want to drive your program from the template? There are many
> systems better suited to this use: Mason, Template Toolkit, Embperl,
> etc. HTML::Template was designed to support a one-way data-flow from
> Perl code to HTML template.
>
> -sam

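A purely illustrative sketch of how such a "name.arg1=val1" convention could be unpacked on the Perl side (the %handlers dispatch table is hypothetical; param() with no arguments returns the names found in the template):

    foreach my $full_name ($template->param()) {    # every name used in the template
        my ($base, @pairs) = split /\./, $full_name;
        my %args = map { split /=/, $_, 2 } @pairs;

        my $handler = $handlers{$base};              # %handlers maps tag name => sub
        next unless $handler;
        $template->param($full_name => $handler->(%args));
    }
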
From: Sam T. <sa...@tr...> - 2004-06-02 17:51:35

On Wed, 2 Jun 2004, Timm Murray wrote:
> > Why would you need to hook into the parser to implement this?
>
> For the attribute syntax I originally suggested, it needs to be notified of
> the extra attributes (which I think would be useful for more than just this
> instance).

Take a look at HTML::Template::Expr. I implemented that module, which adds new attributes to existing HTML::Template tags, without hooking into the parser.

> More generally, filters should hook into the parser directly instead
> of merely being fed the content to be processed.

I've got plans for something like this for HTML::Template v3, but they're pretty nebulous.

-sam

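For reference, a minimal sketch of the kind of extension Sam means (the template file name and its contents are made up for illustration):

    use HTML::Template::Expr;

    my $template = HTML::Template::Expr->new(filename => 'report.tmpl');
    $template->param(total => 12);
    print $template->output;

    # report.tmpl might contain:
    #   <TMPL_IF EXPR="total > 10">More than ten items.</TMPL_IF>
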