From: Christopher M. <chr...@mc...> - 2003-11-15 00:49:51
On Fri, 2003-11-14 at 18:11, Neal Richter wrote:
> OK, so I forgot the #2 Question...
>
> > Questions:
> >
> > 1) If you run an htdump -w before and after the purge, do the db.docs
> > files differ?

No.

> 2) If you temporarily replace your start_url with the one you want to
> re-add, and rerun 'htdig -v -c xxxx' does it add it properly? Does for
> me.. and it shows up again in search results.

No to that one either, unfortunately.

> I've got this kind of thing working in libhtdig & libhtdigphp. How
> attached are you to your current implementation?????

Not terribly; I can change it if needed. Here's what I'm doing:

I've got a couple of stored procedures in PostgreSQL that let me query htdig databases and return the IDs of the matching records. Basically, htdig returns URLs of this form:

http://newfind.mcgill.ca/indexes/ads/?AdsID=1026194

where the integer at the end of the URL is the primary key of the Ads table in our database. This allows me to do things like this:

http://newfind.mcgill.ca/ads/?words=jazz+guitar

which is a nifty way of doing full-text indexing in PostgreSQL. The only alternative in Postgres at the moment is to use GiST indexes and tsearch, which is incredibly complex and so poorly documented that even the core Postgres developers are unable to get it to work.

So, I've got a PL/Perl function (with a PL/pgSQL wrapper) in Postgres that returns these integers as rows. This means I can run a query like this:

SELECT * FROM htdig('"reasonable offer"', 'ads');

and it returns this:

 item_id | htdig_order
---------+-------------
 1014752 |           1
 1026970 |           2

All of this is working very nicely. What I want to do now is write the stored procedures that get htdig to re-index a new item. If I can do this in PHP, that's fine. Just tell me how to build PHP/HtDig with libhtdigphp and how to use it, and I'm there.

At the bottom of this email are the two stored procedures and the data type definition, if you are interested.
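The key step in this setup is mapping each htsearch result back to a primary key. As a minimal sketch, assuming the htsearch results template prints one result URL per line (the helper names ids_from_htsearch_output and to_pg_array are hypothetical, not part of htdig or Postgres), the ID could be pulled out by matching the AdsID parameter by name. This is a swapped-in technique: the stored procedure at the bottom instead strips every non-digit character from each line.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the AdsID primary key from each line of
# htsearch output. Assumes the results template prints one matching URL
# per line, e.g. http://newfind.mcgill.ca/indexes/ads/?AdsID=1026194
sub ids_from_htsearch_output {
    my @lines = @_;
    my @ids;
    for my $line (@lines) {
        # Match only the AdsID query parameter, so blank lines, headers,
        # or other digits on the line are ignored rather than glued on.
        push @ids, $1 if $line =~ /[?&]AdsID=(\d+)/;
    }
    return @ids;
}

# Hypothetical helper: build a PostgreSQL array literal such as
# {1014752,1026970}; an empty list becomes {}.
sub to_pg_array {
    return '{' . join(',', @_) . '}';
}

my @output = (
    "http://newfind.mcgill.ca/indexes/ads/?AdsID=1014752\n",
    "\n",
    "http://newfind.mcgill.ca/indexes/ads/?AdsID=1026970\n",
);
my @ids = ids_from_htsearch_output(@output);
print to_pg_array(@ids), "\n";    # prints {1014752,1026970}
```

Matching the parameter by name means a blank or unrelated line simply contributes nothing, so the array literal handed back to PL/pgSQL never contains empty elements.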
The PL/pgSQL function is a required wrapper because, at the moment, PL/Perl functions can't return sets, only simpler data types.

If anyone else is interested in this, I can send all the code and documentation when it is completely built. I know there are folks on the Postgres list who are interested in seeing this when it is done.

Cheers,

Chris

-- 
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax: (514) 398-2017


CREATE TYPE htdig AS (item_id int, htdig_order int);

CREATE OR REPLACE FUNCTION htdig(text, text) RETURNS SETOF htdig AS '
DECLARE
    result  text[];
    low     integer;
    high    integer;
    item    htdig%rowtype;
BEGIN
    result := htsearch($1, $2);
    low  := 1;
    high := array_upper(result, 1);
    -- an empty result ({}) has no upper bound: return no rows
    IF high IS NULL THEN
        RETURN;
    END IF;
    FOR i IN low..high LOOP
        item.item_id     := result[i];
        item.htdig_order := i;
        RETURN NEXT item;
    END LOOP;
    RETURN;
END;
' LANGUAGE 'plpgsql' STABLE STRICT;

CREATE OR REPLACE FUNCTION htsearch(text, text) RETURNS text[] AS '
    my $SearchTerms = $_[0];
    my $DBName      = $_[1];
    my @Result;
    my $Line;

    $DBName =~ s/[^a-z]//g;        # dbname is only allowed letters
    $SearchTerms =~ s/['']/ /g;     # remove single quotes so the terms cannot break
                                   # out of the quoted shell argument (shell injection)
    open(HTSEARCH, "/usr/local/htdig/bin/htsearch -c /usr/local/htdig/conf/${DBName}.conf ''config=${DBName};words=${SearchTerms};matchesperpage=1000;'' |")
        or die "cannot run htsearch: $!";
    while (<HTSEARCH>) {
        $Line = $_;
        $Line =~ s/[^0-9-]//g;      # keep only the numeric ID (also drops the newline)
        next unless length $Line;   # skip lines with no ID so the array literal stays valid
        push @Result, $Line;
    }
    close HTSEARCH;

    return qq/{/ . (join qq/,/, @Result) . qq/}/;
' LANGUAGE plperlu;