Thanks for pointing this out. There is indeed a serious issue there.
Frankly, I didn't notice it until you first mentioned the apparent bug.
When reindexing a document, I too used to call delete_document($doc_id)
before calling index_document($doc_id), as the module documentation says.
Now it appears that delete_document() doesn't do much after all. The reason
everything still worked for me (back in the days when I was a mere user of
the module) is that index_document() internally calls update_document() for
existing documents and add_document() for new documents only. It determines
whether it is dealing with a new document by checking the docid table.
For the String frontend, the document name is translated into a corresponding
doc_id; however, if no such document name is found in the docid table, a new
maximum document id is used (see the get_id_for_name() subroutine at line 32
of String.pm). At line 291 of FullTextSearch.pm, a document is marked for
addition if its id is a new maximum id (greater than the current maximum).
The actual selection of which subroutine is called to add or update a
document happens at line 336 of FullTextSearch.pm.
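
To make the add-versus-update decision concrete, here is a minimal sketch
of the logic as I understand it. It is not the module's code: the hash
%docid, the counter $max_id and the helper choose_action() are hypothetical
stand-ins, and I am assuming the newly allocated id is recorded right away.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the docid table (name => numeric id).
my %docid  = ( alpha => 1, beta => 2 );
my $max_id = 2;

# Loosely modelled on the behaviour described for get_id_for_name():
# return the known id, or allocate a new maximum id for an unknown name.
sub get_id_for_name {
    my ($name) = @_;
    return $docid{$name} if exists $docid{$name};
    return $docid{$name} = ++$max_id;
}

# Hypothetical helper: a document counts as new only if its id is greater
# than the maximum id that existed before the lookup.
sub choose_action {
    my ($name) = @_;
    my $old_max = $max_id;
    my $id      = get_id_for_name($name);
    return $id > $old_max ? 'add_document' : 'update_document';
}

print choose_action('alpha'), "\n";   # known name: update_document
print choose_action('gamma'), "\n";   # unknown name: add_document
```

This is why reindexing kept working even though delete_document() did
nothing useful: a name that already had an id always went down the
update path.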
However, you are absolutely right as far as this issue goes. I'll try to
allocate some time towards fixing this problem.
This problem primarily affects String frontend indexes since, as you've
mentioned, doc_id is not translated into a corresponding numerical value
before its use in the SQL delete statement. So in effect, a string value is
used where a numerical value is required (the doc_id column type is
'smallint(5) unsigned' in MySQL). MySQL is also partly to blame, for it does
not report a type error. If MySQL did report an error on a type mismatch
(just as Oracle would, for example), I'm sure that the original author
wouldn't have let this bug slip.
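
Since the database will not complain, one option is to check the value on
the Perl side before it reaches the delete statement. The following is only
a sketch of such a guard; assert_numeric_doc_id() is a hypothetical helper
and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only unsigned integers as a doc_id and
# die on anything else, so a document name cannot slip through unnoticed.
sub assert_numeric_doc_id {
    my ($doc_id) = @_;
    my $shown = defined $doc_id ? $doc_id : 'undef';
    die "doc_id must be an unsigned integer, got '$shown'\n"
        unless defined $doc_id && $doc_id =~ /\A[0-9]+\z/;
    return $doc_id;
}

print assert_numeric_doc_id('1'), "\n";

# A document name is rejected instead of being passed on to SQL.
my $ok = eval { assert_numeric_doc_id('TriageComputerFailures'); 1 };
print "rejected: $@" unless $ok;
```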
Also, I believe that the record of a document being deleted has to be
removed from the docid table as well. This has to be changed
for all frontends, however.
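
Roughly, a corrected delete would have to do three things: translate the
name into its doc_id, remove the index rows for that id, and remove the
docid record. The sketch below shows that order on in-memory data only.
The table layouts (%docid_table, @data_table with a word_id column) and
the name delete_document_by_name() are assumptions for illustration, not
the module's actual schema or API.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two tables.
my %docid_table = ( TriageComputerFailures => 1, OtherDoc => 2 );  # name => doc_id
my @data_table  = (
    { word_id => 10, doc_id => 1 },
    { word_id => 11, doc_id => 1 },
    { word_id => 10, doc_id => 2 },
);

# Delete a document by name. Returns the number of index rows removed,
# or nothing (undef in scalar context) if the name is unknown.
sub delete_document_by_name {
    my ($name) = @_;

    # Step 1: translate the name into its numeric doc_id.
    my $doc_id = $docid_table{$name};
    return unless defined $doc_id;

    # Step 2: remove the index rows that belong to this doc_id.
    my $before = @data_table;
    @data_table = grep { $_->{doc_id} != $doc_id } @data_table;

    # Step 3: remove the docid record itself.
    delete $docid_table{$name};

    return $before - @data_table;
}

my $removed = delete_document_by_name('TriageComputerFailures');
print "removed $removed index rows, ", scalar(@data_table), " left\n";
```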
-------------- ORIGINAL POST ----------------------
To Whom It May Concern:
I have been attempting to use the DBIx::FullTextSearch Perl module.
I have run into a small problem using the delete_document() method. I am
using the file front-end and the column back-end for my indexing. This
method does not work when I pass it a string argument like
"TriageComputerFailures" (I used single quotes in the actual file
string). I have looked at the code in FullTextSearch.pm and in
Column.pm and I have found the problem. The argument to this method is
passed as-is to the delete_document() method in Column.pm. This method
contains the following line to process this request:
my $sth = $dbh->prepare("delete from $data_table where doc_id = ?");
This line is looking in the table xxx_xxx_index_data for the doc_id of
the document that I have put in a name for. This obviously fails. Even
when I passed it a numeric argument (for example '1', as this was the
first document I indexed), it only deletes the selected rows out of the
data table, leaving all other related rows in the database intact.
Please let me know if I am doing something incorrectly here. If not, a
recommended fix would be greatly appreciated.
Thank you very much for your time and attention.
Information Technology Services
University Health Services
University of Texas at Austin