Thread: [PyIndexer] Notes on Docs and Testing


Here are a few notes on the docs:

We probably don't need a specific index type for dates and times (because 
they can be distilled down to floats), unless there is some date-specific 
search functionality that is needed (none comes to mind).
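To illustrate why a float index would be enough, here is a minimal sketch of reducing dates to floats (seconds since the Unix epoch) and answering a date range query as a plain numeric range query. The `FloatRangeIndex` class and its methods are hypothetical stand-ins, not part of any existing PyIndexer interface. The sketch also assumes timezone-aware datetimes so that keys compare consistently.

```python
import bisect
from datetime import datetime, timezone

def datetime_to_key(dt: datetime) -> float:
    """Reduce a timezone-aware datetime to a float: seconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("use timezone-aware datetimes so keys compare consistently")
    return dt.timestamp()

def key_to_datetime(key: float) -> datetime:
    """Recover a UTC datetime from a float key."""
    return datetime.fromtimestamp(key, tz=timezone.utc)

class FloatRangeIndex:
    """Hypothetical float-keyed index kept as parallel sorted lists of keys and doc ids."""

    def __init__(self):
        self._keys = []
        self._ids = []

    def add(self, key: float, docid: int) -> None:
        # Insert while keeping self._keys sorted; self._ids stays aligned with it.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._ids.insert(pos, docid)

    def search_range(self, lo: float, hi: float) -> list:
        """Return doc ids whose keys fall in the closed interval [lo, hi], in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return self._ids[start:end]

utc = timezone.utc
idx = FloatRangeIndex()
idx.add(datetime_to_key(datetime(2001, 5, 1, tzinfo=utc)), 1)
idx.add(datetime_to_key(datetime(2002, 1, 1, tzinfo=utc)), 3)
idx.add(datetime_to_key(datetime(2001, 6, 15, tzinfo=utc)), 2)

# "Everything dated in 2001" is just a numeric range over the float keys.
hits = idx.search_range(datetime_to_key(datetime(2001, 1, 1, tzinfo=utc)),
                        datetime_to_key(datetime(2001, 12, 31, tzinfo=utc)))
print(hits)  # [1, 2]
```

Nothing in the range query knows it is dealing with dates. That supports the point that a separate date/time index type only pays off if date-specific search features are needed.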

It would be beneficial for the index to have a lower-level identifier (probably a 
64-bit integer) that is exposed in the indexer interface. That way, a reverse 
mapping to the string identifier would not be necessary for applications that 
don't need it (such as the ZODB, which, at least currently, uses an 8-byte 
object id).
Whether string ids are supported should probably be decided when the index is 
instantiated. We could achieve that with two index classes: a basic class that 
supports only integer ids, and a subclass that also supports string ids.
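The two-class split could look roughly like the following minimal sketch. `IntIdIndex`, `StringIdIndex`, `oid_to_int`, and the method names are hypothetical and are not an agreed PyIndexer API. The sketch assumes an 8-byte object id is a big-endian unsigned integer, so it maps directly onto a 64-bit int with no reverse table. Only the string-id subclass pays for the forward and reverse mappings.

```python
import struct
from collections import defaultdict

MAX_ID = 2**64 - 1  # largest unsigned 64-bit integer

def oid_to_int(oid: bytes) -> int:
    """Interpret an 8-byte object id as an unsigned 64-bit int (assumes big-endian packing)."""
    if len(oid) != 8:
        raise ValueError("expected an 8-byte oid")
    return struct.unpack(">Q", oid)[0]

class IntIdIndex:
    """Hypothetical base index: documents are identified only by 64-bit integers."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> set of integer doc ids

    def index_doc(self, docid: int, text: str) -> None:
        if not 0 <= docid <= MAX_ID:
            raise ValueError("docid must fit in an unsigned 64-bit int")
        for word in text.lower().split():
            self._postings[word].add(docid)

    def search(self, word: str) -> set:
        # .get() avoids creating empty entries in the defaultdict.
        return set(self._postings.get(word.lower(), ()))

class StringIdIndex(IntIdIndex):
    """Hypothetical subclass that adds string ids via forward and reverse mappings."""

    def __init__(self):
        super().__init__()
        self._str_to_int = {}
        self._int_to_str = {}

    def index_doc(self, docid: str, text: str) -> None:
        intid = self._str_to_int.get(docid)
        if intid is None:
            # Assign the next unused integer id to a new string id.
            intid = len(self._str_to_int)
            self._str_to_int[docid] = intid
            self._int_to_str[intid] = docid
        super().index_doc(intid, text)

    def search(self, word: str) -> set:
        # Translate integer hits back to the caller's string ids.
        return {self._int_to_str[i] for i in super().search(word)}

# An application like the ZODB can use its object ids directly.
ints = IntIdIndex()
ints.index_doc(oid_to_int(b"\x00" * 7 + b"\x2a"), "hello world")
print(ints.search("hello"))  # {42}

# Applications that want string ids choose the subclass at instantiation time.
strs = StringIdIndex()
strs.index_doc("/docs/a", "hello world")
strs.index_doc("/docs/b", "goodbye world")
print(sorted(strs.search("world")))  # ['/docs/a', '/docs/b']
```

Because the choice is made by picking a class, applications that never use string ids never carry the cost of the reverse mapping.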

On the testing front, in my travels I found some big old piles of text data 
for use in testing IR software. Perhaps a sample database such as one of 
these could be used for scalability testing.

See: http://192.115.216.71/webir/resources.html
under "Free for all text/web files collection"
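A scalability test along these lines could index growing slices of a collection and record how time and vocabulary size grow. Below is a minimal sketch. `build_inverted_index` is a hypothetical stand-in for the real indexer, and synthetic documents stand in for the collection, because loading those files depends on their format, which isn't described here.

```python
import time
from collections import defaultdict

def build_inverted_index(docs):
    """Hypothetical stand-in indexer: maps each lowercase word to a set of doc ids."""
    index = defaultdict(set)
    for docid, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(docid)
    return index

def time_indexing(docs, sizes):
    """Index growing prefixes of the corpus; return (docs, distinct words, seconds) per size."""
    results = []
    for n in sizes:
        start = time.perf_counter()
        index = build_inverted_index(docs[:n])
        elapsed = time.perf_counter() - start
        results.append((n, len(index), elapsed))
    return results

# Synthetic corpus; in a real test these would be documents read from the collection.
docs = [f"document {i} mentions term{i % 100} and shared words" for i in range(10_000)]
results = time_indexing(docs, [1_000, 5_000, 10_000])
for n, vocab, secs in results:
    print(f"{n:>6} docs  {vocab:>6} distinct words  {secs:.4f} s")
```

Plotting seconds against document count across several sizes shows whether indexing cost grows roughly linearly or worse as the corpus gets large.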

/---------------------------------------------------\
  Casey Duncan, Sr. Web Developer
  National Legal Aid and Defender Association
  c.d...@nl...
\---------------------------------------------------/


pythonindexer-discuss