Are there no people willing to offer suggestion? (Or at least a URL to appropriate documentation?) I would appreciate knowing how to convert my program.

On 11/9/05, Jonathan Hayward <christos.jonathan.hayward@gmail.com> wrote:
I am trying to retrofit an application I wrote, so it will use SQLObject. The application, SearchLog, presently loads a number of documents, keeps track of what keywords occur and how many, and currently implements a Boolean keyword search functionality with wildcard and phrase support (although Boolean/wildcard/phrase searching is not a high priority now), returning documents sorted by the user's choice of last programatically-updated view, last modification, or a relevance score that is dependent on keywords' proportion in the document (the relevance is not a high priority now).

At present, it's a memory hog and runs slowly with a few hundred documents, so I'd like to retrofit some of the basic functionality and possibly the "optional" functionality so that it can run quickly and without per-document memory overhead.

I have the following two classes:

class database_document(sqlobject.SQLObject):
    date_last_modified = sqlobject.DateCol()
    date_last_viewed = sqlobject.DateCol()
    #excerpt = sqlobject.StringCol ()
    relative_filename = sqlobject.StringCol()
    section = sqlobject.StringCol()
    text = sqlobject.StringCol()
    tokenized = sqlobject.StringCol()
    def _set_tokenized(self, token_list):
        value = " ".join(token_list)
        self._SO_set_tokenized(value)

class database_document_keyword(sqlobject.SQLObject)
    document_filename = sqlobject.StringCol()
    document_id = sqlobject.ForeignKey('database_document')
    word = sqlobject.StringCol()
    frequency = sqlobject.IntCol()
    total = sqlobject.IntCol()
    def _get_proportion(self):
        return float(self.frequency) / float(max(1, self.total))

The database_document_keyword is intended as an optimization to circumvent the inefficiency of SQL searching for "% foo %" in the tokenized form.

Given those classes, which should be straightforward to generate and populate, what is the best way to say "Give me all database_documents containing all of the following keywords, sorted by this timestamp."?

--
++ Jonathan Hayward, jonathan.hayward@pobox.com
** To see an award-winning website with stories, essays, artwork,
** games, and a four-dimensional maze, why not visit my home page?
** All of this is waiting for you at http://JonathansCorner.com

** If you'd like a Google Mail ( gmail.com) account, please tell me!



--
++ Jonathan Hayward, jonathan.hayward@pobox.com
** To see an award-winning website with stories, essays, artwork,
** games, and a four-dimensional maze, why not visit my home page?
** All of this is waiting for you at http://JonathansCorner.com

** If you'd like a Google Mail (gmail.com) account, please tell me!