From: Colin T. <col...@ou...> - 2006-03-13 13:52:35
Some general ramblings about the XML metadata and the search tool, in the hope of getting some discussion and consensus about a solution.

What's the problem? A search for 'heading' only returns resources that contain 'heading', not those containing 'headings' or 'head'.

- All text entered in Bod as a title or description is tokenised, and each token is saved to the database (xml_tokens) and in WordRepository.
- WordRepository creates a CollationKey for each token and stores it in four Hashtables (for primary, secondary, tertiary and identical matching).
- When searching, a CollationKey is created from the search term and compared with the existing CollationKeys. If there's a match, an SQL query is built to retrieve the IDs of all the resources containing the term, and those resources are shown in the search results.

Because the matching itself is done with CollationKeys rather than in SQL, we can't simply modify a query to broaden it. All the tokens in the database are also held in WordRepository (as keys in the identical_table Hashtable, or as Word instances in a Vector).

I think there are two possible approaches:

1. Use regular expressions to find possible matches for the search term among the keys of the identical_table Hashtable, or
2. Run an SQL query with wildcards against the table that holds the tokens.

A further decision is how this behaviour is presented to the user. Do we:

- automatically do a wildcard search when they use the default (Primary) option?
- offer another option they can choose that automatically adds a wildcard (possibly shortening the search term first)?
- allow users to enter wildcards, and only use them if they are present in the search term?

Any opinions?

Thanks,
Colin

--
____________________________________
Colin Tatham
VLE Team
Oxford University Computing Services
http://www.oucs.ox.ac.uk/ltg/vle/
http://bodington.org
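For discussion, here is a rough Java sketch of option 1 combined with the third presentation choice (wildcards are used only when the user types a `*`). It assumes the identical_table keys are the token Strings; the class and method names are hypothetical and not Bodington's API. It shows why a PRIMARY-strength CollationKey comparison matches 'HEADING' to 'heading' but never 'heading' to 'headings', and how a user-supplied `*` could be expanded against the in-memory tokens. The matched tokens would then feed the existing SQL query that fetches resource IDs.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not Bodington code. The token collection stands in
 * for the keys of WordRepository's identical_table (assumed to be Strings).
 */
public class WildcardTokenSearch {

    private final Collator collator;

    public WildcardTokenSearch(Locale locale) {
        collator = Collator.getInstance(locale);
        // PRIMARY strength ignores case and accent differences.
        collator.setStrength(Collator.PRIMARY);
    }

    /** Current behaviour: whole-token equality of CollationKeys. */
    public List<String> exactMatches(String term, Collection<String> tokens) {
        CollationKey target = collator.getCollationKey(term);
        List<String> hits = new ArrayList<>();
        for (String token : tokens) {
            // Keys are recomputed here for simplicity; WordRepository caches them.
            if (collator.getCollationKey(token).compareTo(target) == 0) {
                hits.add(token);
            }
        }
        return hits;
    }

    /** Turns a term such as "head*" into a case-insensitive regex; '*' matches any run of characters. */
    static Pattern toPattern(String term) {
        List<String> parts = new ArrayList<>();
        // limit -1 keeps trailing empty strings, so "head*" splits to ["head", ""]
        for (String part : term.split("\\*", -1)) {
            parts.add(Pattern.quote(part)); // everything except '*' is literal
        }
        // Case-insensitive, but unlike the collator this is not accent-insensitive.
        return Pattern.compile(String.join(".*", parts),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Expands wildcards only if the user typed one; otherwise keeps exact matching. */
    public List<String> search(String term, Collection<String> tokens) {
        if (term.indexOf('*') < 0) {
            return exactMatches(term, tokens);
        }
        Pattern pattern = toPattern(term);
        List<String> hits = new ArrayList<>();
        for (String token : tokens) {
            if (pattern.matcher(token).matches()) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        WildcardTokenSearch s = new WildcardTokenSearch(Locale.ENGLISH);
        List<String> tokens = Arrays.asList("head", "heading", "Headings", "header", "ahead");
        System.out.println(s.search("heading", tokens)); // [heading]
        System.out.println(s.search("head*", tokens));   // [head, heading, Headings, header]
        System.out.println(s.search("*head", tokens));   // [head, ahead]
    }
}
```

This scans every key on each wildcard search, which may be slow for a large identical_table; option 2 (an SQL `LIKE 'head%'` query on the token table) would push that scan into the database instead.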