From: Peter C. <Pet...@me...> - 2006-03-13 13:58:47
|
> From: Colin Tatham
> I think there are two possible approaches:
>
> 1. Use regular expressions to find possible matches of the search term
> in the identical_table Hashtable keys, or
>
> 2. Execute an SQL query with wildcards on the table that contains the
> tokens.

3. Bite the bullet, throw away the homebrew XML repository and use somebody else's work. My candidate would be eXist, which has much of this functionality already built in to XQuery.

 - Peter
|
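For readers of the archive, the two approaches quoted above can be sketched roughly as follows. This is only an illustration, not Bodington's code: the dictionary stands in for the identical_table Hashtable, and the `tokens` table, its `token` column, and both function names are hypothetical.

```python
import re
import sqlite3


def match_keys(table, term):
    """Approach 1: scan in-memory keys for ones that start with the search term."""
    pattern = re.compile(re.escape(term) + r"\w*", re.IGNORECASE)
    return sorted(key for key in table if pattern.fullmatch(key))


def match_sql(conn, term):
    """Approach 2: let the database find tokens using a trailing wildcard."""
    # Escape LIKE metacharacters so the user's term is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT token FROM tokens WHERE token LIKE ? ESCAPE '\\' ORDER BY token",
        (escaped + "%",),
    )
    return [row[0] for row in rows]
```

The first approach touches every key on every search, while the second hands the work to the database, which may or may not be able to use an index for the wildcard depending on the product and on where the wildcard sits.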
From: Peter C. <Pet...@me...> - 2006-03-13 15:40:22
|
> From: Jon Maber
> >3. Bite the bullet, throw away the homebrew XML repository and use
> >somebody else's work. My candidate would be eXist, which has much of
> >this functionality already built in to XQuery.
> >
> > - Peter
>
> Or Apache Xindice? I'm not familiar with the product but one would hope
> that it has as much support behind it as the other Apache XML tools.

eXist has considerably more functionality, as it has XQuery as a query language rather than XPath. I have no idea of the relative performance of the two, however.

> Before taking this radical approach it would be important to determine
> that the replacement would be equally capable. (E.g. capable of handling
> accent insensitive searching which is vital for searching non-English
> text by users who aren't sure how to type accented characters into the
> search tool or for searching text which may have optional accents or
> incorrect orthography, not to say Japanese text which may have been
> entered in either of two phonetic systems.)

I *think* http://exist.sourceforge.net/xquery.html#N10474 says eXist, at least, can do this natively.

> It will be sad to see the XML repository go - I'm very proud of the
> concept of translating an object oriented XML search specification into
> a single SQL query.

Yep. I was/am proud of my Prolog setof() converter, that converted logic programming clauses into a single SQL query; but it became redundant with optimised tuple stores.

> I'd be interested to see performance comparisons between my XML to SQL
> translator and a pure XML database like Xindice.

Indeed. I suspect the greater optimisation allowed by good selection of the on-disk storage structures will mean that the XML databases have decent performance; but I'm not 100% convinced, and a benchmark would be interesting.

 - Peter
|
From: Jon M. <jo...@te...> - 2006-03-13 16:24:50
|
Peter Crowther wrote:

>>Before taking this radical approach it would be important to determine
>>that the replacement would be equally capable. (E.g. capable of handling
>>accent insensitive searching which is vital for searching non-English
>>text by users who aren't sure how to type accented characters into the
>>search tool or for searching text which may have optional accents or
>>incorrect orthography, not to say Japanese text which may have been
>>entered in either of two phonetic systems.)
>
>I *think* http://exist.sourceforge.net/xquery.html#N10474 says eXist, at
>least, can do this natively.

I think it's saying that the built-in query engine can process the right kind of query but I've found an unresolved bug report from earlier this month on the Saxon SourceForge project that shows that it doesn't work [bug1444006].

It's also not clear to me that the indexing in eXist can be configured to support such queries - the documentation on creating indices makes no mention of collations at all. So I'm suspicious that this kind of query could be very slow. Searching for text with multiple key words won't be helped at all either, since eXist supports these via extension syntax to XML Query which doesn't provide the option to specify the collation or match strength like the standard XML Query text matching functions do.

This is all based on a very quick trawl through the eXist web site so I might be wrong.

I think Bodington XML repository is still a nose ahead of the competition and won't be put out to pasture just yet. ;-)

Jon
|
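To make "accent insensitive searching" concrete, here is a minimal sketch that folds accents away before comparing. It is a stand-in technique (Unicode decomposition plus case folding), not the collation mechanism that XML Query defines, and it is not taken from Bodington or eXist; both function names are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Return a case-folded copy of text with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_match(term, candidates):
    """Return the candidates that equal term once accents and case are ignored."""
    key = fold_accents(term)
    return [c for c in candidates if fold_accents(c) == key]
```

Folding like this only helps if the stored tokens are folded the same way when they are indexed, which is the indexing concern raised above. It also does nothing for the Japanese case mentioned in the thread, since it does not map between the two kana systems.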
From: Brian P. C. <bm...@bm...> - 2006-03-13 16:29:30
|
> Peter Crowther wrote:
>
> >>Before taking this radical approach it would be important to determine
> >>that the replacement would be equally capable. (E.g. capable of handling
> >>accent insensitive searching which is vital for searching non-English
> >>text by users who aren't sure how to type accented characters into the
> >>search tool or for searching text which may have optional accents or
> >>incorrect orthography, not to say Japanese text which may have been
> >>entered in either of two phonetic systems.)
> >
> >I *think* http://exist.sourceforge.net/xquery.html#N10474 says eXist, at
> >least, can do this natively.
>
> I think it's saying that the built in query engine can process the right
> kind of query but I've found an unresolved bug report from earlier this
> month on the Saxon SourceForge project that shows that it doesn't work
> [bug1444006]. It's also not clear to me that the indexing in eXist can
> be configured to support such queries - the documentation on creating
> indices makes no mention of collations at all. So I'm suspicious that
> this kind of query could be very slow - also searching for text with
> multiple key words won't be helped at all since eXist supports these via
> extension syntax to XML Query which doesn't provide the option to
> specify the collation or match strength like the standard XML Query text
> matching functions. This is all based on a very quick trawl through the
> eXist web site so I might be wrong.
>
> I think Bodington XML repository is still a nose ahead of the
> competition

Berkeley XML DB?

Brian

> and won't be put out to pasture just yet. ;-)
>
> Jon
>
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Bodington-developers mailing list > Bod...@li... > https://lists.sourceforge.net/lists/listinfo/bodington-developers > |
From: Jon M. <jo...@te...> - 2006-03-13 16:53:48
|
>Berkeley XML DB? > >Brian > > I can't find any mention of collations in the documentation I could find on their web site. It supports XML Query which implies support for collations but doesn't necessarily mean the indexing is adaptable to anything but the default collation. It's also a commercial product owned now by Oracle. Jon |
From: Jon M. <jo...@te...> - 2006-03-13 16:58:33
|
Correction - it's not a commercial product - it is open source. However, its main developers belong to a commercial company that has just been bought by Oracle.

Jon Maber wrote:

>> Berkeley XML DB?
>>
>> Brian
>
> I can't find any mention of collations in the documentation I could
> find on their web site. It supports XML Query which implies support
> for collations but doesn't necessarily mean the indexing is adaptable
> to anything but the default collation. It's also a commercial product
> owned now by Oracle.
>
> Jon
|
From: Brian P. C. <bm...@bm...> - 2006-03-13 17:23:43
|
> Correction - it's not a commercial product - it is open source. However,
> its main developers belong to a commercial company that has just been
> bought by Oracle.

Yep - the same people that now own InnoDB.

http://www.oracle.com/sleepycat/index.html

"Oracle has expanded its embedded database offerings through the acquisition of Sleepycat Software, Inc., a privately held supplier of open source database software for developers of embedded applications. Berkeley DB is a leader in data management for embedded "edge" applications and complements other Oracle embedded products including Oracle TimesTen In-Memory Database and Oracle Database Lite Edition.

Together, Oracle and Sleepycat plan to continue to develop, support, and sell the entire family of Berkeley DB products, including Sleepycat's XML and Java Editions. Oracle has no plans to change the dual license, and we will continue to serve both open source and commercial users. All contacts, phone numbers, and email addresses for Sleepycat sales and customer support, remain the same.

This site will help you learn more about this acquisition and what it will mean to our customers. If you don't find your questions here, please send them to con...@or...."

(My bold)

How trusting is everyone?

Brian

> Jon Maber wrote:
>
> >> Berkeley XML DB?
> >>
> >> Brian
> >
> > I can't find any mention of collations in the documentation I could
> > find on their web site. It supports XML Query which implies support
> > for collations but doesn't necessarily mean the indexing is adaptable
> > to anything but the default collation. It's also a commercial product
> > owned now by Oracle.
> >
> > Jon
|
From: Colin T. <col...@ou...> - 2006-03-13 14:12:28
|
Peter Crowther wrote: >>From: Colin Tatham >>I think there are two possible approaches: >> >>1. Use regular expressions to find possible matches of the >>search term >>in the identical_table Hashtable keys, or >> >>2. Execute an SQL query with wildcards on the table that contains the >>tokens. > > > 3. Bite the bullet, throw away the homebrew XML repository and use > somebody else's work. My candidate would be eXist, which has much of > this functionality already built in to XQuery. :-) Yes, we have discussed that option, which I really like. It would make sense to store XML metadata in an XML database! Don't think we're about to do that yet tho.. -- ____________________________________ Colin Tatham VLE Team Oxford University Computing Services http://www.oucs.ox.ac.uk/ltg/vle/ http://bodington.org |
From: Jon M. <jo...@te...> - 2006-03-13 14:33:49
|
Peter Crowther wrote:

>>From: Colin Tatham
>>I think there are two possible approaches:
>>
>>1. Use regular expressions to find possible matches of the search term
>>in the identical_table Hashtable keys, or
>>
>>2. Execute an SQL query with wildcards on the table that contains the
>>tokens.
>
>3. Bite the bullet, throw away the homebrew XML repository and use
>somebody else's work. My candidate would be eXist, which has much of
>this functionality already built in to XQuery.
>
> - Peter

Or Apache Xindice? I'm not familiar with the product but one would hope that it has as much support behind it as the other Apache XML tools.

Before taking this radical approach it would be important to determine that the replacement would be equally capable. (E.g. capable of handling accent insensitive searching which is vital for searching non-English text by users who aren't sure how to type accented characters into the search tool or for searching text which may have optional accents or incorrect orthography, not to say Japanese text which may have been entered in either of two phonetic systems.)

Sticking with the Bodington XML query tool I'd say there is another option to your two: add a 'loose match' word matching field to tertiary, secondary, primary and exact. However, I don't think that would be the preferred option - I'd go for your option 2. So a search for 'open' AND 'source' would trigger an initial SQL query whose results would be used to create an altered query like ('open' OR 'opening') AND ('source' OR 'sources') before being executed.

It will be sad to see the XML repository go - I'm very proud of the concept of translating an object oriented XML search specification into a single SQL query. It works very well with databases that can highly optimise the resultant query. Of course it's becoming redundant as a number of pure XML database products become available that are well optimised to XPath type searches.
I'd be interested to see performance comparisons between my XML to SQL translator and a pure XML database like Xindice. Jon |
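The query expansion described in the message above ('open' AND 'source' becoming an AND of OR groups) might look roughly like this. It is a sketch only: an in-memory set of tokens stands in for the initial SQL wildcard query, and `expand_query` and `render` are hypothetical names, not part of the Bodington XML query tool.

```python
def expand_query(terms, known_tokens):
    """Build an AND-of-ORs query: each term becomes the known tokens it prefixes."""
    groups = []
    for term in terms:
        variants = sorted(t for t in known_tokens if t.startswith(term))
        # Keep the original term if nothing in the token table matches it.
        groups.append(variants or [term])
    return groups


def render(groups):
    """Render the groups in the notation used in the message above."""
    parts = []
    for group in groups:
        quoted = " OR ".join(f"'{t}'" for t in group)
        parts.append(f"({quoted})" if len(group) > 1 else quoted)
    return " AND ".join(parts)
```

With a token table containing open, opening, source, sources and resource, `render(expand_query(["open", "source"], tokens))` gives `('open' OR 'opening') AND ('source' OR 'sources')`. The cost is one extra lookup per search term before the real query runs.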
From: Jon M. <jo...@te...> - 2006-03-13 15:40:58
|
In answer to my own question, Xindice doesn't seem to support full-text indices and would not be suitable as a replacement for the Bodington XML Repository. Information from xml.apache.org and exist-db.org.

Jon
|
From: Matthew B. <mat...@ou...> - 2006-03-21 09:19:03
|
Peter Crowther wrote:

>> From: Colin Tatham
>> I think there are two possible approaches:
>>
>> 1. Use regular expressions to find possible matches of the search term
>> in the identical_table Hashtable keys, or
>>
>> 2. Execute an SQL query with wildcards on the table that contains the
>> tokens.
>
> 3. Bite the bullet, throw away the homebrew XML repository and use
> somebody else's work. My candidate would be eXist, which has much of
> this functionality already built in to XQuery.

For really flexible searching, use something built specifically for it, like Lucene http://lucene.apache.org/java/docs/

I see moving to eXist for XML storage as a related but different issue. At the moment it happens that the XML storage is used for searching, but we may find this limiting in the future if the XML structure doesn't lead to the searches we need.

--
-- Matthew Buckett, VLE Developer
-- Learning Technologies Group, Oxford University Computing Services
-- Tel: +44 (0)1865 283660 http://www.oucs.ox.ac.uk/ltg/
|
From: Colin T. <col...@ou...> - 2006-03-21 09:38:36
|
Matthew Buckett wrote:

> Peter Crowther wrote:
>> 3. Bite the bullet, throw away the homebrew XML repository and use
>> somebody else's work. My candidate would be eXist, which has much of
>> this functionality already built in to XQuery.
>
> For really flexible searching, use something built specifically for it,
> like Lucene http://lucene.apache.org/java/docs/
>
> I see moving to eXist for XML storage as a related but different issue.
> At the moment it happens that the XML storage is used for searching, but
> we may find this limiting in the future if the XML structure doesn't
> lead to the searches we need.

The problem with using Lucene is that it'll need to keep its own index of words, which will have to be kept in sync with Bod. By using an XML database (replacing Bod's) we get good searching, don't have to sync it, and it'll be better at managing the metadata as XML for import/export, etc.

Colin

--
____________________________________
Colin Tatham
VLE Team
Oxford University Computing Services
http://www.oucs.ox.ac.uk/ltg/vle/
http://bodington.org
|
From: Matthew B. <mat...@ou...> - 2006-03-21 09:53:30
|
Colin Tatham wrote:

> Matthew Buckett wrote:
>> Peter Crowther wrote:
>>> 3. Bite the bullet, throw away the homebrew XML repository and use
>>> somebody else's work. My candidate would be eXist, which has much of
>>> this functionality already built in to XQuery.
>>
>> For really flexible searching, use something built specifically for it,
>> like Lucene http://lucene.apache.org/java/docs/
>>
>> I see moving to eXist for XML storage as a related but different
>> issue. At the moment it happens that the XML storage is used for
>> searching, but we may find this limiting in the future if the XML
>> structure doesn't lead to the searches we need.
>
> The problem with using Lucene is that it'll need to keep its own index of
> words, which will have to be kept in sync with Bod. By using an XML
> database (replacing Bod's) we get good searching, don't have to sync it,
> and it'll be better at managing the metadata as XML for import/export, etc.

I think syncing data for good search is inevitable. At the moment we already sync data: the resource title and description end up in both the XML tables and the resource tables.

I'm not suggesting that we use Lucene for storing XML data, just for search data which might originally have been in XML. This is why I think switching from XMLRepository is a slightly different issue to improving the search, although they do overlap at the moment.

--
-- Matthew Buckett, VLE Developer
-- Learning Technologies Group, Oxford University Computing Services
-- Tel: +44 (0)1865 283660 http://www.oucs.ox.ac.uk/ltg/
|
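The syncing concern in this exchange can be illustrated with a toy store that updates a separate word index on every write and delete. This is only a sketch of the general idea under simple assumptions (whitespace tokenising, everything in memory); `IndexedStore` is a hypothetical class and does not use Lucene's or Bodington's APIs.

```python
from collections import defaultdict


class IndexedStore:
    """Toy resource store that updates a word index on every write."""

    def __init__(self):
        self.resources = {}            # resource id -> text
        self.index = defaultdict(set)  # word -> ids of resources containing it

    def put(self, resource_id, text):
        self.delete(resource_id)       # drop stale index entries first
        self.resources[resource_id] = text
        for word in text.lower().split():
            self.index[word].add(resource_id)

    def delete(self, resource_id):
        old = self.resources.pop(resource_id, None)
        if old is None:
            return
        for word in old.lower().split():
            self.index[word].discard(resource_id)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))
```

The point of routing every change through `put` and `delete` is that the index cannot drift from the stored text; an external index has to hook into the same write path or be rebuilt periodically.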
From: Adam M. <ada...@co...> - 2006-03-21 16:47:40
|
In message <441...@ou...> bod...@li... writes:

> Peter Crowther wrote:
> >> From: Colin Tatham
> >> I think there are two possible approaches:
> >>
> >> 1. Use regular expressions to find possible matches of the search term
> >> in the identical_table Hashtable keys, or
> >>
> >> 2. Execute an SQL query with wildcards on the table that contains the
> >> tokens.
> >
> > 3. Bite the bullet, throw away the homebrew XML repository and use
> > somebody else's work. My candidate would be eXist, which has much of
> > this functionality already built in to XQuery.
>
> For really flexible searching, use something built specifically for it,
> like Lucene http://lucene.apache.org/java/docs/
>
> I see moving to eXist for XML storage as a related but different issue.
> At the moment it happens that the XML storage is used for searching, but
> we may find this limiting in the future if the XML structure doesn't
> lead to the searches we need.

I would co-ordinate with Howard on this one. It would make sense for us both to use the same code.

adam
|