From: Jauslin K. <kai...@li...> - 2007-11-19 18:47:39
Hello Fez developers,

the current trunk version does not allow boolean searching (AND/OR/NOT) in advanced search (e.g. to get all documents from author1 OR author2). Is there a plan for implementation?

What about the integration of our MySQL fulltext indexing into the trunk? It runs very well over here, with good performance (our fulltext table currently has 11'000 entries). I also implemented highlighting of query words in fulltext extracts and a document preview for the browse lists. This also runs without much of a performance penalty.

The only thing that is not so easily possible with the current implementation is the combination of advanced, field-based search with complex boolean fulltext search. I think for this functionality, another indexing approach (e.g. using Zend Lucene) should be used.

Could you please tell me what you think about these problems and whether there is anything going on in that direction?

Cheers from Zurich, Kai

--
Kai Jauslin, Dipl. Informatik-Ing. ETH, ETH Zürich, ETH-Bibliothek, Rämistrasse 101, CH-8092 Zürich
kai...@li..., Tel +41-44-6324972, Büro STB F19
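The highlighting of query words in fulltext extracts mentioned in this message could look roughly like the following. This is only a minimal sketch in Python (Fez itself is PHP) and not the code running in Zurich; the function name `highlight` and the `<b>` tags are assumptions made for illustration.

```python
import re


def highlight(extract, query_words, open_tag="<b>", close_tag="</b>"):
    """Wrap whole-word, case-insensitive matches of the query words in tags.

    Hypothetical helper: HTML escaping of the extract is left out here
    and would be needed before showing the result in a browser.
    """
    words = [re.escape(w) for w in query_words if w]
    if not words:
        return extract
    # \b keeps "sun" from matching inside "Sunday"
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, extract)


print(highlight("The sun was high in the sky", ["sky", "sun"]))
# prints: The <b>sun</b> was high in the <b>sky</b>
```

Compiling one alternation for all query words means the extract is scanned once, which fits the remark that highlighting adds little overhead.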
From: Christiaan K. <c.k...@li...> - 2007-11-19 22:38:48
Hello Kai

I am currently in Paris having a week off for holidays after the SUN PASIG meeting. I'll be back in Australia next week.

Have you seen my presentation slides? They talk about the Fez 2 release and what will be in Fez 2.1:

http://espace.library.uq.edu.au/view.php?pid=UQ:119976

We are certainly going to bring your fulltext code into the Fez trunk as soon as we can, possibly in the next couple of weeks.

I am looking very seriously into PostgreSQL. It provides a much more powerful fulltext search engine called 'TSearch2'. There is also a PHP parser for Google-style and/not/or searching with brackets: the very nice Digital Stratum TSearch2 fulltext query parser - http://digitalstratum.com/oss/fts_parser

However, we will continue to support MySQL as an equal option for the Fez index RDBMS. We could probably adapt the Digital Stratum parser to generate SQL for MySQL as well as PostgreSQL.

Cheers,
Christiaan

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Christiaan Kortekaas
Senior Library Open Sorcerer
Library Technology Service
The University of Queensland, Australia QLD 4072
Telephone : (+61) (7) 3346 4337
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
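The idea of one query parser feeding both MySQL and PostgreSQL can be sketched as follows. This is a simplified illustration in Python, not the Digital Stratum parser: it only handles `+term`, `-term` and bare terms (treated as required), without brackets or OR. The function names are hypothetical. The two output formats are the ones MySQL accepts in `MATCH ... AGAINST (... IN BOOLEAN MODE)` and PostgreSQL accepts in `to_tsquery(...)`.

```python
import re


def parse_terms(query):
    """Split a query like '+sky -sun' into (required, excluded) term lists."""
    required, excluded = [], []
    for token in query.split():
        negated = token.startswith("-")
        term = token.lstrip("+-")
        # Only plain word characters are accepted, so operator characters
        # of either backend cannot leak into the generated query string.
        if not re.fullmatch(r"\w+", term):
            raise ValueError("unsupported term: %r" % token)
        (excluded if negated else required).append(term)
    if not required:
        raise ValueError("query needs at least one positive term")
    return required, excluded


def to_mysql_boolean(required, excluded):
    """String for MySQL: MATCH (col) AGAINST (%s IN BOOLEAN MODE)."""
    return " ".join(["+" + t for t in required] + ["-" + t for t in excluded])


def to_tsquery(required, excluded):
    """String for PostgreSQL: col @@ to_tsquery(%s)."""
    return " & ".join(list(required) + ["!" + t for t in excluded])


required, excluded = parse_terms("+sky -sun")
print(to_mysql_boolean(required, excluded))  # prints: +sky -sun
print(to_tsquery(required, excluded))        # prints: sky & !sun
```

Keeping the parsed form (two term lists) separate from the backend-specific string is what would allow a single parser to serve both databases; the resulting strings are meant to be passed as bound parameters, not concatenated into SQL.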
From: Kai J. <kai...@li...> - 2007-11-21 18:44:10
Hi Christiaan,

thanks for your answer. Nice slides... where can I get FezTube? :-)

As for the search engine: I had a look at the Digital Stratum parser, but was a little disappointed. We would have to adjust it to support UTF-8 for internationalization, and wildcards (*?). A rebuild using phplexer/parser generator might be faster and more promising in the long term.

The main problem I see, however, is that structured and unstructured (fulltext) search cannot be combined with the current Fez2 search key structure (using the MySQL indexing for fulltext). Example: return all documents matching "+sky -sun". Combining the search key tables (attachment and title, for example) using unions may return documents that have a title like 'The blue sky' with "the sun was high in the sky" as fulltext. Reason: the boolean search 'rek_title like "sky" and not rek_title like "sun"' returns the row, and it is UNIONed with the fulltext search, which rejects the document because of the occurrence of "sun". The union therefore still contains the document, although it should be excluded.

I am now going to take a closer look at Zend Lucene (TSearch2 might solve the problem, but I strongly favor database independence). So the current plan is to integrate Lucene as a parallel Fez index, using MySQL for the authorization indexing. This way, we would have an extremely solid and proven search engine in Fez. I'm going to investigate this possibility and keep you up to date.

What do you think about it?

Cheers, Kai
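The "+sky -sun" problem described in this message can be reproduced with a few lines of Python. This is a toy model under stated assumptions, not Fez code: documents are plain dictionaries, the field names `title` and `fulltext` and the pids are made up, and matching is done on whitespace-separated words instead of SQL.

```python
def matches(text, required, excluded):
    """True if text contains every required word and no excluded word."""
    words = set(text.lower().split())
    return all(t in words for t in required) and not any(t in words for t in excluded)


# Hypothetical records; the first one is the example from the message.
docs = {
    "doc:1": {"title": "The blue sky", "fulltext": "the sun was high in the sky"},
    "doc:2": {"title": "Night sky", "fulltext": "stars in the sky"},
}


def search_union(docs, required, excluded):
    """Evaluate the query per field and UNION the hits (the problematic way)."""
    hits = set()
    for pid, fields in docs.items():
        for text in fields.values():
            if matches(text, required, excluded):
                hits.add(pid)
    return hits


def search_combined(docs, required, excluded):
    """Evaluate the query once against all fields of a document together."""
    return {
        pid
        for pid, fields in docs.items()
        if matches(" ".join(fields.values()), required, excluded)
    }


print(sorted(search_union(docs, ["sky"], ["sun"])))     # prints: ['doc:1', 'doc:2']
print(sorted(search_combined(docs, ["sky"], ["sun"])))  # prints: ['doc:2']
```

In the union variant the title alone satisfies the query, so `doc:1` comes back even though its fulltext contains "sun". A negation has to be evaluated over the whole document, which is what a single index covering all fields (for instance a Lucene document with several fields) would allow.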
From: Christiaan K. <c.k...@li...> - 2007-11-21 19:42:11
Hi Kai

If you have a look in eserv.php you'll see that it handles flash video (.flv) files differently, using an embedded flash video player:

http://dev-repo.library.uq.edu.au/websvn/filedetails.php?repname=fez&path=%2Ftrunk%2Feserv.php

We haven't yet added an automatic 'on-ingest' workflow to add dissemination copies of mpg/mpeg2 video (and other formats) as flash video (flv) files, but plan to do this soon. Until then we have done it manually using a cross-platform bit of software called 'muencode' (although I may have got the spelling wrong). The new workflow would wrap a Fez webservice around mu-encode just like we do for ImageMagick for image conversion.

As for the problems you see, yes, they are probably problems all applications with search engines come across, and a good solution may be Zend Lucene. It would be very nice, though, if we could figure out a way to put the authorization into the Lucene index too; otherwise we run into result-list paging and mass post-search authz filtering problems. The Moodle project is looking into doing this too, using Zend Lucene and putting the authz rules into the index, in a Google Summer of Code project called the 'Global search module' for Moodle. You can see their wiki for details, and their 'talk' panel and forums on this topic.

I am very happy you are also looking into this and will be very interested to see your progress.

Thanks for the information,
Christiaan
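The result-list paging problem with post-search authorization filtering can be illustrated like this. The sketch is in Python with made-up pids and function names; it stands in for "search index returns hits, then authz is checked afterwards" versus "authz is part of the index query", and does not show how Fez or Moodle actually implement either.

```python
def page_then_filter(hits, allowed, page, page_size):
    """Take one page from the search hits, then drop unauthorized records."""
    start = page * page_size
    window = hits[start:start + page_size]
    return [pid for pid in window if pid in allowed]


def filter_then_page(hits, allowed, page, page_size):
    """Apply authorization first (as if it were in the index), then page."""
    visible = [pid for pid in hits if pid in allowed]
    start = page * page_size
    return visible[start:start + page_size]


hits = ["a", "b", "c", "d", "e", "f"]  # ranked hits from the search index
allowed = {"a", "d", "e", "f"}         # records the current user may see

print(page_then_filter(hits, allowed, 0, 3))  # prints: ['a']
print(page_then_filter(hits, allowed, 1, 3))  # prints: ['d', 'e', 'f']
print(filter_then_page(hits, allowed, 0, 3))  # prints: ['a', 'd', 'e']
print(filter_then_page(hits, allowed, 1, 3))  # prints: ['f']
```

With filtering after paging, the first page shows a single result although four visible records exist, and the total hit count is wrong as well. Avoiding that without authz in the index means fetching and checking the full hit list on every request, which is the mass post-search filtering cost mentioned in the message.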