From: Olly B. <ol...@su...> - 2004-04-26 19:23:13
We're moving the Xapian lists (xapian-discuss, xapian-devel, and xapian-commits) from sourceforge to lists.xapian.org. The major immediate benefit is that this should eliminate the long delays in relaying messages that we currently sometimes experience.

The new list addresses are:

xap...@li...
xap...@li...
xap...@li...

And the mailman list pages are at: http://lists.xapian.org/mailman/listinfo

You probably only need to update your address book, and perhaps mail filtering rules. James copied the subscriber lists over during the weekend, so you are automatically subscribed to the new lists unless you subscribed very recently. I'll send out a message to the new lists shortly - if you don't get that, you should investigate.

Everyone will shortly be removed from the old lists, so you should get a bounce if you accidentally use an old list address. It doesn't appear to be possible to do anything better - sourceforge don't allow much control over list addresses.

Cheers,
Olly
From: Olly B. <ol...@su...> - 2004-04-22 23:59:18
On Thu, Apr 22, 2004 at 08:40:47PM +0000, . . wrote:
> Is anyone building any other ranking solutions for this project.

I wrote a proof-of-concept implementation of cosine ranking. Otherwise I'm not aware of anything.

> PageRank, <a href>text link>, and which look at links coming in and out of
> each site to rank the page.
>
> I believe if this engine had pagerank, it would be first class.

I believe pagerank is patented in the USA, which somewhat limits the usefulness of implementing it. There *is* a world outside the US, but the trend sadly seems to be that software patents are spreading. However, since search engine optimisers have figured out how to game pagerank itself, perhaps that's irrelevant.

There are undoubtedly other schemes for producing a weighting by analysing links which aren't patented and would actually perform better than pagerank. If you implemented a scheme which produced a weight for each document, you could easily add that into the weighting Xapian already uses. You need maximum and minimum weights that the scheme can produce, and the matcher can then just figure it into the calculations.

> When I do a search on some of the users, I get bad results, say for "web".
> But if we mixed something like pagerank with it, it would be GREAT!

Perhaps - from what I gather, the real strength of link analysis schemes is on large document collections. If you don't have enough documents, you may not see much benefit from them.

Cheers,
Olly
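The suggestion above - a scheme producing a per-document weight with known maximum and minimum, folded into the existing query weight - can be sketched roughly as follows. This is an illustrative sketch only, not Xapian's actual API: the function names, the linear blend, and the `alpha` parameter are all assumptions.

```python
def normalise(score, lo, hi):
    """Map a static (query-independent) score into [0, 1] using the
    scheme's known minimum and maximum, as the message describes."""
    if hi == lo:
        return 0.0
    return (score - lo) / (hi - lo)

def combined_weight(term_weight, static_score, lo, hi, alpha=0.5):
    """Blend a query-dependent term weight with a static link-analysis
    score; alpha (an assumed knob) controls how much the static score
    contributes."""
    return term_weight + alpha * normalise(static_score, lo, hi)

# Hypothetical candidates: (docid, term weight, static link score)
candidates = [("doc1", 2.0, 30.0), ("doc2", 1.8, 90.0)]
ranked = sorted(
    candidates,
    key=lambda d: combined_weight(d[1], d[2], lo=0.0, hi=100.0),
    reverse=True,
)
```

With these numbers the strongly-linked "doc2" overtakes "doc1" despite a slightly lower term weight, which is the effect such a scheme is after.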
From: . . <b1...@ho...> - 2004-04-22 20:53:11
Hi All

Is anyone building any other ranking solutions for this project? E.g. PageRank, <a href>text link>, and schemes which look at links coming in and out of each site to rank the page.

I believe if this engine had pagerank, it would be first class. When I do a search on some of the users, I get bad results, say for "web". But if we mixed something like pagerank with it, it would be GREAT!

paul
From: Arjen v. d. M. <ar...@gl...> - 2004-03-31 12:59:46
On 31-3-2004 14:14, Olly Betts wrote:
> On Wed, Mar 31, 2004 at 11:39:00AM +0200, Arjen van der Meijden wrote:
>
> No, but there are id attributes on the p elements, which should perform the
> same function in any browser which understands HTML4. Is this not
> working in your browser? If not, what browser is that?

Oh, I didn't test it. Just didn't know that was supposed to work.

Best regards,

Arjen
From: Olly B. <ol...@su...> - 2004-03-31 12:14:57
On Wed, Mar 31, 2004 at 11:39:00AM +0200, Arjen van der Meijden wrote:
> I didn't know you had picked one already.

I'm not sure we had as such, but this one definitely won the informal vote.

> Btw, I noticed you use <a href='#...'>-tags while there are no <a
> name='....'>-tags around in the users.php-file.

No, but there are id attributes on the p elements, which should perform the same function in any browser which understands HTML4. Is this not working in your browser? If not, what browser is that?

Cheers,
Olly
From: Arjen v. d. M. <ar...@gl...> - 2004-03-31 09:39:06
I didn't know you had picked one already. I'll inform the author and thank all the others for their work; I forgot to do that anyway. :)

Btw, I noticed you use <a href='#...'>-tags while there are no <a name='....'>-tags around in the users.php-file.

Best regards,

Arjen van der Meijden

On 31-3-2004 4:01, Olly Betts wrote:
> CVS Root: /usr/data/cvs
> Module: www.xapian.org
> Changes by: olly
> Date: Wed Mar 31 2004 03:01:44 BST
>
> Log message:
> Use the new logo!
>
> Modified files:
> . : index.php
> Added files:
> . : xapian-logo.png
>
> Links:
> http://xapian.org/C?www.xapian.org/index.php?1.11?1.12
> http://xapian.org/C?www.xapian.org/xapian-logo.png?1.1
From: <rm...@mh...> - 2004-01-23 14:58:20
On Fri, Jan 23, 2004 at 03:47:10PM +0100, Matthias Koeppe wrote:
>
> I don't have the C or C++ spec here, but I doubt that this is a valid
> declaration. At least, both GCC 2.95.2 and the Sun Forte 6u2 C and
> C++ compilers report an error here. (Of course, I tested it with an
> ordinary function, not a method.) G++ 2.95.2, however, accepts this,
> but I don't think this means anything.

Yes, I'll go back to the specs ASAP. It turns out that the problems I encountered in Xapian were caused by the interface file, not the headers (I'll submit this as a bug report to Xapian).

> I admit to have written interface declarations like this in SWIG
> before, though:
>
> void foo(int *OUTPUT, int *OUTPUT);
>
> This is why I will commit the following change which fixes the emitted
> shadow method definition. It will show up in the next release of
> SWIG.

Oh, thanks a lot. I'll try it out ASAP.

RalfD

> --- guile.cxx.~1.11.~ Fri Nov 21 10:18:10 2003
> +++ guile.cxx Fri Jan 23 15:40:10 2004
> @@ -706,6 +706,7 @@
>  String *returns = NewString("");
>  String *method_signature = NewString("");
>  String *primitive_args = NewString("");
> + Hash *scheme_arg_names = NewHash();
>  int num_results = 1;
>  String *tmp = NewString("");
>  String *tm;
> @@ -826,10 +827,10 @@
>  SwigType *pn = Getattr(p,"name");
>  String *argname;
>  scheme_argnum++;
> - if (pn)
> + if (pn && !Getattr(scheme_arg_names, pn))
>  argname = pn;
>  else {
> - /* Anonymous arg -- choose a name that cannot clash */
> + /* Anonymous arg or re-used argument name -- choose a name that cannot clash */
>  argname = NewStringf("%%arg%d", scheme_argnum);
>  }
>  if (strcmp("void", Char(pt)) != 0) {
> @@ -841,6 +842,7 @@
>  Printv(method_signature, " ", argname, NIL);
>  }
>  Printv(primitive_args, " ", argname, NIL);
> + Setattr(scheme_arg_names, argname, p);
>  }
>  if (!pn) {
>  Delete(argname);
> @@ -1184,6 +1186,7 @@
>  Delete(doc_body);
>  Delete(returns);
>  Delete(tmp);
> + Delete(scheme_arg_names);
>  DelWrapper(f);
>  return SWIG_OK;
> }
>
> --
> Matthias Koeppe -- http://www.math.uni-magdeburg.de/~mkoeppe
From: Matthias K. <mk...@ma...> - 2004-01-23 14:47:20
rm...@mh... (Le grande pinguin) writes:

> i just stumbled over another little quirk in the SWIG/Guile
> module. If a function is declared with named parameters SWIG won't
> generate parameter names for the emitted scheme code but reuse the
> names from the declaration. While this often provides easy to read
> code it might break in the cases where the declaration uses the same
> name (shudder) for more than one parameter (which, to my
> understanding of the ISO spec, is legal in function declarations --
> sigh).
>
> Example:
> class Hash {
>   int insert( char *val, char *val);
> };

I don't have the C or C++ spec here, but I doubt that this is a valid declaration. At least, both GCC 2.95.2 and the Sun Forte 6u2 C and C++ compilers report an error here. (Of course, I tested it with an ordinary function, not a method.) G++ 2.95.2, however, accepts this, but I don't think this means anything.

I admit to having written interface declarations like this in SWIG before, though:

void foo(int *OUTPUT, int *OUTPUT);

This is why I will commit the following change, which fixes the emitted shadow method definition. It will show up in the next release of SWIG.

--- guile.cxx.~1.11.~ Fri Nov 21 10:18:10 2003
+++ guile.cxx Fri Jan 23 15:40:10 2004
@@ -706,6 +706,7 @@
 String *returns = NewString("");
 String *method_signature = NewString("");
 String *primitive_args = NewString("");
+ Hash *scheme_arg_names = NewHash();
 int num_results = 1;
 String *tmp = NewString("");
 String *tm;
@@ -826,10 +827,10 @@
 SwigType *pn = Getattr(p,"name");
 String *argname;
 scheme_argnum++;
- if (pn)
+ if (pn && !Getattr(scheme_arg_names, pn))
 argname = pn;
 else {
- /* Anonymous arg -- choose a name that cannot clash */
+ /* Anonymous arg or re-used argument name -- choose a name that cannot clash */
 argname = NewStringf("%%arg%d", scheme_argnum);
 }
 if (strcmp("void", Char(pt)) != 0) {
@@ -841,6 +842,7 @@
 Printv(method_signature, " ", argname, NIL);
 }
 Printv(primitive_args, " ", argname, NIL);
+ Setattr(scheme_arg_names, argname, p);
 }
 if (!pn) {
 Delete(argname);
@@ -1184,6 +1186,7 @@
 Delete(doc_body);
 Delete(returns);
 Delete(tmp);
+ Delete(scheme_arg_names);
 DelWrapper(f);
 return SWIG_OK;
 }

--
Matthias Koeppe -- http://www.math.uni-magdeburg.de/~mkoeppe
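The effect of the patch above can be sketched in Python (a hypothetical helper, not SWIG code): a declared parameter name is reused only if it hasn't appeared earlier in the signature; anonymous parameters and repeated names fall back to a generated `%arg<N>` name that cannot clash with a Scheme identifier.

```python
def scheme_argnames(params):
    """Mimic the patched guile.cxx logic for choosing Scheme argument
    names. `params` holds declared names, with None for anonymous args."""
    seen = set()
    out = []
    for argnum, name in enumerate(params, start=1):
        # Anonymous arg or re-used argument name: generate a non-clashing
        # name in the same "%arg<N>" style the patch uses.
        if name is None or name in seen:
            name = "%%arg%d" % argnum
        seen.add(name)
        out.append(name)
    return out

# int insert(char *val, char *val): the second "val" must be renamed,
# otherwise the emitted (define-method ... val val) is invalid Scheme.
print(scheme_argnames(["val", "val"]))  # -> ['val', '%arg2']
```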
From: <rm...@mh...> - 2004-01-23 13:42:05
Hello list,

i just stumbled over another little quirk in the SWIG/Guile module. If a function is declared with named parameters SWIG won't generate parameter names for the emitted scheme code but reuse the names from the declaration. While this often provides easy-to-read code, it might break in the cases where the declaration uses the same name (shudder) for more than one parameter (which, to my understanding of the ISO spec, is legal in function declarations -- sigh).

Example:

/* --- file: foo.h -----*/
class Hash {
  int insert( char *val, char *val);
};

will emit:

(define-method ((self <Hash>) val val)
  ......
)

which is not valid scheme code :-/

While i agree that such declarations are, erm, suboptimal, they are nevertheless valid and SWIG should be able to deal with them (maybe by just using the code that's already there for unnamed parameters?).

TIA Ralf Mattes
From: James A. <jam...@ta...> - 2004-01-13 20:56:44
When I said earlier that they didn't work, what I really meant to say was that I didn't understand TCL well enough to realise that they did, actually, work. Oops :)

No idea how fully useful they are, but I can run a quick query against my test database and get back the results I expect, so I imagine they're okay (given how small the TCL-specific code is, that it looks right, and that I didn't write it in the first place anyway - and that it's probably not even used these days :-).

J

--
James Aylett  xapian.org  ja...@ta...  uncertaintydivision.org
From: James A. <jam...@ta...> - 2004-01-13 10:12:33
On Tue, Jan 13, 2004 at 12:56:34AM +0000, Olly Betts wrote:
> The SWIG manual says "SWIG currently requires Tcl 8.0 or a later
> release", but just above that it warns "Caution: This chapter is under
> repair!" so perhaps that's no longer accurate. I'll add a test for
> 8.0 or newer for now as it does say that 7.x is no longer supported - we
> can bump the version up later if required.

I have a feeling that there's something in 8.2 that sounded useful, but I have no idea from memory whether I'm using it right now.

It's worth pointing out that the bindings for TCL don't work right now - generating the mset causes an abort of some sort, can't remember what. I don't have time to debug this right now, unfortunately :-(

J

--
James Aylett  xapian.org  ja...@ta...  uncertaintydivision.org
From: James A. <jam...@ta...> - 2004-01-13 10:10:25
On Tue, Jan 13, 2004 at 01:56:43AM +0000, Olly Betts wrote:
> OK, I've added a configure test. I'm unsure about this change I made to
> tcl8/Makefile.am though:
>
> -tcllibdir = ~/local/lib/tcl8.2
> +tcllibdir = @TCL_LIB@
>
> Is ~/local/lib/tcl8.2 a "user bindings" directory?

Yes, because I wanted somewhere I could run make install into without having to install a local TCL (I have local Python and PHP installs, but I couldn't face building up another language right then :-).

> If so my change assumes you're root and installing the bindings
> system-wide. Which is in some ways a generic bindings issue (cf
> James' recent reply to Alex about python bindings).

The default should be the default installation's (i.e. first invocation on the path) system-wide library directory. I think that every language needs some sort of override, yes.

J

--
James Aylett  xapian.org  ja...@ta...  uncertaintydivision.org
From: Olly B. <ol...@su...> - 2004-01-13 01:56:46
OK, I've added a configure test. I'm unsure about this change I made to tcl8/Makefile.am though:

-tcllibdir = ~/local/lib/tcl8.2
+tcllibdir = @TCL_LIB@

Is ~/local/lib/tcl8.2 a "user bindings" directory? If so, my change assumes you're root and installing the bindings system-wide. Which is in some ways a generic bindings issue (cf James' recent reply to Alex about python bindings).

Cheers,
Olly
From: Olly B. <ol...@su...> - 2004-01-13 00:56:38
On Tue, Jan 13, 2004 at 01:36:31AM +0100, Michael Schlenker wrote:
> Olly Betts wrote:
> > Is there a requirement for a particular version of TCL? I'd like to
> > write a configure test for them.
>
> If the Tcl stubs mechanism is used (which is recommended for Tcl
> extensions), the bindings will load without a recompile in any Tcl
> interpreter from 8.2 onwards (all the currently stable and development
> releases (8.3, 8.4, 8.5)). http://wiki.tcl.tk/stubs has the details

We use SWIG to do the bindings, as it avoids having to implement them from scratch using a different extension mechanism for each scripting language (the exceptions are Perl and Java - in those cases somebody has contributed bindings written from scratch in the native extension mechanism (XS and JNI respectively)).

So this boils down to "does SWIG use stubs". It appears the answer (at least for 1.3.19) is "only if you compile with -DUSE_TCL_STUBS", which isn't the default. But this doesn't really tell me what I need to know anyway - I want to know what version of TCL the bindings require to build and work, not what versions built bindings can be moved between.

The SWIG manual says "SWIG currently requires Tcl 8.0 or a later release", but just above that it warns "Caution: This chapter is under repair!" so perhaps that's no longer accurate. I'll add a test for 8.0 or newer for now as it does say that 7.x is no longer supported - we can bump the version up later if required.

Also, is there a tcl equivalent to perl/sed/awk's -e to put a short script on the command line? The best I can come up with for a version test is:

echo 'if {$tcl_version < 8.2} { exit 1 }' | tclsh

But as I've written about 4 lines of TCL in my life, that represents about 25% of my total TCL experience...

Cheers,
Olly
From: Michael S. <sc...@un...> - 2004-01-13 00:29:32
Olly Betts wrote:
> On Mon, Jan 05, 2004 at 06:23:41PM +0000, James Aylett wrote:
>
> > I don't really get TCL deeply, but I've got the bindings to build for
> > it, and they load successfully. I'll try to do a couple of simple
> > demo/test scripts soon-ish, and hopefully someone else will rise to
> > replicating the simple* stuff.
>
> Cool. Ideally we want somebody with an active interest in the bindings
> for a language to maintain them, but failing that, having them is better
> than not having them...
>
> Is there a requirement for a particular version of TCL? I'd like to
> write a configure test for them.

If the Tcl stubs mechanism is used (which is recommended for Tcl extensions), the bindings will load without a recompile in any Tcl interpreter from 8.2 onwards (all the currently stable and development releases (8.3, 8.4, 8.5)). http://wiki.tcl.tk/stubs has the details.

Michael
From: Olly B. <ol...@su...> - 2004-01-12 18:01:29
On Mon, Jan 05, 2004 at 06:23:41PM +0000, James Aylett wrote:
> I don't really get TCL deeply, but I've got the bindings to build for
> it, and they load successfully. I'll try to do a couple of simple
> demo/test scripts soon-ish, and hopefully someone else will rise to
> replicating the simple* stuff.

Cool. Ideally we want somebody with an active interest in the bindings for a language to maintain them, but failing that, having them is better than not having them...

Is there a requirement for a particular version of TCL? I'd like to write a configure test for them.

Cheers,
Olly
From: Arjen v. d. M. <ar...@gl...> - 2004-01-11 10:50:07
Olly Betts wrote:
> On Sun, Jan 11, 2004 at 01:07:14AM +0100, Arjen van der Meijden wrote:
>
> In a mail off the list, Arjen noted that the speedup is greater when
> adding to a database which already contains a lot of data - more than 4
> times faster per 1000 documents when the database has about 50000
> documents!

Our production 0.7.5 (on a dual xeon, 5 10k rpm 36G disk raid5, 4GB RAM) doesn't really show the same drop in performance as my local ide-powered development box - or it does, but the drop is much less. It starts off with 0:45 minutes per run and drops to somewhere near 3:00 for a similar batch (I don't know the preprocessing time, probably somewhere near 0:15) after having done over 830k documents (about 6.4G of text data). But the load of the machine while indexing might have been a bit higher at the end of the data set, compared to the start (another xapian database on the machine was actively searched). The process of indexing 6.4G of text took a bit over 1 day and 16 hours.

When a new stable Xapian is out, I'll probably reindex the whole lot again, simply to benefit from the proposed changes which will result in a yet smaller database (which is now 15G, and 10G compacted) and to see if it is so much faster on that box as well.

My local box (athlon 850, 512MB ram, data set in a mysql database on the same box - actually, even the same drive, but the data set is not read from disk while sending it to scriptindex) starts at ~2:00 and drops to ~14:00 minutes, of which about 1 minute is preprocessing time, after having done only 45k documents. The cvs-head version went from ~2:00 to ~3:30 with the same data set.

> Anyway, this is really good news for scaling!

As shown above, my ide-powered development box shows the poorer scaling of 0.7.5 much better than our scsi-raid-powered production box, even though the production box was actively used while my development box was simply idle.

> Actually, this is no longer true - the only difference between my
> working sources and CVS is that I've temporarily reverted to an "every
> 1000 documents" flush criterion to give a fairer comparison between
> the old and new code (better to benchmark one change at a time!)

I applied the patch you sent me, so that was the same for my tests.

> Does your source data contain anything confidential, or is it something
> I could take a copy of for testing? I've been contemplating setting up
> some nightly tests - graphing the speed and memory requirements for
> indexing and searching with the CVS HEAD version would help keep Xapian
> lean and mean.

The data does not contain confidential texts; it's composed of data which anyone could extract from our website if they were willing to do so. But I'll have to discuss this with my colleagues, since the data is not 100% our own property (i.e. the copyrights on the contents are not really ours, while the copyright on/ownership of the data itself is ours - that's a disadvantage of running a forum ;) ). If the data is not distributed in any way, it will probably be relatively easy to allow me this. If you intended to spread it around, I'm not sure whether I'll be allowed to provide the data set (for that reason). Anyway, I'm going to ask my colleagues right now (especially the legal guys).

Best regards,

Arjen
From: Olly B. <ol...@su...> - 2004-01-11 00:48:02
On Sun, Jan 11, 2004 at 01:07:14AM +0100, Arjen van der Meijden wrote:
> My test:
> A document-file for scriptindex with 50.000 documents (of which 34 are
> set to delete), in total 620MB of data, which gets indexed into 1.5G of
> data (including positional data and some fielded data-duplication (i.e.
> with both a field-prefix and normally indexed), not being compacted).
>
> 0.7.5 release:
> real 160m23.002s
> user 112m49.054s
> sys 26m20.020s
>
> cvs-head of 10-1-2004:
> real 75m52.441s
> user 55m2.411s
> sys 10m45.864s

In a mail off the list, Arjen noted that the speedup is greater when adding to a database which already contains a lot of data - more than 4 times faster per 1000 documents when the database has about 50000 documents!

The change I've made is to perform an efficient merge of a batch of postings with any existing postings for the same term, so it's unsurprising that this makes more difference when the database already contains a lot of data. When the database is small, it's likely that many postings will be for terms which haven't occurred in previous documents.

Anyway, this is really good news for scaling!

> The machine didn't do much else and since it has only 512MB of ram,
> there probably wasn't very much of the file cached (the 0.7.5
> release was the second in turn)

I'm testing on a dual processor box with (apparently) 948M of RAM, which does some other stuff (web serving, etc) but isn't generally very loaded. Rerunning a test never seems to make much difference to the times, so I think jobs of this size have a sufficiently large working set that you can pretty much ignore the effect of any data being cached between test runs.

> > Some of this work is in CVS, but I've not checked my very latest changes
> > in as they're not very tidy currently.

Actually, this is no longer true - the only difference between my working sources and CVS is that I've temporarily reverted to an "every 1000 documents" flush criterion to give a fairer comparison between the old and new code (better to benchmark one change at a time!)

> Well, I'll keep this file around, so I can do other benchmarks quite easily.

That would be useful - I've already spotted a way to compress termlists by an extra 14% or so (and in such a way that existing databases will work without changes). And I suspect I'll find more things to tweak...

Does your source data contain anything confidential, or is it something I could take a copy of for testing? I've been contemplating setting up some nightly tests - graphing the speed and memory requirements for indexing and searching with the CVS HEAD version would help keep Xapian lean and mean.

Cheers,
Olly
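The core idea described above - merging a batch of new postings into an existing posting list in a single pass, instead of updating the list once per document - can be sketched like this. This is only an illustration of the technique, under the assumption that posting lists are docid-sorted sequences; the real quartz code operates on on-disk B-tree chunks, not Python lists.

```python
def merge_postings(existing, batch):
    """One-pass merge of two docid-sorted posting lists.

    `existing` is the list already stored for a term; `batch` holds the
    docids accumulated in memory since the last flush. When the same
    docid appears in both, the batched entry wins (it's the update)."""
    out = []
    i = j = 0
    while i < len(existing) and j < len(batch):
        if existing[i] < batch[j]:
            out.append(existing[i]); i += 1
        elif batch[j] < existing[i]:
            out.append(batch[j]); j += 1
        else:  # same docid updated in the batch
            out.append(batch[j]); i += 1; j += 1
    # One of the two lists is exhausted; append the remainder of the other.
    out.extend(existing[i:])
    out.extend(batch[j:])
    return out
```

The win over per-document updates is that each term's stored list is read and rewritten once per flush rather than once per document, which matters more and more as the stored list grows - matching the observation that the speedup is larger on big databases.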
From: Arjen v. d. M. <ar...@gl...> - 2004-01-11 00:08:05
Olly Betts wrote:
> On Thu, Dec 25, 2003 at 06:02:11AM +0000, Olly Betts wrote:
>
> OK, I've finally got some indication that the new approach will actually
> be faster. My test collection of 144783 documents builds in:

My test: a document-file for scriptindex with 50.000 documents (of which 34 are set to delete), in total 620MB of data, which gets indexed into 1.5G of data (including positional data and some fielded data-duplication (i.e. with both a field-prefix and normally indexed), not being compacted).

0.7.5 release:
real 160m23.002s
user 112m49.054s
sys 26m20.020s

cvs-head of 10-1-2004:
real 75m52.441s
user 55m2.411s
sys 10m45.864s

The machine didn't do much else, and since it has only 512MB of ram, there probably wasn't very much of the file cached (the 0.7.5 release was the second in turn).

> Some of this work is in CVS, but I've not checked my very latest changes
> in as they're not very tidy currently.

Well, I'll keep this file around, so I can do other benchmarks quite easily.

Best regards,

Arjen
From: Olly B. <ol...@su...> - 2004-01-08 03:26:59
On Thu, Dec 25, 2003 at 06:02:11AM +0000, Olly Betts wrote:
> This is a milestone point in my changes to remove the current bottleneck
> in quartz which limits update speed. Changes to posting lists are now
> batched up and merged into the posting list in one go. The code passes
> all tests in the testsuite, but is probably still buggy. It's possible
> it doesn't even perform faster (if so, hopefully due to a totally
> untuned implementation rather than a flawed approach!)

OK, I've finally got some indication that the new approach will actually be faster. My test collection of 144783 documents builds in:

         version 0.7.4    WIP
         ====================
real     74m17s           51m29s
user     49m07s           28m15s
system   14m59s           6m04s

Note the real (wall clock) time for the WIP (Work-In-Progress) run seems oddly large compared to what you'd expect from its user and system times and the corresponding values for 0.7.4. The box I'm testing on does other stuff, so it may have been busier for the WIP run. But I think we can roughly conclude that it's 30-47% faster so far, which I'm pretty pleased with. I haven't implemented chunking of the posting lists in the WIP - doing that should speed things up a little more!

Some of this work is in CVS, but I've not checked my very latest changes in as they're not very tidy currently.

Cheers,
Olly
From: James A. <jam...@ta...> - 2004-01-05 18:23:44
I don't really get TCL deeply, but I've got the bindings to build for it, and they load successfully. I'll try to do a couple of simple demo/test scripts soon-ish, and hopefully someone else will rise to replicating the simple* stuff.

No idea if we can be as clever with iterators in TCL as we have been in Python. If someone can point me to something explaining TCL iterator/list idioms, I'm game for having a go. Alternatively, if anyone likes the idea of Guile bindings, I'll look at that in preference, because I grok Scheme slightly better.

J

--
James Aylett  xapian.org  ja...@ta...  uncertaintydivision.org
From: James A. <jam...@ta...> - 2004-01-01 22:54:37
This means that you can do something like:

matches = enquire.get_mset(...)
for match in matches:
    doc = match[xapian.MSET_DOCUMENT]

The 'match' returned from the MSet used as an iterator is a list containing MSET_DID, MSET_WT, MSET_RANK, MSET_PERCENT, MSET_DOCUMENT - in order, the document id, the weight, the rank, the percentage relevance, and the actual xapian.Document(), which is ultimately what you probably want access to anyway. Examples have been updated.

It'll be fairly easy to do the same thing for ESet, and indeed any other containers in Xapian. (This was, in fact, very easy to do; the hardest bit was setting up the new extra.i stuff ...)

J

----- Forwarded message from James Aylett <ja...@ix...> -----

Log message:
Python bindings: MSet provides a Python iterator (new target language-specific interface file extra.i included at end of xapian.i to accommodate this).

----- End forwarded message -----

--
James Aylett  xapian.org  ja...@ta...  uncertaintydivision.org
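The wrapping pattern the message describes can be illustrated with a plain-Python stand-in. This is not the actual binding code - `FakeMSet` and its sample data are invented for the example - but it shows the same shape: a container whose `__iter__` yields per-match lists in a fixed field order, so callers index by symbolic constants rather than remembering positions.

```python
class FakeMSet:
    """Toy stand-in for the wrapped MSet, demonstrating the iterator
    interface the bindings expose (field order is the one the message
    lists: did, weight, rank, percent, document)."""
    MSET_DID, MSET_WT, MSET_RANK, MSET_PERCENT, MSET_DOCUMENT = range(5)

    def __init__(self, matches):
        # Each match: (docid, weight, rank, percent, document)
        self._matches = matches

    def __iter__(self):
        for m in self._matches:
            yield list(m)

mset = FakeMSet([(1, 2.5, 0, 100, "doc one"), (7, 1.2, 1, 48, "doc seven")])
docs = [match[FakeMSet.MSET_DOCUMENT] for match in mset]
```

Exposing iteration this way is what lets the idiomatic `for match in matches:` loop from the message work at all: Python's for-statement just calls `__iter__` on the wrapped object.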
From: James A. <jam...@ta...> - 2003-12-28 21:54:54
----- Forwarded message from James Aylett <jam...@ta...> -----

Log message:
Python bindings:
* enable directors for MatchDecider, to allow subclassing in Python.
  Add documentation and an example.
* Add an example equivalent to simpleexpand in xapian-examples

----- End forwarded message -----

This is a fairly important step forward in that it allows Python applications to do fairly subtle tuning of the match system. Weight functors will follow (and ExpandDeciders, eventually), but probably not until I've reworked exceptions across the SWIG language bindings (in practice this means Python and PHP) to make them more useful.

If anyone has any suggestions about the order of things to tackle in the Python bindings, now would be a really good time to mention them, as I have a few days before I'll be sucked under again at work ... :)

J

--
James Aylett  xapian.org  ja...@ta...  uncertaintydivision.org
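What "enable directors for MatchDecider" buys you can be sketched in pure Python. Everything below is a hypothetical analogue, not Xapian's API: `LanguageDecider`, `run_match`, and the dict-based documents are invented to show the shape of the pattern - a callable base class the matcher invokes per candidate document, which Python code can now subclass.

```python
class MatchDecider:
    """Analogue of the base class the bindings expose: the matcher calls
    the decider on each candidate document and keeps it if True."""
    def __call__(self, doc):
        raise NotImplementedError

class LanguageDecider(MatchDecider):
    """Hypothetical subclass: accept only documents in one language."""
    def __init__(self, lang):
        self.lang = lang

    def __call__(self, doc):
        return doc.get("lang") == self.lang

def run_match(candidates, decider):
    """Stand-in for the matcher loop, applying the decider as a filter."""
    return [d for d in candidates if decider(d)]

kept = run_match(
    [{"id": 1, "lang": "en"}, {"id": 2, "lang": "nl"}, {"id": 3, "lang": "en"}],
    LanguageDecider("en"),
)
```

The significance of directors is that the virtual call crosses the language boundary: the C++ matcher ends up invoking the Python subclass's `__call__`, which is what makes this kind of per-document tuning possible from Python at all.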
From: Olly B. <ol...@su...> - 2003-12-25 06:02:13
On Thu, Dec 25, 2003 at 05:52:11AM +0000, Olly Betts wrote:
> backends/quartz/quartz_postlist.cc: Changed to merge a batch of
> changes into a posting list in one pass.

This is a milestone point in my changes to remove the current bottleneck in quartz which limits update speed. Changes to posting lists are now batched up and merged into the posting list in one go. The code passes all tests in the testsuite, but is probably still buggy. It's possible it doesn't even perform faster (if so, hopefully due to a totally untuned implementation rather than a flawed approach!)

I'll be away for a week or so, but if you're interested I'd love to hear tales of experiments with the CVS HEAD when I return.

Anyway, must go to bed before Santa catches me up!

Cheers,
Olly
From: James A. <jam...@ta...> - 2003-11-27 14:02:18
On Thu, Nov 27, 2003 at 12:00:03PM +0000, Olly Betts wrote:
> > > But we should also provide "omegalint" functionality (not quite sure how
> > > - there are a few ways it could be done) which will report potential
> > > errors in an omegascript file.
> >
> > Why not put it in the log file? I mean, people are checking that,
> > aren't they? :-)
>
> The number of log files and the format of entries is entirely at the
> control of the omegascript templates. And we don't really want to
> hide important error messages which should be fixed ASAP amongst a
> mountain of other log entries. Also, there may not be a log at all.

Ah, fair enough. I've lost track of how logging works in omega ...

> Putting errors in the logs means they're only reported in code
> which is evaluated ("omegalint" can look at all paths through the
> template as it needn't worry about efficiency), so problems may not
> be reported at all until the system is live, and then only if somebody
> is watching the logs or a user reports problems (and how many times have
> you seen a broken website and decided to just try later, or try a
> different site?)

That's quite a complex additional tool, that's all. Perhaps we could have a maintainer-mode CGI param or something that could output warnings from omega itself? (Considerably easier in hosted environments where you might not have shell access, but can install a binary static package ...)

> > Not thoroughly tested at all, no. I have a documentation patch -
> > attached.
>
> Thanks. I wasn't actually fishing for a patch, but it'll save me
> writing it!

I had it written anyway.

J

--
James Aylett  xapian.org  ja...@ta...  uncertaintydivision.org