From: Paolo <oo...@us...> - 2008-09-03 07:27:40
On Wed, Sep 03, 2008 at 05:03:35AM +0200, Ger Hobbelt wrote:
...
> Given 60 classes (= CSS files), Paolo can have his KISS and I can eat
> my pie too. Simple.

let me stress once again that I question the _requirement_.

> The set passed to classify is a set and should be passed to learn as

right, if we had vector/array struct it'd be 'natural'

...
> isolate (:c:) /class1 | class2 | and so on .../
...
> classify (:*:c:) [message]

which is a fake vector, works on strict assumptions on how to name
var/classes. Like in other situations, having true array data structure
would be quite useful.

> learn (:*:c:) (index) [message]
>
> Both look good to _me_. ;-)

agreed, provided that

! learn (:*:s:) [message] <i flags>

where :s: is just a subset (1 as limit, so that 'i' can be dropped)
of the N classes in use, is (remains) legal (where allowed).

> Because you always pass along the whole set at script level, the
> classifier code (both learn and classify implementation) gets to pick

there's no need for that, where's the binding between script level and
classifiers implementation? eg I can define N classes, but use any
subset for both LEARN / CLASSIFY at any point to my taste/needs, with
the limit of the actual classifier's requirement:

!# use classes: one two three four five six seven
! learn (one two three four five six seven) <i flags> [msg_x]
! learn (three four five) <i flags> [msg_y]
! learn (one) <flags> [msg1]
! ...
! classify (one two three four five six seven) <flags>
! classify (five six seven) <flags>
! classify (three six seven) <flags>
! classify (one three four six seven) <flags>
! classify (six) <flags> (cm) # class membership -> cm, unsupported atm

...
> what they want/need, you get the chance to apply filters & processes
> in learn that are simply impossible right now PLUS you don't have to

that's C level, SVM wants 3 because it uses 3 in both cases.
> worry anymore either which classifier you're gonna use because today
> all the bloody buggers require their own particular incantation when
> it comes to number of css files (classes) passed to learn.

there are categories of classifiers that have same requirements wrt
#classes and params. Now suppose the actual classes are compatible, but
one classifier needs 1+ extras (eg SVM) and I want to compare
classifiers, then it'd be nice to do (SVM case, forget 4now actual
class compatibility):

! learn (a b a_v_b) <svm flags> # wants all 3
! classify (a b a_v_b) <svm flags> (s_svm) # wants all 3
! classify (a b) <xxx flags> (s_xxx) # can't use the extra a_v_b

> So no unified ... mess; I'd say it's unified ... structure / design.

maybe, but that's not as simple as saying :
define: N classes
hence: LEARN(1 2 ... N) CLASSIFY(1 2 ... N)
which might turn into a mess, or better shift the mess from one place
to another.

> Cost for Trever @ 60 classes? nil.

wasn't thinking of run time cost, but script readability.

> You save far more time when you find a way to reduce disc I/O cache
> misses on your memory-mapped CSS files, even when you achieve such a
> feat for learn alone (which would be rather weird and besides, unless
> you 'Train Everything', optimizing classify is the winner). I have a

yes, though once N classes get mmaped for a CLASSIFY a single class
LEARN can check for it and won't mmap() again, and mmsync() can be
deferred iff other processes that use same class(es) do that via
shared mem.

> Want some real, achievable gain? convert crm114 to play 'server', i.e.
> permanently loaded and CSS files (close to) permanently mapped in

yes yes yes yes - the endless daemon saga :)

> invocation of crm114 and the moment the script *tokenizer* kicks in.
> You're not even *executing* script yet by then! The rest (8%) is
> spread across tokenizing ('compiling the [small!] script'), tokenized
> script code execution, wrap-up and unidentified fluff elsewhere.
> Believe me, if I'd see an easy way to kick that bugger into higher
> gear, you'd already have it.

yeah, maybe the ability to run pre-compiled scripts can be good idea
for a number of applications.

> seriously considering hacking crm114 into becoming mod_crm114, i.e. an
> Apache2 plugin: you get the server, the socket I/O and the like

Apache's Lucene and derivatives.

> live in there like a wicked PHP-alike server-side scripting language
> and you will definitely achieve instant notoriety. ;-)

and support headache ;)

> Anyhow, I don't see any good reason why the learn (classes) argument
> cannot be identical to the related classify (classes) argument, except

see above: CAN but definitely should not be a MUST.

> ONE: strict adherence to 'backwards compatibility' at CRM114 script

just one good reason.

--
paolo
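
The "CAN but not MUST" rule argued for in this message can be sketched outside CRM114 as a small check: the script defines N classes, and any LEARN or CLASSIFY call may name any subset of them, limited only by what the chosen classifier accepts. This is a rough Python illustration, not CRM114 code. Every name in it (ClassifierSpec, check_subset, can_learn, can_classify) is hypothetical, and the class-count limits are assumptions taken from the examples in the message (a generic classifier that takes 1..N classes, an SVM-like one that wants exactly 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifierSpec:
    """Class-count limits of one classifier (hypothetical, assumed values)."""
    name: str
    learn_min: int
    learn_max: int
    classify_min: int
    classify_max: int


def check_subset(defined, used, lo, hi):
    """True if `used` is a duplicate-free subset of `defined`
    whose size lies within the inclusive range [lo, hi]."""
    used = list(used)
    if len(set(used)) != len(used):
        return False  # the same class was named twice
    if not set(used) <= set(defined):
        return False  # names a class the script never defined
    return lo <= len(used) <= hi


def can_learn(spec, defined, used):
    """Apply the classifier's LEARN limits to the classes passed."""
    return check_subset(defined, used, spec.learn_min, spec.learn_max)


def can_classify(spec, defined, used):
    """Apply the classifier's CLASSIFY limits to the classes passed."""
    return check_subset(defined, used, spec.classify_min, spec.classify_max)


# The seven classes from the example in the message.
CLASSES = ["one", "two", "three", "four", "five", "six", "seven"]

# Assumption: a generic classifier accepts any non-empty subset.
GENERIC = ClassifierSpec("generic", 1, len(CLASSES), 1, len(CLASSES))

# Assumption: an SVM-like classifier wants all 3 files in both cases.
SVM_LIKE = ClassifierSpec("svm-like", 3, 3, 3, 3)
```

With these assumed limits, can_learn(GENERIC, CLASSES, ["three", "four", "five"]) is accepted, while can_classify(SVM_LIKE, ["a", "b", "a_v_b"], ["a", "b"]) is rejected. The requirement lives with the classifier, not in a script-level rule that LEARN and CLASSIFY must always receive the identical full set.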