Thread: [Crm114-discuss] unifying LEARN/CLASSIFY invocation (was: Re: [Crm114-general] Mixed 64-bit system

On Wed, Sep 03, 2008 at 05:03:35AM +0200, Ger Hobbelt wrote:
...
> Given 60 classes (= CSS files), Paolo can have his KISS and I can eat
> my pie too. Simple.

let me stress once again that I question the _requirement_. 

> The set passed to classify is a set and should be passed to learn as

right, if we had vector/array struct it'd be 'natural' ...

> isolate (:c:) /class1 | class2 | and so on .../
...
> classify (:*:c:) [message]

which is a fake vector: it works only under strict assumptions about how
vars/classes are named.
As in other situations, having a true array data structure would be quite
useful.
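
To illustrate why the '|'-joined string is only a fake vector, here is a minimal Python sketch (not CRM114 code). The helper expand_fake_vector is hypothetical and assumes the class list is a plain string split on '|'; it shows how the scheme breaks as soon as a name contains the separator.

```python
def expand_fake_vector(value):
    """Split a '|'-separated class string into a list of class names."""
    names = [part.strip() for part in value.split("|")]
    # Empty entries (e.g. from a trailing '|') are dropped, not treated as classes.
    return [name for name in names if name]


# Works as long as every name obeys the naming assumption.
print(expand_fake_vector("class1 | class2 | class3"))

# A single name containing '|' silently becomes two "classes".
print(expand_fake_vector("odd|name.css"))
```

A true array type would carry the element boundaries itself, so no naming rule would be needed.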

> learn (:*:c:) (index) [message]
> 
> Both look good to _me_. ;-)

agreed, provided that

! learn (:*:s:) [message] <i flags>

where :s: is just a subset of the N classes in use (down to a single class,
in which case 'i' can be dropped), remains legal (where allowed).

> Because you always pass along the whole set at script level, the
> classifier code (both learn and classify implementation) gets to pick

there's no need for that: where's the binding between script level and
classifier implementation? e.g. I can define N classes, but use any subset
for both LEARN / CLASSIFY at any point to my taste/needs, within the limits
of the actual classifier's requirements:

!# use classes: one two three four five six seven
! learn (one two three four five six seven) <i flags> [msg_x]
! learn (three four five) <i flags> [msg_y]
! learn (one) <flags> [msg1]
! ...
! classify (one two three four five six seven) <flags>
! classify (five six seven) <flags>
! classify (three six seven) <flags>
! classify (one three four six seven) <flags>
! classify (six) <flags> (cm)	# class membership -> cm, unsupported atm
...
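
The rule above (any subset of the defined classes, limited only by what the classifier requires) can be sketched in Python. This is an illustration, not CRM114 behaviour: check_classes and its min_classes/max_classes parameters are hypothetical names standing in for a classifier's real requirements.

```python
def check_classes(defined, requested, min_classes=1, max_classes=None):
    """Return `requested` as a list if it is a legal subset of `defined`.

    min_classes/max_classes are hypothetical stand-ins for whatever the
    chosen classifier actually requires.
    """
    unknown = [name for name in requested if name not in defined]
    if unknown:
        raise ValueError("unknown classes: " + ", ".join(unknown))
    if len(set(requested)) != len(requested):
        raise ValueError("a class is listed more than once")
    if len(requested) < min_classes:
        raise ValueError("classifier needs at least %d class(es)" % min_classes)
    if max_classes is not None and len(requested) > max_classes:
        raise ValueError("classifier accepts at most %d class(es)" % max_classes)
    return list(requested)


DEFINED = ["one", "two", "three", "four", "five", "six", "seven"]

# Any subset is acceptable for LEARN or CLASSIFY.
print(check_classes(DEFINED, ["three", "four", "five"]))
print(check_classes(DEFINED, ["one"]))

# A classifier that insists on exactly three files rejects a pair.
try:
    check_classes(DEFINED, ["one", "two"], min_classes=3, max_classes=3)
except ValueError as err:
    print("rejected:", err)
```

The point of the sketch is that the check depends on the classifier, not on the full set declared at script level.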

> what they want/need, you get the chance to apply filters & processes
> in learn that are simply impossible right now PLUS you don't have to

that's C level, SVM wants 3 because it uses 3 in both cases.

> worry anymore either which classifier you're gonna use because today
> all the bloody buggers require their own particular incantation when
> it comes to number of css files (classes) passed to learn.

there are categories of classifiers that have the same requirements wrt
#classes and params. Now suppose the actual classes are compatible, but
one classifier needs 1+ extras (eg SVM) and I want to compare classifiers,
then it'd be nice to do (SVM case, forget for now actual class compatibility):

! learn (a b a_v_b) <svm flags>		# wants all 3
! classify (a b a_v_b) <svm flags> (s_svm)	# wants all 3
! classify (a b) <xxx flags> (s_xxx)	# can't use the extra a_v_b

> So no unified ... mess; I'd say it's unified ... structure / design.

maybe, but that's not as simple as saying :
define:	N classes
hence:	LEARN(1 2 ... N)
	CLASSIFY(1 2 ... N)
which might turn into a mess, or rather shift the mess from one place to
another.

> Cost for Trever @ 60 classes? nil.

wasn't thinking of run time cost, but script readability.

> You save far more time when you find a way to reduce disc I/O cache
> misses on your memory-mapped CSS files, even when you achieve such a
> feat for learn alone (which would be rather weird and besides, unless
> you 'Train Everything', optimizing classify is the winner). I have a

yes, though once N classes get mmap()ed for a CLASSIFY, a single-class LEARN
can check for that and won't mmap() again, and msync() can be deferred
iff the other processes that use the same class(es) do so via shared mem.
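
A rough Python sketch of that "map once, reuse, sync late" idea follows. It is not crm114's C implementation: MapCache and demo are hypothetical names, the files are throwaway temp files, and Python's mmap.flush() stands in for msync().

```python
import mmap
import os
import tempfile


class MapCache:
    """Keep one shared mapping per file so repeated requests reuse it."""

    def __init__(self):
        self._maps = {}      # absolute path -> (file object, mmap object)
        self.mmap_calls = 0  # how many real mappings were created

    def get(self, path):
        key = os.path.abspath(path)
        if key not in self._maps:
            f = open(key, "r+b")
            self._maps[key] = (f, mmap.mmap(f.fileno(), 0))
            self.mmap_calls += 1
        return self._maps[key][1]

    def close(self):
        for f, m in self._maps.values():
            m.flush()  # deferred sync: done once, when the mapping is dropped
            m.close()
            f.close()
        self._maps.clear()


def demo():
    cache = MapCache()
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("a.css", "b.css"):
            p = os.path.join(tmp, name)
            with open(p, "wb") as f:
                f.write(b"\0" * 64)  # mmap cannot map an empty file
            paths.append(p)
        for p in paths:              # "CLASSIFY" maps both classes
            cache.get(p)
        m = cache.get(paths[0])      # "LEARN" on one class reuses the mapping
        m[0] = 1
        cache.close()
        with open(paths[0], "rb") as f:
            first = f.read(1)
    return cache.mmap_calls, first


print(demo())
```

Only two mappings are created although three requests are made; whether the sync may really be deferred still depends on every other process going through the same shared mapping.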

> Want some real, achievable gain? convert crm114 to play 'server', i.e.
> permanently loaded and CSS files (close to) permanently mapped in

yes yes yes yes - the endless daemon saga :)

> invocation of crm114 and the moment the script *tokenizer* kicks in.
> You're not even *executing* script yet by then! The rest (8%) is
> spread across tokenizing ('compiling the [small!] script'), tokenized
> script code execution, wrap-up and unidentified fluff elsewhere.
> Believe me, if I'd see an easy way to kick that bugger into higher
> gear, you'd already have it.

yeah, maybe the ability to run pre-compiled scripts could be a good idea
for a number of applications.

> seriously considering hacking crm114 into becoming mod_crm114, i.e. an
> Apache2 plugin: you get the server, the socket I/O and the

like Apache's Lucene and derivatives.

> live in there like a wicked PHP-alike server-side scripting language
> and you will definitely achieve instant notoriety. ;-)

and support headache ;)

> Anyhow, I don't see any good reason why the learn (classes) argument
> cannot be identical to the related classify (classes) argument, except

see above: CAN but definitely should not be a MUST.

> ONE: strict adherence to 'backwards compatibility' at CRM114 script

just one good reason.

-- 
paolo
