From: Laird B. <lb...@us...> - 2004-03-29 04:41:00
On Mar 28 2004, Raul Miller wrote:

> > Plocal-spam(w) = P({document d contains w} | document d is spam)
>
> I think you have that backwards.
>
> We are trying to classify whether or not a document is spam, given the
> presence of a word in the document, not determine whether or not a word
> is present, given that we know that the document is spam.

Nope, that's one of crm114's primary assumptions. More precisely, crm114
says:

------------
Assume Plocal-spam(w) to mean that

    Plocal-spam(w) = P({document d contains w} | document d is spam)

and also assume that Plocal-spam(w) is given by the formula involving
Nspam(w) etc. Then using these assumptions and the chain rule blah blah
output probability of being spam.
------------

I'm saying the assumptions are inconsistent to start with, in some
cases. Using inconsistent assumptions and applying the chain rule
correctly still gives inconsistent results.

> In the context of crm114, we are never in doubt as to whether or
> not a word is present in any given document.

That's true and irrelevant to my point. The Plocal-spam(w) numbers don't
give certainties about specific documents, they give probabilities valid
for all documents. When I write

    Plocal-spam(w) = P({document d contains w} | document d is spam)

I am not stating something about a specific document d, I am using d as
a variable to relate the set on the right of | with the set on the left.
In words, Plocal-spam(w) means "if a document is spam, then what's its
probability of containing w?".

-- Laird Breyer.
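[Editor's illustration, not part of the original thread.] The set-inclusion reading of Plocal-spam(w) can be sketched numerically. The corpus below is invented; the point is that for any corpus at all, every document containing the bigram "cat sees" also contains "cat", so the containment probability of the unigram can never be smaller:

```python
# A toy spam corpus (invented); any corpus would exhibit the same ordering.
spam_docs = [
    "cat sees dog",
    "cat runs",
    "buy pills now",
    "cat sees mouse",
]

def p_contains(phrase, docs):
    """Fraction of docs containing `phrase` (a unigram or a bigram),
    i.e. the empirical P({document contains phrase} | document is spam)."""
    return sum(phrase in doc for doc in docs) / len(docs)

p_cat = p_contains("cat", spam_docs)            # 3 of 4 docs
p_cat_sees = p_contains("cat sees", spam_docs)  # 2 of 4 docs

# Holds by set inclusion: {contains "cat sees"} is a subset of {contains "cat"}.
assert p_cat >= p_cat_sees
print(p_cat, p_cat_sees)  # → 0.75 0.5
```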
From: Raul M. <mo...@ma...> - 2004-03-29 03:58:33
> On Mar 28 2004, Raul Miller wrote:
> > Ok, yes, if that's the probability you're talking about, I agree your
> > assertion holds true.
> >
> > But crm114 doesn't seem to assume that P[S1] is independent of P[S2],
> > for arbitrary S1 and S2 so I don't see the relevance of this statement.

On Mon, Mar 29, 2004 at 12:52:33PM +1000, Laird Breyer wrote:
> Well, the local probabilities assume the statement I've made about S1
> and S2, in some cases (only).

I'll take that as an informal expression of a point you intend to prove.

> Plocal-spam(w) = P({document d contains w} | document d is spam)

I think you have that backwards.

We are trying to classify whether or not a document is spam, given the
presence of a word in the document, not determine whether or not a word
is present, given that we know that the document is spam.

In the context of crm114, we are never in doubt as to whether or not a
word is present in any given document.

-- Raul
From: Laird B. <lb...@us...> - 2004-03-29 02:52:41
On Mar 28 2004, Raul Miller wrote:
> Ok, yes, if that's the probability you're talking about, I agree your
> assertion holds true.
>
> But crm114 doesn't seem to assume that P[S1] is independent of P[S2],
> for arbitrary S1 and S2 so I don't see the relevance of this statement.

Well, the local probabilities assume the statement I've made about S1
and S2, in some cases (only).

    Plocal-spam(w) = P({document d contains w} | document d is spam)

More precisely, the set has the definition (as is standard with
probabilistic arguments)

    {document d contains w} =(def)
        {g: g is a realization of the document d and g contains w}

So Plocal-spam(w) = Q(S1), where

    S1 = {document d contains w}
    Q(.) = P(. | document d is spam)

Q is a probability measure, called the "conditional probability measure,
given the event {document d is spam}".

Now take w = "cat" to obtain S1, and w = "cat sees" to obtain S2. We
have S1 includes S2, since

    {g: g is a realization of the document d and g contains "cat"}

includes

    {g': g' is a realization of the document d and g' contains "cat sees"}

so Q(S1) >= Q(S2), and therefore

    Plocal-spam("cat") >= Plocal-spam("cat sees").

> If I understand you correctly, you're saying that the probability I
> called Pa[w1] can't be used in the context of spam recognition because
> it represents something different from what you're referring to as P[S1]?

That's right. For the particular case w1 = "cat" and w2 = "cat sees"
(and only for that case and similar ones), you cannot take simultaneously

    Pa[cat] = Na[cat] / (Na[cat] + Nb[cat])
    Pa[cat sees] = Na[cat sees] / (Na[cat sees] + Nb[cat sees])

unless they happen to satisfy Pa[cat] >= Pa[cat sees] and any other
applicable consistency requirements. Something has got to give, and what
gives is up to you.

For example, you can change the formula Pa[cat], or change the formula
Pa[cat sees], or drop the interpretation

    Pa[cat] = P(document d contains "cat" | document d is spam),

thereby losing the Bayesian chain rule step, or any number of other
things.

-- Laird Breyer.
From: Raul M. <mo...@ma...> - 2004-03-29 01:48:37
On Mon, Mar 29, 2004 at 11:21:29AM +1000, Laird Breyer wrote:
> > *** when Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1] then Pa[w2] must be <= Pa[w1]
>
> My (intended) assertion is much simpler. It is:
>
>     For any fixed set u:
>         S1 = {documents in set u containing w1}
>         S2 = {documents in set u containing w2}
>     whenever S2 is included in S1, then P[S2] <= P[S1].
>
> To the extent that *** represents my intended assertion, it must be
> true.

I think you're saying that P[S1] is the probability that a document in
set u contains w1. Ok, yes, if that's the probability you're talking
about, I agree your assertion holds true.

But crm114 doesn't seem to assume that P[S1] is independent of P[S2],
for arbitrary S1 and S2 so I don't see the relevance of this statement.

> Your proof shows that the two statements
>
> > Pa[w] = Na[w] / (Na[w] + Nb[w])
> > *** when Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1] then Pa[w2] must be <= Pa[w1]
>
> are incompatible. In this case, one of the two (or both) must be
> dropped. The second one ***, to the extent that it represents an
> assertion (S1,S2) I gave at the top, always has priority.
> So in that case the first one, namely
>
> > Pa[w] = Na[w] / (Na[w] + Nb[w])
>
> must be dropped. You could use the following instead, for example:
...
> That is why in my examples I (tried to) always take w2 = "f1 f2" and
> w1 = "f1".
>
> Hopefully, we should now be in general agreement?

If I understand you correctly, you're saying that the probability I
called Pa[w1] can't be used in the context of spam recognition because
it represents something different from what you're referring to as
P[S1]? I must admit I don't see any validity to this line of reasoning.

-- Raul
From: Laird B. <lb...@us...> - 2004-03-29 01:21:37
Hi Raul,

I think with your example we're ready to clear this up.

On Mar 28 2004, Raul Miller wrote:
> Your assertion is apparently:
>
> *** when Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1] then Pa[w2] must be <= Pa[w1]

My (intended) assertion is much simpler. It is:

    For any fixed set u:
        S1 = {documents in set u containing w1}
        S2 = {documents in set u containing w2}
    whenever S2 is included in S1, then P[S2] <= P[S1].

To the extent that *** represents my intended assertion, it must be
true.

> I agree that whenever the feature "cat sees" occurs, then also the feature
> "cat" occurs, but I disagree with your conclusion.
>
> Using the expression
>     Pa[w] = Na[w] / (Na[w] + Nb[w])
>
> where
>     Na[w] is the number of occurrences of w in set a
>     Nb[w] is the number of occurrences of w in set b
>
>     Pa[w] is the probability associated with w that the containing
>     document is in set a rather than set b.
>
> *** when Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1] then Pa[w2] must be <= Pa[w1]
>
> Disproof:
>
>     Let Na[w2] = 1
>     Let Na[w1] = 2
>     Let Nb[w2] = 0
>     Let Nb[w1] = 2
>
> Here: Pa[w2] = 1 and Pa[w1] = 0.5
>
> I don't see any basis for claiming otherwise.

Your proof shows that the two statements

> Pa[w] = Na[w] / (Na[w] + Nb[w])
> *** when Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1] then Pa[w2] must be <= Pa[w1]

are incompatible. In this case, one of the two (or both) must be
dropped. The second one ***, to the extent that it represents an
assertion (S1,S2) I gave at the top, always has priority. So in that
case the first one, namely

> Pa[w] = Na[w] / (Na[w] + Nb[w])

must be dropped. You could use the following instead, for example:

    Pa[cat] = Na[cat]/(Na[cat] + Nb[cat])
    Pa[sees] = Na[sees]/(Na[sees] + Nb[sees])
    Pa[cat sees] = (Na[cat]/(Na[cat] + Nb[cat])) * (Na[sees]/(Na[sees] + Nb[sees]))

However, the *** statement does not always represent an (S1,S2)
assertion, and when it doesn't, then there is no requirement for *** to
hold.

For example, w1 could be "cat" and w2 could be "dog", and the counts
Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1] could be just coincidence. Then
*** needn't hold. That is why in my examples I (tried to) always take
w2 = "f1 f2" and w1 = "f1".

Hopefully, we should now be in general agreement?

-- Laird Breyer.
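[Editor's illustration, not part of the original thread.] The product-form alternative Laird sketches can be computed directly. The counts below are invented; the observation is that once the bigram probability is derived by multiplying unigram factors that are each at most 1, the ordering Pa[cat sees] <= Pa[cat] holds automatically:

```python
def local_p(na, nb):
    """CRM114-style local probability Na/(Na+Nb), as quoted in the thread."""
    return na / (na + nb)

# Invented occurrence counts: spam (na) and nonspam (nb).
na = {"cat": 3, "sees": 2}
nb = {"cat": 1, "sees": 2}

pa_cat = local_p(na["cat"], nb["cat"])    # 3/(3+1) = 0.75
pa_sees = local_p(na["sees"], nb["sees"]) # 2/(2+2) = 0.5

# Bigram probability derived by multiplication, per Laird's suggestion,
# instead of counting the bigram directly.
pa_cat_sees = pa_cat * pa_sees            # 0.375

# Each factor is <= 1, so the consistency ordering cannot be violated.
assert pa_cat_sees <= pa_cat
print(pa_cat, pa_sees, pa_cat_sees)  # → 0.75 0.5 0.375
```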
From: Raul M. <mo...@ma...> - 2004-03-28 20:00:31
On Sun, Mar 28, 2004 at 11:51:52AM +1000, Laird Breyer wrote:
> > You have asserted that the probability that a document is spam based
> > on the presence of a word in a document can be completely determined
> > for Naive Bayes by knowing how many spam documents the word appears
> > in and with no knowledge of how many non-spam documents the word
> > appears in.
> >
> > You are simply wrong about this.
>
> This wasn't my intention. The formulas used contain N_spam[feature] and
> N_nonspam[feature]. What is true is that within solely the spam
> documents, the counts of "cat" and the counts of "cat sees" must be
> consistent (ie the count of "cat" is higher than the count of "cat sees").
> Similarly, within solely the nonspam documents, the count of "cat" is
> higher than the count of "cat sees". Similarly, within both the spam
> and nonspam documents together, the count of "cat" is higher than the
> count of "cat sees". And finally, within the set of all possible
> documents, the count of "cat" is higher than the count of "cat sees".

I agree up to here.

> I've stated this in terms of counts. It's also true in terms of
> probabilities. If you have a well defined probability on documents,
> then the probability of the feature "cat" must be higher than the
> probability of the feature "cat sees", simply because whenever the
> feature "cat sees" occurs, then also the feature "cat" occurs.

I disagree here.

I agree that whenever the feature "cat sees" occurs, then also the
feature "cat" occurs, but I disagree with your conclusion.

Using the expression

    Pa[w] = Na[w] / (Na[w] + Nb[w])

where

    Na[w] is the number of occurrences of w in set a
    Nb[w] is the number of occurrences of w in set b

    Pa[w] is the probability associated with w that the containing
    document is in set a rather than set b.

Your assertion is apparently:

    *** when Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1] then Pa[w2] must be <= Pa[w1]

Disproof:

    Let Na[w2] = 1
    Let Na[w1] = 2
    Let Nb[w2] = 0
    Let Nb[w1] = 2

Here: Pa[w2] = 1 and Pa[w1] = 0.5

I don't see any basis for claiming otherwise.

-- Raul
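[Editor's illustration, not part of the original thread.] Raul's disproof, computed directly with his own numbers: the counts satisfy Na[w2] <= Na[w1] and Nb[w2] <= Nb[w1], yet the formula orders Pa[w2] above Pa[w1]:

```python
def pa(na, nb):
    """The formula under discussion: Pa[w] = Na[w] / (Na[w] + Nb[w])."""
    return na / (na + nb)

# Raul's counterexample counts, verbatim from the thread.
na_w1, nb_w1 = 2, 2
na_w2, nb_w2 = 1, 0

pa_w1 = pa(na_w1, nb_w1)  # 2/4 = 0.5
pa_w2 = pa(na_w2, nb_w2)  # 1/1 = 1.0

# The premises of *** hold...
assert na_w2 <= na_w1 and nb_w2 <= nb_w1
# ...but its conclusion fails: the ordering Pa[w2] <= Pa[w1] is violated.
assert pa_w2 > pa_w1
print(pa_w1, pa_w2)  # → 0.5 1.0
```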
From: Laird B. <lb...@us...> - 2004-03-28 13:26:08
Hi Christian,

Here's the proof (a bit long, so I didn't want to include it earlier).

Let's first assume that D has finite cardinality |D|. This is the case
if the vocabulary is finite and documents have a maximum size. We can
take the limit afterwards.

Claim: Assume |D| is finite. Define for any t \in D

    Prob( t is spam | t = (t_1,...,t_n) ) =(def)
        P( T is spam | T = \phi(t_1,...,t_n) )

Then there exists no probability measure "Prob" defined on D such that
the above are conditional probabilities for all documents t \in D.

Proof: Let M be the 2x|D| matrix, where the row label represents
{spam,notspam} and the column label represents an enumeration of the
documents t \in D:

    M_st =(def) Prob(d is "s" | d = t) =(def) P(T is "s" | T = \phi(t))

Let N be the |D|x2 matrix, where the row label is a document and the
column label is in {spam,notspam}:

    N_tr =(def) Prob(d = t | d is "r") = \prod_k P(T_k = \phi(t)_k | T is "r")

The last product is because on D_5, the terms T_k are independent,
conditionally on the spam/notspam class. I'm only defining these
matrices so I don't have to write long formulas.

If the measure Prob on D exists and is a probability, then we must
certainly have

    Prob(d is "s") = \sum_t Prob(d is "s" | d = t) Prob(d = t)
                   = \sum_{t,r} Prob(d is "s" | d = t) Prob(d = t | d is "r") Prob(d is "r")

that is,

    v = MNv

where v is the column vector

    v = (Prob(d is "spam"), Prob(d is "notspam"))^T

and we are summing over the real documents t \in D only.

Let A = MN, then v is an eigenvector with eigenvalue 1, and v has the
special form v = (a, 1 - a)^T with 0 <= a <= 1.

Now v_i = \sum_j A_ij v_j, and \sum_i v_i = 1 gives
\sum_j (\sum_i A_ij) v_j = 1, which simplifies to

    (*) a (\sum_i A_{i,spam}) + (1-a) (\sum_i A_{i,notspam}) = 1.

The elements of A are of course all nonnegative, and I claim there is no
solution for 0 <= a <= 1. If no such a exists, then there is no measure
Prob on D.

Here is why no such a exists: Let M_5 and N_5 be defined exactly like M
and N, but on D_5, with respect to the probability measure P which we
agree exists. The only difference is that on D_5, there are many more
documents, but still a finite number. Let A_5 = (M_5)(N_5).

You can see that \sum_i (A_5)_{i,spam} = 1, because

    \sum_i (A_5)_{i,spam}
        = \sum_{T \in D_5} [ P(spam|T)P(T|spam) + P(notspam|T)P(T|spam) ]
        = \sum_{T \in D_5} P(T|spam)
        = 1,

where we use P(notspam|T) = 1 - P(spam|T). Similarly, you can show that
\sum_i (A_5)_{i,notspam} = 1.

But now, notice that

    (**) 1 = \sum_i (A_5)_{i,r} > \sum_i A_{i,r},

so in (*), both sums are < 1. It follows that 0 <= a <= 1 cannot
exist. []

Now that we know that no Prob measure which equals P can exist for D
finite, consider the infinite case. Notice that the proof does not use
the finite nature of D at all. The finite sums on t can be replaced by
infinite sums, and can be interchanged by Fubini's theorem, because all
terms are nonnegative. All we need is to be able to enumerate the t's on
D and D_5. Moreover, the inequality in (**) is strict even in the
infinite case (easy, just use a nonreal document in D_5 with positive P
probability). So the same proof works in the infinite case.

-- Laird Breyer.
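[Editor's illustration, not part of the original thread.] The key step of the proof, the column sums of A = MN dropping strictly below 1 once the sum is restricted from D_5 to a proper subset D, can be checked on a toy model. All numbers and the tiny vocabulary below are invented; D_5 is all length-2 token sequences over {a, b}, D excludes one "non-real" document:

```python
from itertools import product

# Invented naive Bayes model on D_5: per-token probabilities given the class.
p_tok = {"spam": {"a": 0.8, "b": 0.2},
         "notspam": {"a": 0.3, "b": 0.7}}
prior = {"spam": 0.5, "notspam": 0.5}
classes = ["spam", "notspam"]

d5 = list(product("ab", repeat=2))      # the full product space D_5
d = [t for t in d5 if t != ("b", "b")]  # "real" documents D, a strict subset

def p_doc_given_class(t, c):
    """N_tr: tokens independent given the class (the D_5 assumption)."""
    p = 1.0
    for tok in t:
        p *= p_tok[c][tok]
    return p

def p_class_given_doc(c, t):
    """M_st: Bayes rule, well defined on D_5."""
    num = prior[c] * p_doc_given_class(t, c)
    den = sum(prior[r] * p_doc_given_class(t, r) for r in classes)
    return num / den

def column_sum(docs, r):
    """\\sum_i (MN)_{i,r}, with the inner document sum over `docs`."""
    return sum(p_class_given_doc(s, t) * p_doc_given_class(t, r)
               for t in docs for s in classes)

sums_d5 = {r: column_sum(d5, r) for r in classes}
sums_d = {r: column_sum(d, r) for r in classes}
print(sums_d5)  # both columns sum to exactly 1 on D_5
print(sums_d)   # both strictly less than 1 on the subset D
```

With these numbers the D column sums come out to 1 - P(("b","b") | class), i.e. 0.96 for spam and 0.51 for notspam, so equation (*) indeed has no solution with 0 <= a <= 1.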
From: Christian S. <si...@mi...> - 2004-03-28 10:47:42
Hi,

On Sun, 28 Mar 2004, Laird Breyer wrote:
> We're still not discussing the same thing. I agree with the formula
> stated above. This is just a calculation on D_5. The point where
> things break down is when you take the expression P(T is spam | F_k)
> and use it without modification for real documents, ie on the set D.
>
> It is true that the set D is basically a subset of D_5, but P must
> still be converted into some probability on D, which you haven't yet
> addressed. I am claiming that the restriction of P to the set D is
> not suitable for this.
>
> In case this doesn't convince you, let me be more explicit: On D_5,
> you can calculate P(T is spam | T = \phi(t) ), which you've done
> above, for any fixed real document t \in D.

Yes.

> But crm114 only outputs a conditional probability for real
> documents. The set D_5 is an internal aspect of its algorithm,
> but has no other relation to crm114's inputs or outputs.
> As a machine, crm114 works like this:
>
>     real documents -> crm114 -> probabilities of real documents
>
> So you are saying that the correct thing to do is to output
> the numerical value computed on D_5, ie you output
>
>     Prob( t is spam | t = (t_1,...,t_n) ) =(def)
>         P( T is spam | T = \phi(t_1,...,t_n) )

Yes. Maybe not "the" correct thing, but I think it's legitimate and
consistent.

> But this is inconsistent on D, because if I feed crm114 every possible
> real document t \in D, then it will output a set of "conditional
> probabilities"
>
>     (*) Prob( t is spam | t = (t_1,...,t_n) ) for all possible documents t.

Hm, this would be an infinite set because the length of documents
(number of features) is not limited; the number of possible different
features isn't limited either. But you could certainly do so for any
possible document t...

> But this family of conditional probabilities is inconsistent.
> There is no measure Prob on D which is compatible with them.
>
> If you don't believe me, I'll give you an outline proof of this
> (just ask). But let's first see if you follow me so far.

Please do so -- I still don't have a clue why it should be inconsistent.

Bye
Christian

------------ Christian Siefkes -----------------------------------------
| Email: chr...@si... | Web: http://www.siefkes.net/
| Graduate School in Distributed IS: http://www.wiwi.hu-berlin.de/gkvi/
-------------------- Offline P2P: http://www.leihnetzwerk.de/ ----------
Companies go through periods in which middle and upper managers vie to
out-do each other in thinking up ways to improve near-term performance
(quarterly earnings) by sacrificing the longer term. This is usually
called "bottom-line consciousness," but we prefer to give it another
name: "eating the seed corn."
        -- Tom DeMarco and Timothy Lister, Peopleware
From: Laird B. <lb...@us...> - 2004-03-28 01:51:59
This is a continuation of the discussion in the crm114-general thread.
This is the reply from Raul Miller:

> On Fri, Mar 26, 2004 at 10:55:49AM +1000, Laird Breyer wrote:
> > CRM114 uses a formula to compute feature probabilities. You are trying
> > to understand the meaning of the formula by slightly modifying the
> > inputs of this formula. Your modifications include an extra finite
> > document which contains all the possible features once.
>
> Actually, in that context, I'm talking about alternative formulas,
> and at times am equating these alternatives with what CRM is doing,
> for illustration purposes. However, this isn't as important as
> achieving some kind of fundamental agreement on how probability is
> determined.

Ok, let's leave this aside for the moment. It's just a distraction.

> You have asserted that the probability that a document is spam based
> on the presence of a word in a document can be completely determined
> for Naive Bayes by knowing how many spam documents the word appears
> in and with no knowledge of how many non-spam documents the word
> appears in.
>
> You are simply wrong about this.

This wasn't my intention. The formulas used contain N_spam[feature] and
N_nonspam[feature]. What is true is that within solely the spam
documents, the counts of "cat" and the counts of "cat sees" must be
consistent (ie the count of "cat" is higher than the count of "cat
sees"). Similarly, within solely the nonspam documents, the count of
"cat" is higher than the count of "cat sees". Similarly, within both
the spam and nonspam documents together, the count of "cat" is higher
than the count of "cat sees". And finally, within the set of all
possible documents, the count of "cat" is higher than the count of
"cat sees".

I've stated this in terms of counts. It's also true in terms of
probabilities. If you have a well defined probability on documents,
then the probability of the feature "cat" must be higher than the
probability of the feature "cat sees", simply because whenever the
feature "cat sees" occurs, then also the feature "cat" occurs.

Perhaps I failed to explain this clearly, but that is all I'm trying to
say. The way I interpreted your replies, you were claiming this isn't
so, and backing up those claims with the crm114 formula. I then replied
that if the crm114 formula violates the ordering of feature
probabilities (here ordering doesn't refer to word order, it refers to
the fact that prob(cat) >= prob(cat sees) ), then crm114 would be wrong.

> > Also, you worry about unknown quantities. In your modified inputs,
> > some of the inputs are called known by you, and some are called unknown,
> > and you are making an effort to work only with known inputs.
>
> This seems to combine at least two completely independent lines of
> reasoning, while losing the point of both. If you want to respond
> to this aspect, please quote the specific material.
>
> > You also claim that knowing solely single word probabilities is not
> > enough to predict word order.
>
> True.

We agree on this, then.

> > You claim that Naive Bayes can predict word order.
>
> True if word order information is present in a Naive Bayes model.
> False otherwise.

I disagree. I claim that word order information can never be present in
a Naive Bayes model. The fundamental property of Naive Bayes, the
property that makes calculating formulas simple, is the exchangeability
of the features. Since some of the features are words, and words are
exchangeable, word orderings are never encoded in a Naive Bayes model.

> > You claim that the formula used by CRM114 is applicable to your
> > modified inputs.
>
> In some cases.

That may be so. As I've admitted I could not follow your explanation of
the modified inputs, I can't reasonably comment.

-- Laird Breyer.
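[Editor's illustration, not part of the original thread.] Laird's exchangeability point can be checked numerically: a naive Bayes score built from per-word probabilities depends only on the multiset of words, so any reordering of the document yields exactly the same score. The probability tables below are invented:

```python
from math import isclose

# Invented per-word conditional probabilities for the two classes.
p_word = {"spam": {"cat": 0.1, "sees": 0.05, "dog": 0.2},
          "notspam": {"cat": 0.2, "sees": 0.1, "dog": 0.04}}

def spam_score(words, prior_spam=0.5):
    """Naive Bayes posterior P(spam | words): a product of per-word
    factors, hence invariant under permutation of `words`."""
    num = prior_spam
    den = 1.0 - prior_spam
    for w in words:
        num *= p_word["spam"][w]
        den *= p_word["notspam"][w]
    return num / (num + den)

s1 = spam_score(["cat", "sees", "dog"])
s2 = spam_score(["dog", "cat", "sees"])  # same multiset, different order

assert isclose(s1, s2)  # word order cannot influence the score
print(s1)
```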
From: Laird B. <lb...@us...> - 2004-03-28 01:13:42
This is a continuation of the discussion in the crm114-general thread.
Previous reply by Christian Siefkes:

> Well, it converts the probability of a document T = (F_1, F_2,
> ..., F_n) into the probabilities of generating all the _features_ F_k
> (k from 1 to n) in this document.
>
>     P(T is spam | F_k) = P(F_k is in spam) * P(T is spam [the prior
>         probability]) / P(F_k is in any document)
>
> where
>
>     P(F_k is in any document) = P(F_k is in spam) * P(T is spam) +
>         P(F_k is in nonspam) * P(T is nonspam)
>
> In the case of CRM, P(F_k is in [non]spam) is calculated by the
> local probability formula given in the Plateau Paper.
>
> We start with uniform priors ( P(T is spam) = P(T is nonspam) = 0.5 )
> and apply this formula for each feature F_k (k from 1 to n) in the
> document T.
>
> So illegal documents in D_5 do not affect these calculations because
> the probability of generating documents is not used. There is no
> leakage of probability mass.

We're still not discussing the same thing. I agree with the formula
stated above. This is just a calculation on D_5. The point where things
break down is when you take the expression P(T is spam | F_k) and use it
without modification for real documents, ie on the set D.

It is true that the set D is basically a subset of D_5, but P must still
be converted into some probability on D, which you haven't yet
addressed. I am claiming that the restriction of P to the set D is not
suitable for this.

In case this doesn't convince you, let me be more explicit: On D_5, you
can calculate P(T is spam | T = \phi(t) ), which you've done above, for
any fixed real document t \in D.

But crm114 only outputs a conditional probability for real documents.
The set D_5 is an internal aspect of its algorithm, but has no other
relation to crm114's inputs or outputs. As a machine, crm114 works like
this:

    real documents -> crm114 -> probabilities of real documents

So you are saying that the correct thing to do is to output the
numerical value computed on D_5, ie you output

    Prob( t is spam | t = (t_1,...,t_n) ) =(def)
        P( T is spam | T = \phi(t_1,...,t_n) )

But this is inconsistent on D, because if I feed crm114 every possible
real document t \in D, then it will output a set of "conditional
probabilities"

    (*) Prob( t is spam | t = (t_1,...,t_n) ) for all possible documents t.

But this family of conditional probabilities is inconsistent. There is
no measure Prob on D which is compatible with them.

If you don't believe me, I'll give you an outline proof of this (just
ask). But let's first see if you follow me so far.

-- Laird Breyer.
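[Editor's illustration, not part of the original thread.] The per-feature update Christian describes, starting from uniform priors and folding in each feature F_k with Bayes' rule, can be sketched as follows. The local probabilities here are invented stand-ins for CRM114's Plocal values, not its actual output:

```python
def update(prior_spam, p_f_spam, p_f_nonspam):
    """One Bayes step: P(spam | F) from P(F | spam), P(F | nonspam),
    and the running prior, exactly as in Christian's formula."""
    num = p_f_spam * prior_spam
    den = num + p_f_nonspam * (1.0 - prior_spam)
    return num / den

# Invented (P(F_k | spam), P(F_k | nonspam)) pairs for three features.
features = [(0.8, 0.3), (0.6, 0.4), (0.9, 0.2)]

p = 0.5  # uniform prior: P(T is spam) = P(T is nonspam) = 0.5
for p_s, p_n in features:
    p = update(p, p_s, p_n)  # posterior becomes the next step's prior

print(round(p, 4))  # → 0.9474
```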
From: Bill Y. <ws...@me...> - 2004-03-27 23:39:02
I think I've found the "why can't I put 'ADV:' in the subject line? And
why does 'ADV' work fine?" bug.

Consider what happens to the subject lines: they get scoured of
punctuation. That includes colons.

The fix is simple: move the stanza that hacks the subject line:

    # If the user asked for a spam-flagging string, put the flagging
    # string into the subject.
    #
    {
        match [:spam_flag_subject_string:] /./
        alter (:subj_text:) \
            /:*:spam_flag_subject_string: :*:subj_text:/
    }

from directly before the "de-fang the subject" block to directly after
the de-fang block.

It seems to work so far...

-Bill Yerazunis
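[Editor's illustration, not part of the original thread.] The bug Bill describes can be re-created in miniature in Python (not CRM114 script): if the subject is scoured of punctuation before the spam-flag string is prepended, a flag like "ADV:" loses its colon, while "ADV" survives. The `defang` rule here is a simplified stand-in for mailfilter's actual subject-scrubbing stanza:

```python
import re

def defang(subject):
    # Simplified stand-in for the "de-fang the subject" block:
    # strip everything except word characters and whitespace.
    return re.sub(r"[^\w\s]", "", subject)

def flag_then_defang(flag, subject):
    # Buggy order: the flag is inserted first, then scoured with the rest.
    return defang(f"{flag} {subject}")

def defang_then_flag(flag, subject):
    # Fixed order (Bill's patch): scour first, prepend the flag after.
    return f"{flag} {defang(subject)}"

print(flag_then_defang("ADV:", "Cheap pills!"))  # → ADV Cheap pills
print(defang_then_flag("ADV:", "Cheap pills!"))  # → ADV: Cheap pills
```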
From: Bill Y. <ws...@me...> - 2004-03-27 17:17:24
New version's up. It's bleeding edge, but fixes the WINDOW problems and
a few others. It's on the web page:

    http://crm114.sourceforge.net

Here's the update:

This is the new bleeding-edge release. A complete rewrite of the WINDOW
code has been done (byline and eofends are gone, eofretry and eofaccepts
are in), we're integrating with TRE 0.6.6 now, and a bunch of bugs have
been stomped.

For those poor victims who have mailreader pipelines that alter headers,
you can now put "--force" on the BASH line or "force" on the mailer
command line, e.g. you can now say

    command mysecretpassword spam force

to force learning when CRM114 thinks it doesn't need to learn.

However, this code is still poorly tested. Caution is advised.

There's still a known memory leak if you reassign (via MATCH) a variable
that was isolated; the short-term fix is to MATCH with another var and
then ALTER the isolated copy.

The md5sums:

    a023a3217b663a0badecf8899b08a427  crm114-20040327-BlameStPatrick.css.tar.gz
    cff5bb6fd23ae7e3fd90f8ab31fbb08f  crm114-20040327-BlameStPatrick.i386.tar.gz
    dfaaf4d9a856b585c32d1df04006cb18  crm114-20040327-BlameStPatrick.src.tar.gz

-Bill Yerazunis