From: Robertson, I. <ia...@ka...> - 2002-01-28 00:46:57
|
> Yep. I'm thinking that a lot of looking at the body will need to be done.
> Though that same client abuses the message body as well.

Does it ever! Of course, since I'm using it at the moment, I can't really talk...

I have a book on the various email RFC's and "standards"; the most disparaging words are said about Outlook/OE. I'll see if there are any suggestions in there on how to link/thread messages without relying on any *one* piece of information within the email.

Iain |
|
From: Anthony D. <as...@su...> - 2002-01-27 19:23:40
|
Robertson, Iain writes:
>> Famous last words. It turns out very hard to do with email,
>> since you don't have a nice References: header like you do in
>> news.
>
> If I'm not mistaken, however, most readers use the In-Reply-To field
> properly, meaning that you have a (hopefully!) good and reliable way of
> linking messages.

When supported. When not conflicted by the References: header in the same message. When the message id was unique, and provided.

> Agreed; good luck (since I can think of at least one
> commonly used security hole disguised as a mail client which abuses the
> In-Reply-To field along with almost every other field in the header).

Yep. I'm thinking that a lot of looking at the body will need to be done. Though that same client abuses the message body as well. |
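The caveats above (In-Reply-To missing, disagreeing with References:, or a message id that is absent) can be sketched roughly as follows. This is only an illustration under assumptions, not anything the thread settled on: the function names `ids`, `link_candidates` and `build_thread_map` are made up here, and the fallback order (In-Reply-To first, then the last References: entry) is one plausible choice among several. Duplicate message ids are not handled.

```python
import re
from email import message_from_string

# A message id is treated here as anything of the form <...> without spaces.
MSG_ID = re.compile(r"<[^<>\s]+>")


def ids(header_value):
    """Return every <message-id> token found in a header value (or [])."""
    return MSG_ID.findall(str(header_value or ""))


def link_candidates(raw):
    """Return (own_id, candidate_parent_ids) for one raw message.

    Assumption: In-Reply-To is tried first, then the last References: entry,
    so no single header is relied on exclusively.
    """
    msg = message_from_string(raw)
    own = ids(msg.get("Message-ID"))
    in_reply_to = ids(msg.get("In-Reply-To"))
    references = ids(msg.get("References"))
    candidates = []
    for mid in in_reply_to[:1] + references[-1:]:
        if mid not in candidates:
            candidates.append(mid)
    return (own[0] if own else None), candidates


def build_thread_map(raw_messages):
    """Map each message id to the first candidate parent actually seen."""
    parsed = [link_candidates(raw) for raw in raw_messages]
    seen = {own for own, _ in parsed if own}
    parents = {}
    for own, candidates in parsed:
        if own is None:
            continue  # no usable id: would need subject/body heuristics instead
        parents[own] = next(
            (c for c in candidates if c in seen and c != own), None
        )
    return parents
```

A message whose In-Reply-To points at an id that never appeared in the archive falls back to its References: header; a message with no id at all is skipped, which is where looking at the body would have to take over.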
|
From: Robertson, I. <ia...@ka...> - 2002-01-27 13:02:42
|
> Famous last words. It turns out very hard to do with email,
> since you don't have a nice References: header like you do in
> news.

If I'm not mistaken, however, most readers use the In-Reply-To field properly, meaning that you have a (hopefully!) good and reliable way of linking messages.

Agreed; good luck (since I can think of at least one commonly used security hole disguised as a mail client which abuses the In-Reply-To field along with almost every other field in the header).

> Sure... but I don't see how that could be done, at least not in
> the near-term.

I'd have to agree with that, at least for the moment. I'm more interested in this from a, well, interest point of view at the moment - if something comes of it later, then so be it.

Iain |
|
From: Anthony D. <as...@su...> - 2002-01-27 09:48:03
|
On Tuesday, January 22, 2002, at 05:33 PM, Adrian Sutton wrote:

> I would strongly suggest XML for this as it's precisely what it
> was designed for.

XML sounds good. And the minor overheads of XML will be nothing compared to the major CPU eaten by all the AI work.

> It should be an option to require approval for categorisation
> or not and it should always be easy to override and change a
> categorisation.

This suggests something like storing our results to a database like PostgreSQL instead of flat files --- easier to change things like that.

>> I strongly suggest that Adrian Sutton write the "Topics of Discussion"
>> log writer ;-)
>
> Hmm, I think I just got given half the work here ...

only half?

>> The "Members" one should probably just look for unique
>> names and/or email addresses, and offer the poster an opportunity to
>> describe himself. It could also link to all the messages in
>> the log from that member. [Yes, that unfortunately means our plugins
>> aren't completely independent. Oh well.]
>
> The plugins can actually be independent and still achieve this.

No --- I meant it very pedantically. If plugin A reads output from plugin B, then plugins A and B aren't (strictly) independent anymore.

> Computers are useless. They can only give you answers.
> --Pablo Picasso

"fortune" might help ;-) |
|
From: Adrian S. <ajs...@op...> - 2002-01-22 22:34:06
|
> First off, Adrian, can you add me to the sf.net group?

Done.

> This, I dare say, is a more ambitious task than any of the interpreter
> stuff I've played with. I'd suggest we break it down into more
> manageable parts --- divide and conquer. Any comments on the below?

Agreed, it is a very ambitious project which is why I like it so much. :)

> First, before we can do anything, we need to know how we're going to
> structure the entire thing. I particularly like passing things via text
> when possible because it allows you to do neat things like use different
> languages for different modules, distribute work over different
> machines, etc.

I would strongly suggest XML for this as it's precisely what it was designed for. We should create a general schema to define the layout (if possible) and generally standardise on what we call things as much as possible though. The entire system needs to be easily extensible by adding new agents.

> I'll assume we get an email message as input. That'll need to be
> classified into one of the Event Types. My first try at that would be to
> have numerous plugins there, each returning a classification and a
> confidence score. The classifier would use these confidences to
> determine which one is right. But it wouldn't be highest value wins; for
> example, a near tie between a press release and topic of discussion
> would go to the topic of discussion, because they are more common. The
> confidence would only weight the decision, along with a probability
> table, and possibly human checking of past behavior. Or, the human
> checking would be by adjusting the probability table.

Sounds good. Each email processing agent is capable of calculating a rating of how likely it is to process the email correctly and another agent would be capable of comparing the results, checking the probability table and deciding which agent to use (a governor agent).
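To make the "confidence only weights the decision" idea concrete, here is a minimal sketch of what such a governor might compute. Everything in it is an assumption for illustration: the name `classify`, the choice to multiply each recogniser's confidence by the class's entry in the probability table, and the example numbers are not from the proposal itself.

```python
def classify(confidences, priors):
    """Pick a class by weighting recogniser confidence with the probability table.

    confidences: {class name: confidence reported by that class's recogniser}
    priors:      {class name: how common that class is} (the probability table)

    Assumption: score = confidence * prior, then normalised so the winning
    score can be passed on as a rough probability that the guess is right.
    """
    scores = {c: confidences[c] * priors.get(c, 0.0) for c in confidences}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # nothing matched; leave it to a human
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# A near tie in raw confidence goes to the more common class.
best, prob = classify(
    {"press release": 0.62, "topic of discussion": 0.58},
    {"press release": 0.05, "topic of discussion": 0.60},
)
```

Human checking then amounts to editing the `priors` table, which changes future decisions without touching any recogniser.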
> Now that we have a (hopefully) correct guess as to what type of message
> this is, we should do some sanity checks. Some categories should
> probably, at least at first, require human input. We don't want to be
> announcing non-existent votes, making announcements of spam sent to the
> list, etc.

It should be an option to require approval for categorisation or not and it should always be easy to override and change a categorisation.

> Now that we're sure about the message and classification (it has been
> checked by a FreeCard group member, after all), we should then try and
> add it to the log. Most categories just involve appending the message,
> and those can be done with a pretty boring plugin. There are two that
> are of special interest, the "Topics of Discussion" and "Members." (Note
> that "Notable Mailings" is not of special interest here; it was dealt
> with by the classifier).

Agreed.

> I strongly suggest that Adrian Sutton write the "Topics of Discussion"
> log writer ;-)

Hmm, I think I just got given half the work here but it should be an interesting task. My natural language processing skills are somewhat dulled compared to what they used to be so I look forward to improving them again.

> The "Members" one should probably just look for unique
> names and/or email addresses, and offer the poster an opportunity to
> describe himself. It could also link to all the messages in the log from
> that member. [Yes, that unfortunately means our plugins aren't
> completely independent. Oh well.]

The plugins can actually be independent and still achieve this. The work I'm currently doing for the Software Quality Institute is in exactly this area. The secret is to structure the system into a bunch of components with each having a state, a set of inputs and outputs and behaviours. The behaviours determine what the component will do when certain types of data are received on a particular input - this either changes the state of the component or produces an output.
Output could include outputting data into a "magical tube" that delivers it to the appropriate components or it could be writing to a file, display output for the user etc. The "magical tube" is what I have termed a DataQueue and is an established link between any number of outputs and any number of inputs. These links are established when the system is activated but can be dynamically changed, for instance to add a new agent to the system while running.

So in our system each agent would be a component. So we'd have an email message retrieved by the email checker component (design off the top of my head, some of these components may not be necessary). The email checker then outputs the message and it is received by each of the recognisers which each output a match rating to the classifier. The classifier takes all the probabilities, combines them with the probability table and whatever other heuristics it has and then outputs the message, class of the message and the probability rating that this is correct. This is received by the sanity checker..... And so on. You'll note that this is almost exactly what Anthony depicts in his diagram (quoted below).

> A diagram follows:
>
>                 probability table                FreeCard group member
>                         |                                  |
>                         | {prob. of classes}               | class
>                         |                                  |
>  email message         \|/      {msg, class, prob}        \|/
> --------------->    CLASSIFIER --------------------> SANITY CHECKER
>                      |  |  |                               |
>                      |  |  |                               | {msg, class}
>                      R  R  R                              \|/
>                      E  E  E                     CLASSFULL DISTRIBUTER
>                      C  C  C                            /  |  \
>                      O  O  O                           /   |   \
>                      G  G  G                          /m   |m   \m
>                      N  N  N                         /s    |s    \s
>                      I  I  I                        /g     |g     \g
>                      Z  Z  Z                       /       |       \
>                      E  E  E                      /        |        \
>                      R  R  R            LOG WRITER 1  LOG WRITER 2  LOG WRITER n
>                      1  2  n

I am currently working on a tool which automates the design of the system from requirements and this looks like an excellent test case for it so I'll take a look at it tonight. The tool, btw, takes an informal description of the system requirements rather than formal descriptions.

Anyway, I should go do some more work on the said tool and get it printing properly.
:)

This looks like this project will not just be interesting, I might even get paid for part of it. :)

On a side note, there has been a lot of work done on translating from requirements to behaviour trees (as the tool I'm working on does now) but also from behaviour trees on to code. Almost all of the logical, decision making code can be generated automatically. In this case it would most likely be the component outlines and DataQueue managing code that could be written because designing natural language processing in behaviour trees would be somewhat tedious.

Adrian Sutton
**************************************************************
Ph: 3411 4361   Mob: 04 2223 6329
Computers are useless. They can only give you answers.
--Pablo Picasso
************************************************************** |
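For readers trying to picture the component and DataQueue arrangement described in this message, a rough sketch follows. It is not the Software Quality Institute design or the tool's output: the class names `Recogniser` and `Collector`, the keyword-matching behaviour, and the method names `connect`, `disconnect` and `send` are all invented here purely to show components with state, inputs and behaviours wired together by links that can change while the system runs.

```python
class DataQueue:
    """A link between any number of outputs and any number of inputs."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        """Attach an input (any callable); allowed while the system is running."""
        self._receivers.append(receiver)

    def disconnect(self, receiver):
        self._receivers.remove(receiver)

    def send(self, data):
        # Copy the list so receivers may connect/disconnect during delivery.
        for receiver in list(self._receivers):
            receiver(data)


class Recogniser:
    """Hypothetical agent: rates a message and outputs (name, rating)."""

    def __init__(self, name, keyword, out_queue):
        self.name = name
        self.keyword = keyword
        self.out_queue = out_queue

    def receive(self, message):
        # Behaviour: data on the input produces an output, no state change.
        rating = 1.0 if self.keyword in message.lower() else 0.0
        self.out_queue.send((self.name, rating))


class Collector:
    """Hypothetical stand-in for the classifier: stores the latest ratings."""

    def __init__(self):
        self.state = {}

    def receive(self, item):
        # Behaviour: data on the input changes the component's state.
        name, rating = item
        self.state[name] = rating


messages, ratings = DataQueue(), DataQueue()
collector = Collector()
ratings.connect(collector.receive)

vote = Recogniser("vote", "vote", ratings)
messages.connect(vote.receive)
messages.send("Please vote on the new logo")
first_state = dict(collector.state)

# A new agent is added while the system is running.
release = Recogniser("press release", "release", ratings)
messages.connect(release.receive)
messages.send("Press release: version 1.0")
```

Neither recogniser knows the other exists, and the collector only knows the shape of what arrives on its input, which is the sense in which the components stay independent.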