Ger Hobbelt wrote:
> On Thu, May 22, 2008 at 10:08 PM, Trever L. Adams
>> Is it possible to have CRM do mailreaver.cm like functionality in a loop or
>> will it only do one such thing per invocation? I ask this because I have an
>> idea on how to make CRM act like a daemon if it does. (Loop over reading
>> data from a program that does the TCP/IP for it... maybe the constant
>> scheduling is more expensive than doing pipes directly with the program that
>> would have been the TCP/IP sending.)
> Phui! <scratches head like baboon>
> Don't know for sure. There's nothing requiring crm114 ever quits
> executing it's script, so you might try to daemonize it that way from
> the inside out (maybe using crm script syscalls to get it to wait for
> input - but that's a thought just now and it worries me already I
> thought of it like that...). Of course, long-running scripts should
> use 'syscall /sleep/' alike somethings to keep the OS happy as you'd
> otherwise achieve 100% load, but it's doable, sure.
Are you sure it's your head you're scratching? :-)
Bill described in another message that, within CRM 114, you can spew off
millions of offspring at only a few milliseconds per call. I know from
practical experience that a fork plus exec (spawning more crm114 from CRM 114)
takes much longer than just the fork. But that's not the real problem with
spawning CRM 114. I've measured CRM 114 execution times as long as a few
seconds when scoring. More often than not, on a lightly loaded system, it's a
big fraction of a second (more hard data in the next day or two), but as system
load increases and demands on memory stress the system, that time shoots
rapidly towards the sky.
So the proper way to think of interfacing with CRM 114 (in my humble opinion) is
to connect to it via some form of queuing mechanism. Given that talking with CRM
114 is a conversation rather than just an info dump, you want your queue entry
to be capable of pushing and returning data. At the same time, you don't want to
single-thread connections but instead allow for as many connections as system
resources allow.
Failure case: in twopenny blue, the model I used was to let all of the scoring
processes run in parallel and single-thread the training processes.
Unfortunately, because of how postfix operates, I would end up with as many as
15 or 20 scoring processes either running or waiting on a training process. If
you are hit with lots of messages in the "score this" range when they really
shouldn't be there in the first place, you end up blocking inbound mail on a
very regular basis. In any case, since the postfix filter process limit was a
dumb fixed limit, the load average would spike and then things would get worse.
It took a very long time to untangle everything, from the number of running
CRM 114 processes to queue timeouts on postfix. It was a mess.
How would I do things differently today? I would put in a request queue with a
smart limiter. For example, training queues can wait for real idle time.
Scoring queues should be processed as quickly as possible to minimize e-mail
delivery delays. So a smart queue would hand off scoring requests in
preference to training, and the queue read should block if resources are at a
You have a choice of how to implement the consumer of the queue. You can either
use resources such as the process table or semaphores, if you have only one
instance of CRM 114 per filter invocation, or you can use a daemon of some sort
and send it messages telling it what to filter.
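As a rough illustration of the smart queue described above, here is a minimal
Python sketch. Everything in it is an assumption made for illustration: the
names SmartQueue, SCORE, TRAIN and max_active are hypothetical, "resources at
a limit" is reduced to a simple count of requests in flight, and none of it is
CRM 114 or postfix API. Training is only handed out when no scoring request is
waiting, which is a crude stand-in for "real idle time".

```python
import heapq
import itertools
import threading

SCORE, TRAIN = 0, 1  # lower value is handed out first (hypothetical constants)


class SmartQueue:
    """Request queue that prefers scoring and limits requests in flight."""

    def __init__(self, max_active):
        self._heap = []                  # entries: (kind, sequence, payload)
        self._seq = itertools.count()    # keeps FIFO order within one kind
        self._active = 0                 # requests handed out, not yet done
        self._max_active = max_active    # assumed stand-in for a load limit
        self._cond = threading.Condition()

    def put(self, kind, payload):
        with self._cond:
            heapq.heappush(self._heap, (kind, next(self._seq), payload))
            self._cond.notify()

    def get(self, timeout=None):
        """Block until a request may run; return None if the wait times out."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: bool(self._heap) and self._active < self._max_active,
                timeout=timeout,
            )
            if not ready:
                return None
            kind, _, payload = heapq.heappop(self._heap)
            self._active += 1
            return kind, payload

    def done(self):
        """Call when a handed-out request has finished."""
        with self._cond:
            self._active -= 1
            self._cond.notify()
```

A consumer (a daemon, or a wrapper around one crm114 invocation) would call
get(), do the work, then call done(). A real limiter would probably look at
load average or free memory rather than a fixed count.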
But this brings us to the more difficult portion of using CRM 114 in an e-mail
environment. It works great as a one-person-at-a-time filter. No disputing
that. However, if you try to filter for hundreds or thousands of people, you're
looking at a pretty significant load because, at 15 messages per second, you're
looking at 15 forks, 15 execs, 30 memory-mapped file requests, and God knows
how much just-in-time compiling on every single invocation. I know from
practical experience that this is not fast, especially if all 15 of those
messages are running CRM 114 at the same time, each with a different user. You
get very little benefit from the potentially shared memory space offered by mmap.
(As a brief aside, this is another good reason for queuing requests. If you
make a smart queue, in theory, you can aggregate requests per user and make the
system not work so hard. I suspect that in practice the time between messages
is not short enough to gain any advantage from trying to aggregate on a
per-user basis. Although, it's probably worth measuring just to make sure.)
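To make the per-user aggregation idea concrete, here is a small Python sketch.
It assumes, purely for illustration, that each queued request is a
(user, message_path) pair; the function name batch_by_user is hypothetical.

```python
from collections import OrderedDict


def batch_by_user(requests):
    """Group (user, message_path) pairs so each user is handled in one pass.

    Users keep the order in which they first appear in the queue.
    """
    batches = OrderedDict()
    for user, message_path in requests:
        batches.setdefault(user, []).append(message_path)
    return list(batches.items())
```

Running this over the requests that pile up in a short window, and counting
how often a batch holds more than one message, would be one way to measure
whether aggregation buys anything.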
So, in a multiuser environment, we have the worst possible use case for CRM 114.
Everything runs, everything recompiles, files get mapped in and out, and we get
no reuse. What's a code monkey to do?
We could daemonize CRM 114 and that raises a whole bunch of other questions. How
do you communicate with it? How does it get the body of the message? How does
it differentiate between users? Will it really make things better or is the
implementation model just wrong?
Daemonizing CRM 114 will fix the forking problem.
Communications should be fairly small (here's the message in a file, here is
where the user-specific data files sit, what's the answer?).
Can just-in-time results be encapsulated so that each user has their own
compilation results in a local cache?
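A minimal Python sketch of that small request and a per-user cache follows.
All of it is hypothetical: the JSON fields "message" and "user_dir", the names
PerUserCache and handle_request, and the compile_fn and score_fn callables are
stand-ins for whatever a daemonized CRM 114 would actually do, not existing
CRM 114 interfaces.

```python
import json
from collections import OrderedDict


class PerUserCache:
    """Keep compilation results for the most recently seen users (LRU)."""

    def __init__(self, compile_fn, max_users=100):
        self._compile = compile_fn     # hypothetical: user_dir -> compiled result
        self._max_users = max_users
        self._items = OrderedDict()

    def get(self, user_dir):
        if user_dir in self._items:
            self._items.move_to_end(user_dir)      # mark as recently used
        else:
            self._items[user_dir] = self._compile(user_dir)
            if len(self._items) > self._max_users:
                self._items.popitem(last=False)    # evict least recently used
        return self._items[user_dir]


def handle_request(line, cache, score_fn):
    """Answer one request: where the message is, where the user's files are."""
    request = json.loads(line)
    compiled = cache.get(request["user_dir"])
    verdict = score_fn(compiled, request["message"])  # hypothetical scorer
    return json.dumps({"message": request["message"], "verdict": verdict})
```

The request carries only two paths and the reply only a verdict, which keeps
the traffic small. Whether compilation results can really be cached per user
like this is exactly the open question.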
But daemonizing will not fix the flailing css file problem. The only way to do
that is to have a systemwide aggregate css file set. I'm not saying train
everything into one file for all users, but that one file holds multiple css
files with associated indices. Yeah, we're talking a whonking huge dbm file
that can be expanded (online) and shrunk (off-line). I don't know how well it
would work in the real world but, at first glance, it seems like it might cut
down the
Another popular alternative is a library but, again, it has almost all the same
failings as the current CRM 114 and only eliminates the forking problem.
So, I guess this is a long way of saying that this is as good as it gets.
Because if we thought it would do any better we would've done it. I know I need
to work on the load-based request limits for twopenny blue but that's using CRM
114 in a way most people don't.