| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002 | | | | | | | | | | (21) | (9) | (13) |
| 2003 | (9) | | | (6) | (13) | | (13) | | | | | |
| 2004 | | | | | (2) | | | | | | | |
| 2007 | | | | | | | | | | (4) | | |

From: James S. <jl...@jl...> - 2007-10-25 16:38:02

Maildb is dead, long live maildb... er, gmail... whatever...

JLS

From: Darrell K. <ap...@kr...> - 2007-10-25 16:23:24

Damn it. Should have sold to G. Fare well, mail-db. You will be missed.

On Oct 25, 2007, at 9:08 AM, Liza Weissler wrote:
> Funny, I was just thinking about this project the other day and how
> for me it was superseded by using gmail... so I think my response is
> "works for me". :-)
>
> - Liza
>
> [snipped]

From: Liza W. <lyw...@gm...> - 2007-10-25 16:08:41

Funny, I was just thinking about this project the other day and how for
me it was superseded by using gmail... so I think my response is "works
for me". :-)

- Liza

On 10/25/07, Jeff Squyres <jsq...@os...> wrote:
> [snipped]

From: Jeff S. <jsq...@os...> - 2007-10-25 15:12:12
Given that there has been zero progress on this SF project for years,
and given that Gmail now supports IMAP, I think all the ideas of
maildb have "been done." Gmail isn't an open source implementation,
but that doesn't matter to me anymore (meaning: I certainly don't have
the cycles to do this stuff myself). I'm very glad that others have
implemented these ideas; I think that e-mail clients will benefit
greatly (gmail is great; others are copying the ideas to other
clients).
In particular, look at Gmail's mapping of IMAP actions:
http://mail.google.com/support/bin/answer.py?answer=77657
So, unless someone else wants to take over this project, I think it's
time to officially declare this SF project "dead."
--
{+} Jeff Squyres
From: Jeff S. <jsq...@os...> - 2004-05-30 12:43:05
I finally had a free window this past week and spent some time
working on maildb (!). Woo hoo!
Specifically, I wrote a perl script to import mbox archives into the
maildb database (well, actually, I used the CPAN module Mail::Box
which natively handles lots of kinds of mail archives -- not just
mbox). I wrote this script to understand the database schema as we
had in CVS and to take it into the proof-of-concept realm.
It seems to work. I imported over 53K messages from my current mbox
archives (damn, I get a lot of mail) into a maildb MySQL database.
Woot!
I then wrote another script to look up categories and messages in
those categories -- i.e., display the messages that had been imported.
After finding some bugs in MySQL (!) and Mail::Box (more on these
later), that script now also seems to work. Double woot!
There is still much work to be done. I'm not much of a database guy,
and it's a little hard for me to think along those lines -- I'm sure
that my queries and indices can be optimized (doing the import of 53K
messages takes many hours on a reasonably fast Linux x86
box)... which I'll let you other DBA-types argue about. :-) Also,
nothing has been done on the IMAP-server side -- this past week was
just spent understanding the DB schema and trying to do some
practical stuff with it.
-----
During this process, I either exposed weaknesses in the design that
are inevitable during a first implementation of a design, or I didn't
fully understand Liza's original intent (which is quite probable).
I've committed changes to the MySQL schema that Liza proposed --
mysql/libmaildb/db/mysql/doc/cr_maildb.sql. If I'm totally off-base
and simply misunderstood the original intent, we can always roll back
CVS to the original stuff.
Rather than try to explain all the changes, let me explain the schema
philosophy that I've committed:
- During my work, an epiphany came to me: we really don't need to
*interpret* much of the data that we're storing. We really only
need to *store* and *retrieve* it. For example, we had a scheme to
normalize MIME types. This is good for space saving, but I ditched
it in favor of just storing message header data that is transparent
to maildb. Specifically, all we have to do is store a set of
message headers and then be able to output them upon request -- we
don't have to know what any of them *mean*.
That being said, there are good reasons for normalization (e.g.,
space savings). And we might still want to do that -- but let's get
it working first, and then go back to that (e.g., selectively
interpret some of the header lines, such as the MIME type, and
therefore be able to normalize them).
- I think we had also thought of saving the entirety of the original
message in a separate table (headers and all). Does anyone remember
why we were thinking of doing this? Was it just for debugging? I
ditched this table as well; it seemed to simply double the storage
space required.
- Terminology: Ignoring a lot of details -- a RFC 2822 message is
comprised of a header and a body. The body may be plain text or one
or more "parts." Each part may or may not have its own sub-header,
and may actually be another RFC 822 / 2822 message itself. So it's
really a recursive thing -- a message will have one or more body
parts, each of which may be another message in itself.
- Keep in mind that some of the stuff described below is because it
was the way we originally designed it (2+ years ago!). I don't
remember all the reasons for what we did -- and I actually question
at least some of it -- but I stuck with most of the original
decisions.
- Here's a breakdown of the major tables:
- users: a simple maildb-UID to username mapping. The maildb-UID is
a maildb-specific UID used for establishing the ownership of
messages in the database. It's referenced in most of the other
tables. Remember -- we don't want to implement an authentication
scheme (that's a job for other tools); we only need a simple
username-to-UID mapping.
- cats: mapping of category names to category IDs, including the
concepts of user ownership and hierarchical organization of
categories (i.e., nested categories, like filesystem directories).
- messages: every message (including embedded RFC822 messages) has
exactly one entry in the message table, giving it a unique ID.
This message ID value is extensively cross-referenced in other
tables to bind header and body data to a single message. Messages
are [currently] owned by a single UID, and have flags that, among
other things, indicate whether the record is a valid message or
not (e.g., partially inserted messages will have their "valid"
flag set to 0).
- msg_cats: A message will have a msg_cats record for every category
that it is in. Hence, it's mainly a cross reference between
message ID's and category ID's.
- msg_hdrs: A series of key=value records of header lines from any
part in a single message (remember that body parts can have header
lines). Header lines are attached to a specific part in a
specific message (e.g., part=0 means the main header). The
ordering of the header lines is, of course, maintained.
- msg_parts: Each message has at least one body part. Each record
in this table is tied to a specific message, and has an ordered
part ID (i.e., all parts, in order, are the "body" of the
message). Each part will either be stored in the record itself
(as a mediumtext BLOB) if it's under a specific size, or will be
stored in the filesystem if it's over that size.
- msg_quick_search: this is the one table where we actually
interpret several of the "common" fields in the RFC 2822 header
(to, cc, bcc, from, subject, date, etc.). We store them all in
text blobs for quick searching. The entire point of this table is
for quick searching that resolves down to a message ID where we
can actually get to the real message.
- config: a simple key=value table where maildb configuration can be
stored. For example, the max length (in bytes) of messages that
will be stored in the DB is in this table. I anticipate that
we'll eventually have lots of tunable maildb parameters in here.
Users can put their own overrides in here (where it makes sense),
so there's a UID field as well (UID=0 are system-level config
options).
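The breakdown above maps naturally onto DDL. Here's a runnable sqlite3 sketch (Python used so it runs anywhere); the column names and types are my illustrative assumptions -- the committed schema lives in libmaildb/db/mysql/doc/cr_maildb.sql and differs in detail:

```python
import sqlite3

# Illustrative stand-in for the tables described above. Column
# names/types are assumptions, not the committed cr_maildb.sql schema.
SCHEMA = """
CREATE TABLE users     (uid INTEGER PRIMARY KEY, username TEXT UNIQUE);
CREATE TABLE cats      (cat_id INTEGER PRIMARY KEY, uid INTEGER,
                        parent_id INTEGER, name TEXT);
CREATE TABLE messages  (msg_id INTEGER PRIMARY KEY, uid INTEGER,
                        valid INTEGER DEFAULT 0);
CREATE TABLE msg_cats  (msg_id INTEGER, cat_id INTEGER);
CREATE TABLE msg_hdrs  (msg_id INTEGER, part_id INTEGER, seq INTEGER,
                        key TEXT, value TEXT);   -- seq preserves order
CREATE TABLE msg_parts (msg_id INTEGER, part_id INTEGER, body TEXT,
                        spool_path TEXT);        -- body OR filesystem path
CREATE TABLE msg_quick_search (msg_id INTEGER, sender TEXT, rcpt TEXT,
                        subject TEXT, sent_date TEXT);
CREATE TABLE config    (uid INTEGER, key TEXT, value TEXT);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
tables = [r[0] for r in db.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Note the `seq` column on msg_hdrs: since we only store and retrieve headers without interpreting them, preserving their original order is the one thing the schema must guarantee.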
So here's how a message is inserted:
------------------------------------
1. A record is created in messages so that the message ID is
created. The "valid" flag is set to 0 upon its creation, so any
other threads/agents looking at the database won't think that this
is a message that can be read.
2. For each part (to include the main header):
2a. The "quick search" record is inserted, cross referenced to the
message ID.
2b. If headers exist for this part, they are inserted in the
msg_hdrs table, cross referenced to the message ID and part ID.
2c. The body part is either stored in the msg_parts table or in the
filesystem; either way, a new entry is inserted in the
msg_parts table and is cross referenced to the message ID and
part ID.
2d. A record is created in msg_cats tying the new message ID to a
category ID.
2e. The "valid" flag on the messages record is changed to "1",
indicating that this is now a valid message that can be read.
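The sequence above can be sketched in a few lines (Python/sqlite3 stand-in with a trimmed two-table schema; column names are assumptions, not the committed schema):

```python
import sqlite3

# Sketch of insertion steps 1-2e above, against a trimmed-down schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages  (msg_id INTEGER PRIMARY KEY, valid INTEGER);
CREATE TABLE msg_parts (msg_id INTEGER, part_id INTEGER, body TEXT);
CREATE TABLE msg_cats  (msg_id INTEGER, cat_id INTEGER);
""")

def insert_message(db, parts, cat_id):
    # 1. Create the record invalid, so concurrent readers skip it.
    cur = db.execute("INSERT INTO messages (valid) VALUES (0)")
    msg_id = cur.lastrowid
    # 2b/2c. Insert each part, cross-referenced to the message ID.
    for part_id, body in enumerate(parts):
        db.execute("INSERT INTO msg_parts VALUES (?, ?, ?)",
                   (msg_id, part_id, body))
    # 2d. Tie the message to its category.
    db.execute("INSERT INTO msg_cats VALUES (?, ?)", (msg_id, cat_id))
    # 2e. Only now flip the flag: the message becomes readable.
    db.execute("UPDATE messages SET valid = 1 WHERE msg_id = ?", (msg_id,))
    db.commit()
    return msg_id

mid = insert_message(db, ["hello body"], cat_id=1)
valid, = db.execute("SELECT valid FROM messages WHERE msg_id = ?",
                    (mid,)).fetchone()
print(mid, valid)
```

The key property is that the valid flag flips only after all parts and cross-references are in place.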
Scripts that I wrote:
---------------------
Both scripts are located in libmaildb/db/mysql/doc (we can change the
directory structure later).
Both of these scripts assume that you've followed the instructions in
libmaildb/db/mysql/doc/README to create the maildb MySQL database and
all of its tables. You'll also need to create /var/spool/maildb and
give it the same permissions as /tmp (777 and chmod +t; I forget what
t is offhand :-). This directory is where long messages are stored.
- import_mbox.pl: Takes argv listing mbox files to import. Ensures
that your unix username is in the users table. Ensures that you
have a special INBOX category. Sets some default config values in
the config table if they aren't already set. Each mbox file is then
read and parsed; messages are inserted in a category name matching
the filename of the mbox file being imported (please only use
forward relative filenames -- I didn't put any logic in for absolute
directories or "." or ".."). For example:
./import_mbox.pl foo bar/baz bar/moog/cow
will import the messages in 3 mbox files, and make the following
categories along the way:
foo
bar
baz, child of bar (i.e., "bar/baz")
moog, child of bar (i.e., "bar/moog")
cow, child of moog (i.e., "bar/moog/cow")
- index_cat.pl: for a given category, show all of its sub-categories,
list the number of messages in that category, and display the
headers of all the messages.
These are both works-in-progress; they'll probably change a bit more
over this weekend (e.g., showing the bodies of the messages in
index_cat.pl is a trivial addition).
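The category-creation behavior of import_mbox.pl described above can be sketched like this (Python/sqlite3 stand-in; `ensure_category` and the cats columns are illustrative names, not the script's actual internals):

```python
import sqlite3

# A relative path like "bar/moog/cow" becomes nested categories,
# creating each missing level along the way.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cats (cat_id INTEGER PRIMARY KEY,"
           " parent_id INTEGER, name TEXT)")

def ensure_category(db, path):
    parent = 0                      # 0 = top level (an assumption)
    for name in path.split("/"):
        row = db.execute(
            "SELECT cat_id FROM cats WHERE parent_id=? AND name=?",
            (parent, name)).fetchone()
        if row:
            parent = row[0]         # level already exists; descend
        else:                       # create the missing level
            parent = db.execute(
                "INSERT INTO cats (parent_id, name) VALUES (?, ?)",
                (parent, name)).lastrowid
    return parent

for mbox in ["foo", "bar/baz", "bar/moog/cow"]:
    ensure_category(db, mbox)

cats = [tuple(r) for r in db.execute(
    "SELECT cat_id, parent_id, name FROM cats ORDER BY cat_id")]
print(cats)
```

Running it reproduces the example above: foo and bar at top level, baz and moog as children of bar, cow as a child of moog.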
Some DB issues:
---------------
- I'm not sure we understand what we need for indexes. Indexes are
easy to add/modify, so I'm not worried about it now. But when we're
done with most of the design/coding, we should probably look up the
searches that we're always doing and make indices to support those.
- Another minor point -- mbox archives of my 53K messages occupy
approximately 530MB of disk space. After the import of these 53K
messages, the resulting MySQL DB used 881MB of disk space. I
suspect that at least some of this is because of the oodles of
indexes that we're creating now. This is not a huge deal, but we
should try to not take up *too* much more space than is really
necessary...
- I don't have an exact number, but importing the 53K messages took
something like 10+ hours on a reasonably fast Linux box. More
specifically, the further into the import it got, the slower it
became. I suspect that this has something to do with the indexes we
currently have, and it may simply be the nature of the beast. But
it's something we should look at optimizing (average time to insert
a single message is going to be a critical performance factor in the
long run).
- Because we're storing long messages in the filesystem (and not in
the database), we can't do full-text searches on messages. I know
we talked about this at least a little bit, but I don't remember why
we decided to do this instead of either allowing a bigger BLOB
and/or splitting the message part across multiple records (which
would seem necessary, regardless of the max part size that we have,
unless we outright reject messages that are too large). Does anyone
remember?
Bugs in other software:
-----------------------
- I was working with the perl CPAN module Mail::Box v2.055. There are
two bugs in Mail/Message/Head/Complete.pm where you can get warnings
at run-time in perl about uninitialized variables used with the ">"
operator. These are fairly harmless for our purposes; I've mailed
the Mail::Box author about them. You can ignore them.
- Mail::Box was relatively sensitive to improperly-formatted messages.
It rejected a few messages that had malformed addresses in the CC
line, had incorrect MIME separator lines, etc. This is not a
problem for maildb itself (i.e., this has no bearing on our actual
run-time -- remember, we don't need to *interpret* the data that we
store) -- it's just an issue for the importer script that I wrote.
I think it rejected something like 10 messages out of the 53K that I
imported. This definitely falls within the bounds of "good enough
for prototyping." :-)
- MySQL v4.0.17 (the default for fink on OSX) seems to have a bug with
inserting indexed text field values that have spaces at the end of
them (or spaces at the end of the indexed portion). This does *not*
happen in v4.0.15 or 4.0.20. Specifically, here's a case that
will trip the bug:
-----
create table bogus ( subject text, index subject_index(subject(16)) );
insert into bogus values ("hello");
insert into bogus values ("hello"); # this works fine
insert into bogus values ("hello ");
insert into bogus values ("hello "); # this will barf, complaining of
# a duplicate key
-----
Clearly, this should not happen. I put a workaround in the perl
scripts that I wrote to ensure that there is never any whitespace at
the end of an imported field. But we shouldn't need to do that.
- Although the MySQL mediumtext BLOB allows values up to 16M in
length, the client and/or server is only configured to allow
max_allowed_packet (a MySQL parameter) bytes to be sent between the
client and server in a single query. This value defaults to 1M for
the server on my OSX laptop (and I think on all systems...?).
Hence, the upper bound for mediumtext is effectively 1M unless you
increase the max_allowed_packet value. But 1M is probably ok -- in my
~53K imported messages, I had only 65 parts that were >1MB.
(you can easily change the value on the MySQL server -- supply a
parameter to mysqld_safe when you start it).
Note that the 1M rule applies to the entire insertion SQL string
sent to import a *part* into the database -- not to the entire
*message*. Hence, there's roughly a 1MB limit on each *part* of a
message.
Open questions:
---------------
- How to do deletions? I *think* I know the answer to this one, but it
still requires a little more thought (haven't done any prototyping
code yet). MySQL doesn't have trigger procedures, so the
possibility of a race condition in a multi-threaded server, or a
server allowing multiple simultaneous user connections (like UW
IMAP) is real -- need to think about this a little more. Current
thought is that when a message is removed from a category, do
another search to see if it's referenced in *any* category. If it's
not, then delete it (this is effectively reference counting). Any
other opinions here?
- Should multiple users be able to own the same message? This implies
-- at the very least -- separating the UID out of the messages
table (and probably some other minor re-organization). This would
seem nice for when a 20MB e-mail is sent to 500 users on the same
server -- only one copy of the message needs to exist, and it's just
"owned" by multiple users. When all users delete it, it actually
gets deleted (i.e., reference counting, in some form).
- Is the msg_quick_search table worth it? It duplicates much of the
data in the msg_headers table, and probably causes a lot of space
to be used in indexes. Can we effect the same searches in msg_hdrs
without this table?
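For the deletion question above, the reference-counting idea could look like the following sketch (Python/sqlite3 stand-in; it deliberately ignores the race-condition caveat, and the table layout is illustrative):

```python
import sqlite3

# One message, referenced by two categories.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages (msg_id INTEGER PRIMARY KEY);
CREATE TABLE msg_cats (msg_id INTEGER, cat_id INTEGER);
INSERT INTO messages VALUES (1);
INSERT INTO msg_cats VALUES (1, 10), (1, 20);
""")

def remove_from_category(db, msg_id, cat_id):
    db.execute("DELETE FROM msg_cats WHERE msg_id=? AND cat_id=?",
               (msg_id, cat_id))
    # If *no* category still references the message, delete it outright
    # (effectively reference counting).
    refs, = db.execute("SELECT COUNT(*) FROM msg_cats WHERE msg_id=?",
                       (msg_id,)).fetchone()
    if refs == 0:
        db.execute("DELETE FROM messages WHERE msg_id=?", (msg_id,))
    db.commit()

remove_from_category(db, 1, 10)
left_after_first, = db.execute("SELECT COUNT(*) FROM messages").fetchone()
remove_from_category(db, 1, 20)
left_after_second, = db.execute("SELECT COUNT(*) FROM messages").fetchone()
print(left_after_first, left_after_second)
```

The message survives the first removal and disappears on the second, once no category references it.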
Work still to be done:
----------------------
- Look at the UW IMAP docs and see what actions it requires, how a
maildb device should be designed, etc.
- Postgres version of the same stuff that I've extended from Liza's
work in MySQL.
- "Views" (stored searches).
- Create and maintain logs.
- Had an interesting idea about pre-defined views -- we should
probably offer a set of time-based pre-defined views (e.g.,
"yesterday", "within last week", "within last month", etc.). And we
should offer these views as a sub-view of any view and category. So
you can see "yesterday" mails in the "foo" category, for example.
Could be handy.
- Expand the set of configuration options, and allow users to have
their own overrides (where it makes sense). Hence, I added a UID
field to the config table (UID=0 means system values).
...and probably a lot more that I'm not thinking of right now. :-)
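The time-based views idea can be prototyped as date-windowed queries against the quick-search table (sketch only; the sent_date column and ISO-string comparison are assumptions):

```python
import sqlite3
from datetime import datetime, timedelta

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE msg_quick_search (msg_id INTEGER, sent_date TEXT)")

# Three messages: 3 hours ago (today), 20 hours ago (yesterday),
# and 10 days ago.
now = datetime(2004, 5, 30, 12, 0, 0)
for msg_id, hours_ago in [(1, 3), (2, 20), (3, 240)]:
    db.execute("INSERT INTO msg_quick_search VALUES (?, ?)",
               (msg_id, (now - timedelta(hours=hours_ago)).isoformat(" ")))

def view(db, since, until):
    # ISO-formatted date strings compare in chronological order, so
    # plain string comparison yields the half-open window [since, until).
    return [r[0] for r in db.execute(
        "SELECT msg_id FROM msg_quick_search "
        "WHERE sent_date >= ? AND sent_date < ? ORDER BY msg_id",
        (since.isoformat(" "), until.isoformat(" ")))]

start_today = now.replace(hour=0, minute=0, second=0)
yesterday = view(db, start_today - timedelta(days=1), start_today)
last_week = view(db, now - timedelta(days=7), now)
print(yesterday, last_week)
```

Scoping such a view to a category is then just an extra join against msg_cats.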
That's it!
----------
Comments appreciated on any of the above!
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Post Doctoral Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
From: Jeff S. <jsq...@os...> - 2004-05-12 12:37:21
Looks like someone else has finally taken up the idea of using what we
called categories -- Google's Gmail uses something called "labels" that
looks almost exactly like what we were thinking of. Check out this review (who
cares about the review -- you can see the features that Gmail is going to
have):
http://www.extremetech.com/article2/0,1558,1586090,00.asp
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
From: Jeff S. <jsq...@os...> - 2003-07-09 20:58:50
On Wed, 9 Jul 2003, Jeff Squyres wrote:
> -----
> :0 fc
> * ^TO_...@li...
> | /usr/local/bin/maildb.insert --category maildb/devel
>
> :0 fc
> * ^FROM_.*@squyres.com
> | /usr/local/bin/maildb.insert --category received/squyres/family
> -----
I forgot to mention a critical point here -- the user has no concept of
what "maildb.insert --category abc" actually *does*. We can implement it
however we want. So if that means add an X-Maildb-Category header line,
or whether that means frobbing the DB -- we can do whatever we want
(including totally changing how it works, as long as the end result is the
same) and the user interface stays the same (i.e., we don't break any
procmail rules).
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
From: Jeff S. <jsq...@os...> - 2003-07-09 20:55:43
On Wed, 9 Jul 2003, Darrell Kresge wrote:
> Even if only temporary, I see using procmail as choosing a sledgehammer
> to hang a picture (not that I'm necessarily opposed to such things ;-) )
>
> Since you've already a requirement to parse/extract header information,
> why not just implement regex -> folder filters directly?
Not sure what you mean here...?
> 1) I realize that configuration of the rule file will be an issue, but
> no worse so than dealing w/ .procmailrc.
Yes and no. I mentioned procmail because it's well known/loved/trusted,
and it would be good to be able to support it (in some way). This would
give us the leverage to have flexible filtering even in 1.0 (when we don't
have native/internal filtering).
That being said, I just thought of a problem with my proposed approach --
see below.
> 2) Provided the interface to the "pattern select/route" mechanism is
> well defined, the regex stuff could easily be replaced downstream with
> something more powerful.
Agreed -- some kind of generalized mechanism would be good.
> 3) Using an X-Header to determine routing inside the maildb proper is
> going to end up being a hack on top of a hack -- you'll end up needing
> to eliminate it later when you decide to do the Right Thing
Possibly. But it could be good to be able to support *both* procmail
*and* native filtering (if, perhaps, on the back end, they actually end up
doing the same thing -- then it wouldn't be nasty. i.e., separate the
decision-making process from the acting-on-the-decision process).
But I did just think of a problem with the X-Maildb-Category approach:
what if someone sends you a message with:
X-Maildb-Category: inbox
That is -- anyone can force a message to go into any of your categories
simply by adding header lines to messages that they send to you. And
that's clearly not a Good Thing. :-)
So back to what I said above -- perhaps we could do a "do no harm"
approach in a .procmailrc, where instead of adding a header line, you
actually run some maildb executable that adds the message to that
category (this may get a little complicated, but bear with me for this
thought experiment...). So instead of:
-----
:0 fc
* ^TO_...@li...
| formail -A "X-Maildb-Category: maildb/devel"
:0 fc
* ^FROM_.*@squyres.com
| formail -A "X-Maildb-Category: received/squyres/family"
-----
Instead, you'd have:
-----
:0 fc
* ^TO_...@li...
| /usr/local/bin/maildb.insert --category maildb/devel
:0 fc
* ^FROM_.*@squyres.com
| /usr/local/bin/maildb.insert --category received/squyres/family
-----
...and so on.
The real trick/complication would be for a message that matches multiple
rules: maildb.insert (or whatever) will have to recognize that it's
the same message and simply add another category to the message that's
already in the db. Since we can't rely on the Message-Id, this is the
part that I don't really know how to do... :-(
It seems that these procmail rules would need to put in some kind of
forward reference saying "there's an incoming message coming, make sure
that it gets added to category ABC" (but don't forget that
procmail/mail.local/etc. can be run asynchronously, so 2 different
messages with the same Message-ID can come in and be processed
"simultaneously. So it still reduces to the same problem as above).
And perhaps procmail isn't the thing we want to support. But any
rules-based agent will follow the same general principles. So this is
probably still worth discussing...
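One plausible mitigation for the header-spoofing problem raised above: have the trusted insertion path strip any X-Maildb-Category lines that arrived with the message, so only lines added locally (by procmail / maildb.insert) survive. A sketch of the idea (Python; `strip_untrusted_categories` and the example category are hypothetical, not part of maildb):

```python
from email import message_from_string

def strip_untrusted_categories(raw):
    # Drop every X-Maildb-Category line the sender supplied, so a
    # remote sender cannot force a message into one of your categories.
    msg = message_from_string(raw)
    del msg["X-Maildb-Category"]        # removes *all* such lines
    return msg

raw = (
    "From: ev...@example.com\n"
    "X-Maildb-Category: inbox\n"        # spoofed by the sender
    "Subject: gotcha\n"
    "\n"
    "body\n"
)
msg = strip_untrusted_categories(raw)
msg["X-Maildb-Category"] = "received/unfiltered"  # added by the trusted side
print(msg.get_all("X-Maildb-Category"))
```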
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
From: Jeff S. <jsq...@os...> - 2003-07-09 20:24:57
On Wed, 9 Jul 2003, Darrell Kresge wrote:
> > That's one way to do it. Another way that we used at A Former Company
> > of Mine (Collective Technologies, Austin TX) was to do a simple
> > encryption of the database username/password into a "keyring", and our
> > perl subroutine that handled the database connection extracted/used
> > that information. It was more obfuscated than it was secure ... but
> > we figured every little bit helped. :-)
>
> [snipped]
> Presumably, you're not going to be writing implementations for each and
> every potential database that someone might use. Additionally,
> different DB vendors will use different authentication strategies.
> Assuming that the DB shim is developed externally (using Yet Another
> Well Defined Interface (soap, odbc, sql), it seems that for the purposes
> of release you'd want to keep the underlying mechanism as simple as
> possible; both functional and tutorial. To that end, I would think that
> even an environment variable in a root owned start script would be
> sufficient. Sure it's ugly, but it's easy to understand.
I think the central issue is that the mail server process has to be able
to access a secret somehow. If you need one secret to get to another,
then that really doesn't solve the problem -- the maildb server
process (or proxy that continually gets launched via mail.local or
whatever) needs to be able to connect in an automated fashion.
And since we're not trying to protect from root -- we're only trying to
protect from other users -- a 0400 file seems like a nice, simple solution
(and easy to debug/maintain).
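The 0400-file approach could look like this sketch (Python; the file format, colon separator, and permission check are assumptions, not an agreed design):

```python
import os
import stat
import tempfile

def read_db_credentials(path):
    # Refuse to run if the file is readable by group/other; only the
    # server's own user (and root) should be able to read the secret.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(
            "%s is readable by group/other (%o)" % (path, mode))
    with open(path) as f:
        user, password = f.read().split(":", 1)
    return user, password.rstrip("\n")

# Demo: create a 0400 credentials file and read it back.
fd, path = tempfile.mkstemp()
os.write(fd, b"maildb:s3cret\n")
os.close(fd)
os.chmod(path, 0o400)
print(read_db_credentials(path))
```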
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
From: Darrell K. <dk...@ya...> - 2003-07-09 17:44:20

li...@av... wrote:
> [snipped]

I really like that idea -- and there's no reason that the encryption
would need to be simple -- it could be PKI: when you start the daemon,
you specify a passphrase to get your private key, which can decrypt the
passwords on the publicly encrypted ring.

But... unlike filtering, is this really a maildb issue? Presumably,
you're not going to be writing implementations for each and every
potential database that someone might use. Additionally, different DB
vendors will use different authentication strategies.

Assuming that the DB shim is developed externally (using Yet Another
Well Defined Interface (soap, odbc, sql)), it seems that for the
purposes of release you'd want to keep the underlying mechanism as
simple as possible; both functional and tutorial. To that end, I would
think that even an environment variable in a root-owned start script
would be sufficient. Sure it's ugly, but it's easy to understand.

-D

From: Darrell K. <rep...@kr...> - 2003-07-09 17:22:29

Even if only temporary, I see using procmail as choosing a sledgehammer
to hang a picture (not that I'm necessarily opposed to such things ;-) ).

Since you've already a requirement to parse/extract header information,
why not just implement regex -> folder filters directly?

1) I realize that configuration of the rule file will be an issue, but
no worse so than dealing w/ .procmailrc.

2) Provided the interface to the "pattern select/route" mechanism is
well defined, the regex stuff could easily be replaced downstream with
something more powerful.

3) Using an X-Header to determine routing inside the maildb proper is
going to end up being a hack on top of a hack -- you'll end up needing
to eliminate it later when you decide to do the Right Thing.

Just my $0.02 USD

-D

Jeff Squyres wrote:
> We've talked about maildb built-in filtering before. Indeed, that's
> one of the main strengths of maildb -- that you can/should have
> millions of rules that will attach all kinds of categories to messages
> (e.g., as opposed to common thinking/usage today where most users file
> a message away in a *single* target folder; with maildb you should add
> *lots* of categories to each message -- this actually *increases* the
> possibility of you seeing important mails, as opposed to only filing a
> message away in a single [potentially obscure] folder).
>
> So we [eventually] need to support server-side filtering somehow.
>
> Up until now, we've only concentrated on the storage of messages -- we
> need to get this thing working before we tackle the complex issues of
> built-in server-side filtering. I think that's been a good decision.
>
> But since this is a major feature/capability of maildb, it would be
> good to support it *somehow* -- even in our initial versions.
>
> The thought occurred to me today: what about procmail?
>
> Procmail is a slick server-side user filtering agent that is typically
> invoked directly by the MTA (e.g., via .forward).
>
> Obviously, procmail can write to the conventional mbox and mh formats,
> but it won't know how to write to the maildb data store. But perhaps
> there's a quick-n-dirty way to make procmail work with maildb: instead
> of having procmail write the actual output message to a mailbox file,
> have it simply add a header line telling maildb what to do when the
> message eventually gets written to the database. Perhaps something
> like:
>
> -----
> :0 fc
> * ^TO_...@li...
> | formail -A "X-Maildb-Category: maildb/devel"
>
> :0 fc
> * ^FROM_.*@squyres.com
> | formail -A "X-Maildb-Category: received/squyres/family"
> -----
>
> Then when the message finally gets written to the db, maildb will see
> any X-Maildb-Category line(s) and attach the appropriate category
> name(s) to the message in the database. This is actually more
> efficient, because procmail won't write out the message N times --
> it'll only add N header lines and then write out the message *once* to
> the backing store.
>
> To make it work, there will need to be a final, all-encompassing
> procmail rule that actually writes the resulting message (including
> any added X-Maildb-Category header lines) into maildb by invoking some
> custom executable:
>
> -----
> :0 f
> | /usr/local/bin/maildb.insert
> -----
>
> ...or something along those lines.
>
> Does this sound too hack-ish? Any other thoughts/ideas?

From: <li...@av...> - 2003-07-09 16:54:09
|
> At least for 1.0...?

yes indeed. :-) |
|
From: Jeff S. <jsq...@os...> - 2003-07-09 16:47:32
|
On Wed, 9 Jul 2003 li...@av... wrote:
> > Does this sound too hack-ish? Any other thoughts/ideas?
>
> Hmmm...yes, but I can't say that I have any other ideas, and this sounds
> pretty workable. :-)
At least for 1.0...?
;-)
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
|
|
From: <li...@av...> - 2003-07-09 16:23:26
|
> Does this sound too hack-ish? Any other thoughts/ideas?

Hmmm...yes, but I can't say that I have any other ideas, and this sounds pretty workable. :-)

- Liza |
|
From: <li...@av...> - 2003-07-09 16:08:14
|
> My question is: how do we authenticate to the database?
> ...
> Do we just put a 0400 file somewhere on the local filesystem that only
> root and the mail.local user (probably "mail" or "daemon" or ...?) can
> read that contains the DB username and password? The only other way that I
> can think of would be to compile the DB username/pw in the mail.local
> executable, but that might make it vulnerable to "strings mail.local", or
> something along those lines. Is there a standard way to do this kind of
> thing? We're not trying to protect from root in this case -- we're only
> trying to protect from other users (right?) -- so I'm thinking that a 0400
> file might not be totally evil (one way to think of it: it's no less
> secure than 0600 /var/spool/mail/* mbox files).

That's one way to do it. Another way that we used at A Former Company of Mine (Collective Technologies, Austin TX) was to do a simple encryption of the database username/password into a "keyring", and our perl subroutine that handled the database connection extracted/used that information. It was more obfuscated than it was secure ... but we figured every little bit helped. :-)

- Liza |
|
From: <li...@av...> - 2003-07-09 16:03:14
|
> Can our current DB schema handle this?
>
> I *think* it can -- it seems like we're using message ID + the unique
> integer (msg_ids.m_id). Does that make sense?

We're keying everything by our own internal identifier, msg_ids.m_id, which then gets referenced in the other tables (msg_attach, etc.). The message ID itself is in msg_ids.m_msg_id, so presumably if we get a second copy of the same message (forwarded, whatever) we could drop it or keep it, depending on what you want to do.

- Liza |
|
From: Jeff S. <jsq...@os...> - 2003-07-09 03:18:40
|
We've talked about maildb built-in filtering before. Indeed, that's
one of the main strengths of maildb -- that you can/should have
millions of rules that will attach all kinds of categories to messages
(e.g., as opposed to common thinking/usage today where most users file
a message away in a *single* target folder; with maildb you should add
*lots* of categories to each message -- this actually *increases* the
possibility of you seeing important mails, as opposed to only filing a
message away in a single [potentially obscure] folder).
So we [eventually] need to support server-side filtering somehow.
Up until now, we've only concentrated on the storage of messages -- we
need to get this thing working before we tackle the complex issues of
built-in server-side filtering. I think that's been a good decision.
But since this is a major feature/capability of maildb, it would be
good to support it *somehow* -- even in our initial versions.
The thought occurred to me today: what about procmail?
Procmail is a slick server-side user filtering agent that is typically
invoked directly by the MTA (e.g., via .forward).
Obviously, procmail can write to the conventional mbox and mh formats,
but it won't know how to write to the maildb data store. But perhaps
there's a quick-n-dirty way to make procmail work with maildb: instead
of having procmail write the actual output message to a mailbox file,
have it simply add a header line telling maildb what to do when the
message eventually gets written to the database. Perhaps, something
like:
-----
:0 fc
* ^TO_...@li...
| formail -A "X-Maildb-Category: maildb/devel"
:0 fc
* ^FROM_.*@squyres.com
| formail -A "X-Maildb-Category: received/squyres/family"
-----
Then when the message finally gets written to the db, maildb will see
any X-Maildb-Category line(s) and attach the appropriate category
name(s) to the message in the database. This is actually more
efficient, because procmail won't write out the message N times --
it'll only add N header lines and then write out the message *once* to
the backing store.
To make it work, there will need to be a final, all-encompassing
procmail rule that actually writes the resulting message (including
any added X-Maildb-Category header lines) into maildb by invoking some
custom executable:
-----
:0 f
| /usr/local/bin/maildb.insert
-----
...or something along those lines.
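[Editor's sketch: a minimal illustration of the header-scraping half such a maildb.insert could perform. This is hypothetical code, not part of maildb; it assumes only the X-Maildb-Category header name proposed above, and uses Python's stdlib email parser.]

```python
# Hypothetical sketch for maildb.insert: collect any X-Maildb-Category
# lines that procmail/formail added, so the insert step can attach those
# categories to the message when it is written to the database.
from email import message_from_string

def extract_categories(raw_msg: str):
    """Return (categories, parsed_message); categories is every
    X-Maildb-Category value found in the header, in order."""
    msg = message_from_string(raw_msg)
    return msg.get_all("X-Maildb-Category") or [], msg

sample = (
    "X-Maildb-Category: maildb/devel\n"
    "X-Maildb-Category: received/squyres/family\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
categories, message = extract_categories(sample)
print(categories)  # ['maildb/devel', 'received/squyres/family']
```

Note that because formail -A appends headers without touching the body, the insert step sees one message with N category headers rather than N copies of the message, matching the efficiency argument above.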
Does this sound too hack-ish? Any other thoughts/ideas?
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
|
|
From: Jeff S. <jsq...@os...> - 2003-07-09 02:59:54
|
For you DBAs out there... is there a common way to do this? (I think this
is an easy question, but I managed to confuse myself earlier today and
want to run it by you guys to ensure that I'm not crazy)
We've talked about user authentication before, and we decided to leave it
as the responsibility of the IMAP daemon. This allows the possibility of
a bunch of different schemes, like passwd/shadow, pam, LDAP, etc. i.e.:
it's not our problem. I think this is the Right Thing. I'm talking about
different authentication -- authentication to the database.
My question is: how do we authenticate to the database?
There's [at least] two different places where a process will need to be
executed on the server to insert a message into maildb: mail.local and a
server-side user filtering agent (e.g., procmail). Let's look at
mail.local, although they both essentially come down to the same issue.
At some point, the MTA is going to invoke mail.local on the server to
actually deliver the message to the backing store (remember that UW IMAP
provides a mail.local replacement that will be able to write to the
maildb). This mail.local process has to be able to connect to the
[MySQL|Postgres|whatever] database, authenticate, and then do its thing.
How do we do that?
Do we just put a 0400 file somewhere on the local filesystem that only
root and the mail.local user (probably "mail" or "daemon" or ...?) can
read that contains the DB username and password? The only other way that I
can think of would be to compile the DB username/pw in the mail.local
executable, but that might make it vulnerable to "strings mail.local", or
something along those lines. Is there a standard way to do this kind of
thing? We're not trying to protect from root in this case -- we're only
trying to protect from other users (right?) -- so I'm thinking that a 0400
file might not be totally evil (one way to think of it: it's no less
secure than 0600 /var/spool/mail/* mbox files).
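[Editor's sketch: one way the 0400-file idea could look in practice. The file path, one-line "username:password" format, and permission check are all assumptions for illustration, not existing maildb code.]

```python
# Hypothetical reader for a root/mail-only credentials file holding a
# single "username:password" line; it refuses group/world-readable files,
# mirroring the 0600 /var/spool/mail/* comparison above.
import os
import stat
import tempfile

def read_db_credentials(path):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(path + " must not be group/world readable")
    with open(path) as f:
        user, _, password = f.read().strip().partition(":")
    return user, password

# Demonstration with a throwaway 0400 file:
fd, path = tempfile.mkstemp()
os.write(fd, b"maildb:s3cret")
os.close(fd)
os.chmod(path, 0o400)
creds = read_db_credentials(path)
os.unlink(path)
print(creds)  # ('maildb', 's3cret')
```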
For the procmail issue, whatever process is launched (perhaps a variant of
mail.local) will likely be launched under the UID of the recipient user.
So will this executable need to be setuid to the mail user? Or is that
asking for trouble?
Thoughts?
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
|
|
From: Jeff S. <jsq...@os...> - 2003-07-09 02:34:50
|
Here's a new issue that I thought of while I was driving home from
Bloomington today...
For my personal mail use, I recently switched over from client-side
filtering to server-side filtering (procmail). In doing so, I learned
that the same message (i.e., a message with the same Message-Id) can
legitimately arrive at a single mailbox multiple times -- and possibly
even with different headers.
It's as simple as sending a message to two recipients: bo...@wo... and
bo...@ho.... Bob has his home address forwarded to work. So Bob
actually gets two copies of the same message in his work mailbox, but
aside from some similarities (including an identical message ID, To, From,
Subject, Dates, etc.), the headers of the two messages may be very
different. For example, the routes may be entirely different. The
Subjects may be similar, but they may be different.
Consider an even worse case -- someone sends a virus to bo...@wo... and
som...@ex.... Bob's a member of somelist, so he gets two copies.
But the mailing list adds its own header lines and footer to the body.
So the message ID is the same, but for all intents and purposes, everything
else is different.
Can our current DB schema handle this?
I *think* it can -- it seems like we're using message ID + the unique
integer (msg_ids.m_id). Does that make sense?
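[Editor's sketch: the "internal id vs. Message-Id" point can be exercised in a toy example. SQLite stands in for the real MySQL database, and only the relevant msg_ids columns from the schema thread are kept; this is an illustration, not maildb code.]

```python
# Toy demonstration that keying on an auto-increment m_id (rather than the
# RFC 2822 Message-Id stored in m_msg_id) lets two copies of "the same"
# message coexist with different headers.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE msg_ids (
        m_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        m_msg_id  TEXT NOT NULL,
        m_to      TEXT,
        m_subject TEXT
    )
""")
# The same Message-Id arrives twice, via different routes/headers:
db.execute("INSERT INTO msg_ids (m_msg_id, m_to, m_subject) VALUES (?,?,?)",
           ("<abc@example.com>", "bob-at-work", "hello"))
db.execute("INSERT INTO msg_ids (m_msg_id, m_to, m_subject) VALUES (?,?,?)",
           ("<abc@example.com>", "somelist", "[somelist] hello"))
copies = db.execute(
    "SELECT m_id FROM msg_ids WHERE m_msg_id = ?",
    ("<abc@example.com>",)).fetchall()
print(copies)  # two distinct internal ids for one Message-Id
```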
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
|
|
From: <li...@av...> - 2003-05-22 18:36:34
|
Oops. I don't imagine m_parent_id can be -1 if I define the field as integer unsigned. Make that integer. :-)

li...@av... writes:
> Ok, so here are today's changes made on queeg.
>
> -- msg_ids adds some "header" cols, and m_parent_id (which would map to
> another m_id, or be -1 if message is standalone).
>
> create table msg_ids (
>   m_id integer unsigned not null auto_increment,
>   m_msg_id varchar(255) not null,
>   m_to text,
>   m_cc text,
>   m_bcc text,
>   m_subject text,
>   m_date datetime,
>   m_from text,
>   m_in_reply_to text,
>   m_sender text,
>   m_parent_id integer unsigned,
>   m_vw_incl text,
>   m_vw_excl text,
>   primary key (m_id),
>   index m_msg_id_idx (m_msg_id)
> );
|
|
From: <li...@av...> - 2003-05-22 18:30:56
|
Ok, so here are today's changes made on queeg.

-- msg_ids adds some "header" cols, and m_parent_id (which would map to
another m_id, or be -1 if message is standalone).

create table msg_ids (
  m_id integer unsigned not null auto_increment,
  m_msg_id varchar(255) not null,
  m_to text,
  m_cc text,
  m_bcc text,
  m_subject text,
  m_date datetime,
  m_from text,
  m_in_reply_to text,
  m_sender text,
  m_parent_id integer unsigned,
  m_vw_incl text,
  m_vw_excl text,
  primary key (m_id),
  index m_msg_id_idx (m_msg_id)
);

-- msg_owners loses date, from, subject fields - an earlier attempt (that I
forgot about completely) to put a few "common" fields somewhere to avoid a
join. I wish I remembered why I put those in this table and not in
msg_ids. Oh well.

create table msg_owners (
  mu_id integer unsigned not null auto_increment,
  mu_m_id integer unsigned not null references msg_ids(m_id),
  mu_u_id integer unsigned not null references users(u_id),
  mu_ca_id integer unsigned not null references cats(ca_id),
  -- mu_date integer unsigned not null references msg_hdrs(mh_id),
  -- mu_from integer unsigned not null references msg_hdrs(mh_id),
  -- mu_subject integer unsigned not null references msg_hdrs(mh_id),
  mu_flags integer,
  primary key (mu_id),
  index mu_m_id_idx (mu_m_id),
  index mu_u_id_idx (mu_u_id),
  index mu_ca_id_idx (mu_ca_id)
);

-- msg_attach loses ma_in_fs, ma_in_msg fields.

These changes are reflected in cr_maildb.sql and also documented in the design.html document in the mysql part of the cvs tree. I need to review the indexing though. Well, lots to be reviewed and fine-tuned as we go along (including text vs varchar as James notes, etc.).

- Liza |
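[Editor's sketch: the m_parent_id convention (signed integer, -1 for standalone, otherwise the parent's m_id) can be exercised in a toy example. SQLite stands in for MySQL and only the threading-relevant columns are kept; this is an illustration, not maildb code.]

```python
# Toy check of the m_parent_id convention: a signed integer column can
# hold -1 for standalone messages and a real m_id for replies.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE msg_ids (
        m_id        INTEGER PRIMARY KEY AUTOINCREMENT,
        m_msg_id    TEXT NOT NULL,
        m_parent_id INTEGER   -- signed, so -1 can mark a standalone message
    )
""")
db.execute("INSERT INTO msg_ids (m_msg_id, m_parent_id) VALUES (?, ?)",
           ("<root@example.com>", -1))          # standalone message
root_id = db.execute("SELECT m_id FROM msg_ids").fetchone()[0]
db.execute("INSERT INTO msg_ids (m_msg_id, m_parent_id) VALUES (?, ?)",
           ("<reply@example.com>", root_id))    # reply points at parent
rows = db.execute(
    "SELECT m_msg_id, m_parent_id FROM msg_ids ORDER BY m_id").fetchall()
print(rows)
```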
|
From: James S. <jl...@do...> - 2003-05-22 17:55:03
|
li...@av... wrote:
> je...@sq... wrote:
>> Heck, let's err on the side of lots of options. The idea here is that we
>> can search and sort in a million different ways:
>> [...]
>
> Errr...that's a lot. Sure you don't want to keep just the "well-known"
> ones in msg_ids and shove the rest off into msg_hdrs? Or do you want to
> get rid of msg_hdrs and put everything into msg_ids? Actually I guess
> we'd need a msg_hdrs in any case to catch anything that we don't account
> for in msg_ids. Just wondering how far you want to go here.

I think the impact of indexing is going to be our guide here, I'm just not sure if there's a difference in performance. I don't know if it's faster to put n single-column indexes on one row, or one single-column index on n rows, when inserting message headers.

I think RFC 2822 is the latest one governing mail messages...when I get a chance I'll see if it calls out required headers. If it does, I say we stick to those in msg_ids for a first pass. If not, I say we just do the "common" ones and put the optional/variable ones (like X- headers) in msg_hdrs.

>> Here's an issue, though: what happens if the value of any of these header
>> lines is longer than the length of the "text" field?
>
> Well, text is 64k, mediumtext is 16m, and longtext is 4g. (I misspoke
> earlier when I said text was 16m.) So...I'd make them mediumtext, I
> suspect, since I think 64k may be too small, but 16m has got to be
> overkill. I mean, I can't even get 1m message bodies sent half the time,
> much less anything with a header that big.

I'd say even text is overkill...but I don't know if there's a spec limit on header length. A big VARCHAR might be more efficient than text. If we can't find a reference we may need to do some analysis here.

JLS |
|
From: Jeff S. <jsq...@os...> - 2003-05-22 17:16:43
|
On Thu, 22 May 2003 li...@av... wrote:
> Sure, we can cram it somewhere. :-) I could just make a table called
> 'msg_orig' that consists of the msg_id and a largetext field to hold the
> whole thing. Ok?
Sounds good to me.
> > Heck, let's err on the side of lots of options. The idea here is that we
> > can search and sort in a million different ways:
> >
> > - To
> > [snipped]
>
> Errr...that's a lot. Sure you don't want to keep just the "well-known"
> ones in msg_ids and shove the rest off into msg_hdrs? Or do you want to
> get rid of msg_hdrs and put everything into msg_ids? Actually I guess
> we'd need a msg_hdrs in any case to catch anything that we don't account
> for in msg_ids. Just wondering how far you want to go here.
True.
Ok, let's just do a few for now, and if performance really sucks, we can
add more later. How about:
- To
- CC
- BCC (for outgoing messages)
- Subject
- Date
- From
- In-Reply-To
- Sender
> > Here's an issue, though: what happens if the value of any of these header
> > lines is longer than the length of the "text" field?
>
> Well, text is 64k, mediumtext is 16m, and longtext is 4g. (I misspoke
> earlier when I said text was 16m.) So...I'd make them mediumtext, I
> suspect, since I think 64k may be too small, but 16m has got to be
> overkill. I mean, I can't even get 1m message bodies sent half the
> time, much less anything with a header that big.
I think text (64k) is fine for a single line in a header. If you've got a
header line that's longer than 64k, you've got other issues! ;-)
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
|
|
From: <li...@av...> - 2003-05-22 16:52:46
|
> For safety's sake (and probably only while we're developing/debugging),
> should we stash the entire (unmodified text) header in a table somewhere?

Sure, we can cram it somewhere. :-) I could just make a table called 'msg_orig' that consists of the msg_id and a longtext field to hold the whole thing. Ok?

> Heck, let's err on the side of lots of options. The idea here is that we
> can search and sort in a million different ways:
>
> - To
> - CC
> - BCC (for outgoing messages)
> - Subject
> - Date
> - In-Reply-To
> - Precedence
> - Reply-To
> - Sender
> - Message-ID (I think we have that already, right? Just mention it to be
> complete...)
> - Return-Path
> - List-Id
> - X-Sender
> - X-Mailer
> - User-agent
> - Thread-topic
> - Thread-index
> - References
> - ...?

Errr...that's a lot. Sure you don't want to keep just the "well-known" ones in msg_ids and shove the rest off into msg_hdrs? Or do you want to get rid of msg_hdrs and put everything into msg_ids? Actually I guess we'd need a msg_hdrs in any case to catch anything that we don't account for in msg_ids. Just wondering how far you want to go here.

> Here's an issue, though: what happens if the value of any of these header
> lines is longer than the length of the "text" field?

Well, text is 64k, mediumtext is 16m, and longtext is 4g. (I misspoke earlier when I said text was 16m.) So...I'd make them mediumtext, I suspect, since I think 64k may be too small, but 16m has got to be overkill. I mean, I can't even get 1m message bodies sent half the time, much less anything with a header that big.

- Liza |
|
From: Jeff S. <jsq...@os...> - 2003-05-22 16:00:05
|
On Thu, 22 May 2003 li...@av... wrote:
> Ok, I agree as well about the messages being in the database in their
> entirety, and that the ma_in_fs field can go away. Jeff, I also dimly
> remember that we decided to treat all message bodies as attachments.
> To indicate that a given message is not a standalone, yes, how about we
> add a "m_parent" field to msg_ids that would contain the m_id of the
> parent message, or would be -1 if the message is a standalone.
Excellent. :-)
> I waffled a lot on this and went the more generic route. Also I
> originally assumed that with a line in msg_headers per header, there
> would be a separate entry (row) for each recipient on the to, cc, bcc
> lines. I have no problem putting the "required" headers into msg_ids --
> [snipped]
I agree here that for ease of searching, we probably want to put "well
known" header lines in specific fields.
> but I believe then we would be simply putting the entire comma-separated
> list as the value of m_to, m_cc, etc. Is this ok with everyone?
I think that's ok. Otherwise we'd have to make yet another table indexed
by message ID, right?
Let's try this approach and see how it works (i.e., that the field
contains the value of the "To:" line, etc.).
For safety's sake (and probably only while we're developing/debugging),
should we stash the entire (unmodified text) header in a table somewhere?
i.e., in case we decide to re-do the schema, we have all the original
header that we can re-build all the tables from? I don't know if this is
a huge deal, and/or if it's helpful, but it might not be a bad idea --
could help with debugging (e.g., compare what ended up in the tables to
what the unmodified header is). Just an idea. :-)
> So...assuming I modify msg_ids ... what will we consider the required
> headers? to, cc, subject, date, ... ?
Heck, let's err on the side of lots of options. The idea here is that we
can search and sort in a million different ways:
- To
- CC
- BCC (for outgoing messages)
- Subject
- Date
- In-Reply-To
- Precedence
- Reply-To
- Sender
- Message-ID (I think we have that already, right? Just mention it to be
complete...)
- Return-Path
- List-Id
- X-Sender
- X-Mailer
- User-agent
- Thread-topic
- Thread-index
- References
- ...?
(I know the X-* ones are not standard, but enough mailers use the ones
that I mentioned that it could be worthwhile -- didn't you always want to
be able to filter by who sends using Outlook Express? ;-)
Here's an issue, though: what happens if the value of any of these header
lines is longer than the length of the "text" field?
--
{+} Jeff Squyres
{+} jsq...@os...
{+} Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/
|