Re: [Openpacket-devel] Organize traces on file system
From: Tim F. <fu...@cc...> - 2006-08-03 20:54:03
On 8/3/06, Jacob Ham <ha...@gm...> wrote:
>
> Hi All,
>
> On 8/3/06, Tim Furlong <fu...@cc...> wrote:
> > Hi Jake,
> >
> > I was wondering if you could clarify what you mean about the file
> > system handling the compression.  Did you have something specific in
> > mind?  I'm thinking that unless the fs is remote, any compression it
> > does would incur about the same number of CPU cycles as doing it
> > inline.  It might work better on a multiprocessor system, but it
> > wouldn't be too hard to have openpacket.org handle compression in a
> > separate process.
>
> Indeed, I had ZFS in mind for a file system.  It is extremely
> expandable, fast, provides data integrity, and has low-CPU-usage
> compression.  You can read more about it here if interested:
> http://www.opensolaris.org/os/community/zfs/ .  I think it really
> doesn't matter now, but if we grow to hundreds of gigs of data, it
> will definitely be something to think about.

It looks good; I just have two questions.  First, has it already been
ported to FreeBSD, or would we have to run Solaris 10?  My impression is
that Richard is fairly keen on using FreeBSD as a platform, and the
official ZFS FAQ says there are no official plans to port it to anything
other than Solaris 10.

Second, is anyone here knowledgeable about the CDDL (the license for
OpenSolaris)?  I took a quick look, but I'm not familiar with it, and I'd
feel more comfortable if someone (preferably a lawyer, and preferably not
one retained by Sun) could tell us what to look out for.  In fact, that's
probably a good idea regardless of what we use; I'm not familiar enough
even with the GPL and LGPL to know what the gotchas are when it comes to
designing a publicly accessible system like this.  For instance, the CDDL
FAQ suggests that there may be issues with statically linking source
files that are under different licenses.

> Another option would be to gather metadata once a trace is uploaded,
> gzip it once, and always serve the file compressed.  The only problem
> with this is if we ever decide to reference captures inline (on the
> site, instead of having to download and open the capture in Wireshark).
> Say someone wants to describe a capture in detail: he could reference
> lines 10-29, describe them, then move to 30-45 (assuming we had a
> system in place like this).

I think there are ways around that; for instance, the reviewer could just
upload Wireshark screenshots, or the analysis submission could allow the
user to specify packet numbers, then fill in the blanks by decompressing
the file, extracting the info for the desired packets into the DB, and
recompressing the file.  It'd be best to do that offline, though, which
would just mean that the interface presenting analyses would have to
recognize a not-yet-complete operation and display <packet info pending>
or something.

> I don't know what kind of systems we have here for use, revenue model
> (advertising, donations, etc.?), or hosting issues.  I assume Richard
> is working on this.  If we need to save bandwidth and space, we could
> do so in the design.
>
> > It would mainly have to store the torrent files, so volume of data
> > wouldn't be as much of an issue as number of files.  Honestly, we
> > probably wouldn't have to store them as files; we could probably just
> > store the contents of the torrent file in a DB and only dump it to a
> > file long enough to send it to a user, and maybe have a cache of
> > frequently-requested torrents.
>
> If we cache the most requested ones, it will be faster, but then we are
> back where we are now....  How would we store the cached files?!  What
> if there are thousands of popular files we cache?

I haven't looked, but I suspect it's possible, with PHP or Ruby or
directly through an Apache module or such, to bypass the filesystem
entirely and just have the web interface fetch the data straight from the
DB.  That would still involve the fs, of course, since the DB would be
housed there, but it would be optimized by the DB software.  It's easy
enough to do in perl, at least: you just output the appropriate HTTP
header and dump the data down the pipe, regardless of where the data
comes from.  I don't expect it would be much harder in the other
frameworks.
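For concreteness, here's a rough, untested perl sketch of that kind of
"serve it straight from the DB" CGI; the DSN, the credentials, and the
"captures" table with its "sha1" and "data" columns are made-up
placeholders, not anything we've actually settled on:

#!/usr/bin/perl
# Rough, untested sketch: serve a stored capture straight out of the DB.
# Everything DB-specific here (DSN, credentials, the "captures" table
# with "sha1" and "data" columns) is a made-up placeholder.
use strict;
use warnings;
use CGI;
use DBI;

my $q    = CGI->new;
my $hash = $q->param('hash') || '';

# Only accept something that looks like a SHA-1 hex digest.
unless ($hash =~ /\A[0-9A-Fa-f]{40}\z/) {
    print $q->header(-status => '400 Bad Request', -type => 'text/plain');
    print "That doesn't look like a SHA-1 hash.\n";
    exit;
}

my $dbh = DBI->connect('dbi:mysql:openpacket', 'user', 'secret',
                       { RaiseError => 1 });
my ($data) = $dbh->selectrow_array(
    'SELECT data FROM captures WHERE sha1 = ?', undef, uc $hash);

if (defined $data) {
    # Emit the HTTP header, then dump the gzipped capture down the pipe.
    print $q->header(-type       => 'application/octet-stream',
                     -attachment => uc($hash) . '.pcap.gz');
    binmode STDOUT;
    print $data;
} else {
    print $q->header(-status => '404 Not Found', -type => 'text/plain');
    print "No such capture.\n";
}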
If you're worried about the sheer number of files on the filesystem, with
ext2/ext3 at least you can set the number of inodes created when you set
up the filesystem; if you're going to have lots of small files, you
create more than the default number of inodes (the default is 4% of the
filesystem, or something like that, I think).

If you're more worried about access times (finding a file gets bloody
slow with thousands of files in one directory), a standard trick is to
radix sort into subdirectories.  In this case we could do that using the
hash; i.e., use the first two or three hexadecimal characters of the hash
as the name of a subdirectory in the base dir, the next two or three as
the next subdirectory, etc.  So if you had files with the following five
hashes (I'll use the full hash as the filename for the example):

25A1078996BE4F57DD89ABD8692538A0FB64428D
25C69487E704607EC72D19D9E6E0552A47004F64
E4EABBA07718253835B74ADB8B276B2A45EC3F93
E4EBED96FB5CFF73922D15AA533032EB35A673E7
FCC4DF6660CB0E7C2ABFE439A7C423690B4CD7A6

you could create a tree like:

./25/A1/25A1078996BE4F57DD89ABD8692538A0FB64428D
./25/C6/25C69487E704607EC72D19D9E6E0552A47004F64
./E4/EA/E4EABBA07718253835B74ADB8B276B2A45EC3F93
./E4/EB/E4EBED96FB5CFF73922D15AA533032EB35A673E7
./FC/C4/FCC4DF6660CB0E7C2ABFE439A7C423690B4CD7A6

I suggest two or three characters because ext2, at least, can't handle
more than 32767 subdirectories (including ./ and ../), so four would
potentially cause problems.  If we can go with ZFS, though, such kludges
might not be necessary (*knock wood*).  We'd probably have to do some
testing to know for sure.  (There's a rough sketch of this layout in the
P.S. below.)

So perhaps we should try to identify all of the issues we're worried
about in the context of storage, along with possible solutions:

1) Sheer number of bytes
   1a) background built-in compression by ZFS
   1b) automatic compression by openpacket.org on receipt of a trace
       (after summarization)
   1c) "offshoring" large traces via bittorrent
   1d) background compression on an ext3 filesystem (or whatever FS
       FreeBSD prefers)

2) Number of files on the filesystem
   2a) ZFS (need to confirm that it handles large numbers of files
       gracefully)
   2b) FS tuning
   2c) some sort of automated archival of less-used files
   2d) files stored in the DB instead of on the fs

3) Number of files in a given directory
   3a) ZFS (need to confirm that directory seek time scales well for
       large directories)
   3b) radix sorting
   3c) pure DB handling of files served

Have I missed anything, either concerns or possible solutions?  I think
the major questions are whether ZFS will solve all the issues, whether it
will solve them better than any of the other possible solutions, and
whether it's worth either changing over to OpenSolaris or porting ZFS to
FreeBSD ourselves (I suspect the latter would be a rather large job).

-Tim
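P.S.  To make the radix-sort-by-hash idea a bit more concrete, here's a
rough, untested perl sketch; the base directory is just a made-up
placeholder:

#!/usr/bin/perl
# Rough, untested sketch of the hash-based radix layout described above:
# two hex characters per directory level, two levels deep.  The base
# directory is a made-up placeholder.
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;

my $base = '/var/openpacket/traces';

sub trace_path {
    my ($hash) = @_;
    die "doesn't look like a SHA-1 hash\n"
        unless $hash =~ /\A[0-9A-Fa-f]{40}\z/;
    $hash = uc $hash;
    my $dir = File::Spec->catdir($base, substr($hash, 0, 2),
                                        substr($hash, 2, 2));
    mkpath($dir) unless -d $dir;   # creates ./25/A1/ and friends on demand
    return File::Spec->catfile($dir, $hash);
}

# trace_path('25A1078996BE4F57DD89ABD8692538A0FB64428D') gives
# /var/openpacket/traces/25/A1/25A1078996BE4F57DD89ABD8692538A0FB64428D
print trace_path($ARGV[0]), "\n" if @ARGV;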