#7 DB File is endian dependent

open
nobody
None
5
2002-12-06
2002-12-06
Anonymous
No

Using bmf from several of my machines, I noticed that
I could wreak havoc in the collected statistics by
running bmf -s on one of them. After that, mails
are constantly misclassified.

Further investigation showed that the values stored
in the db files are byte-order dependent. What appears
on the i386 machine as (excerpt from bmfconv -e)

version 32

appears on the PPC as

version 1778384896

Of course, updating that value will completely break
the stats on both platforms.

I'd suggest storing the values in a defined format,
for example network byte order, using the appropriate
conversion routines (htonl() et al.).

Discussion

  • Tom Marshall
    2002-12-10

    Logged In: YES
    user_id=614531

    This is by design. I made a conscious decision when writing
    the libdb routines to use host endianness. The reasons are:

    1. I do not know if the libdb file format is portable between
    architectures and/or implementations. It may not be safe to
    transport the files between machines.

    2. The other file formats do not have endian problems. If you
    need to transport data between machines, you may export to text
    format, transport the text file, and re-import to libdb format.

    Of course, now that the program has been released, there is a
    problem with backward compatibility. How could a new release
    change endianness without creating a mess of existing users'
    databases?

    Can you provide documentation that libdb files are portable and
    a usage scenario that is convincing enough to make the change
    worthwhile?

     
  • Logged In: YES
    user_id=237675

    > if the libdb file format is portable between architectures
    > and/or implementations.

    After circumstantial evidence suggested that it works (after
    all, I could access the right records on both i386 and PPC;
    only the data values were wrong), I did some research and found

    http://elib.cs.berkeley.edu/admin/BerkeleyDB/api_c/DbInfo/info.html

    which, among other interesting details, mentions:

    -- quote --

    int db_lorder;
    The byte order for integers in the stored database
    metadata. The number should represent the order as an
    integer, for example, big endian order is the number 4,321,
    and little endian order is the number 1,234. If db_lorder is
    0, the host order of the machine where the Berkeley DB
    library was compiled is used.

    The value of db_lorder is ignored except when databases
    are being created. If a database already exists, the byte
    order it uses is determined when the file is read.

    The access methods provide no guarantees about the byte
    ordering of the application data stored in the database, and
    applications are responsible for maintaining any necessary
    ordering.

    -- end quote --

    which indicates that the DB files themselves can indeed be read
    on all platforms, while the byte order of the stored application
    data is left to the application.

    > 2. The other file formats do not have endian problems. If
    > you need to transport data between machines, you may
    > export to text format, transport the text file, and re-import
    > to libdb format.

    This is not an option, as I use an NFS-mounted home
    directory, so I actually use the _same_ database from all
    workstations.

    I agree that this is probably not a very widely used
    configuration, but it's not altogether nonsensical either.

     
  • Tom Marshall
    2002-12-12

    Logged In: YES
    user_id=614531

    Thanks for the information. I will look into the issue further.
    I think it's time to start versioning the file format.

    I'm assuming that you have fixed this issue on your machine(s)
    by adding the proper htonl() and ntohl() calls?

    I've always thought that NFS had file locking issues that
    prevented this type of usage from being robust.