DB File is endian dependent
Status: Beta
Brought to you by:
t-m
Using bmf from several of my machines, I noticed that
I could wreak havoc in the collected statistics by
running bmf -s on one of them. After that,
mails are constantly misclassified.
Further investigation showed that the values stored
in the db files are byte order dependent. What looks
like this on the i386 machine (excerpt from bmfconv -e):
version 32
looks like this on the PPC:
version 1778384896
Of course, updating that value will completely break
the stats for both platforms.
I'd suggest storing the values in a defined format,
possibly network byte order, using the appropriate
conversion routines (htonl et al).
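To make the suggestion concrete, here is a minimal sketch of what
storing a counter in network byte order could look like. The helper
names count_to_disk and count_from_disk are made up for illustration
and are not taken from the bmf sources; only htonl and ntohl are real
library calls.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from bmf): serialize a 32-bit count into a
 * 4-byte buffer in network (big-endian) byte order. */
static void count_to_disk(uint32_t count, unsigned char out[4])
{
    uint32_t be = htonl(count);   /* host order -> network order */

    assert(out != NULL);
    memcpy(out, &be, sizeof be);
}

/* Hypothetical helper (not from bmf): read back a count that was
 * written by count_to_disk, on any host. */
static uint32_t count_from_disk(const unsigned char in[4])
{
    uint32_t be;

    assert(in != NULL);
    memcpy(&be, in, sizeof be);
    return ntohl(be);             /* network order -> host order */
}
```

With something along these lines, the four bytes on disk would be the
same no matter whether the i386 or the PPC machine wrote them, so both
could share one file.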
Logged In: YES
user_id=614531
This is by design. I made a conscious decision when writing
the libdb routines to use host endianness. The reasons are:
1. I do not know if the libdb file format is portable
between architectures and/or implementations. It may not be
safe to transport the files between machines.
2. The other file formats do not have endian problems. If
you need to transport data between machines, you may export
to text format, transport the text file, and re-import to
libdb format.
Of course, now that the program has been released, there is
a problem with backward compatibility. How could a new
release change endianness without creating a mess of
existing users' databases?
Can you provide documentation that libdb files are portable
and a usage scenario that is convincing enough to make the
change worthwhile?
Logged In: YES
user_id=237675
> if the libdb file format is portable between architectures
> and/or implementations.
After circumstantial evidence showed it works (after all I
could access the right data on both i386 and PPC, just the
data values were wrong), I researched a bit and found
http://elib.cs.berkeley.edu/admin/BerkeleyDB/api_c/DbInfo/info.html
which, among other interesting details, mentions:
-- quote --
int db_lorder;
The byte order for integers in the stored database
metadata. The number should represent the order as an
integer, for example, big endian order is the number 4,321,
and little endian order is the number 1,234. If db_lorder is
0, the host order of the machine where the Berkeley DB
library was compiled is used.
The value of db_lorder is ignored except when databases
are being created. If a database already exists, the byte
order it uses is determined when the file is read.
The access methods provide no guarantees about the byte
ordering of the application data stored in the database, and
applications are responsible for maintaining any necessary
ordering.
-- end quote --
which indicates that one can indeed read DB files on all
platforms; only the application data stored in them needs
a fixed byte order.
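As an illustration of the last quoted paragraph (the application is
responsible for the byte order of its own data), here is a rough
sketch that expresses the host byte order in the same 1234/4321
convention and swaps a value only when the writer's order differs.
The names host_lorder, swap32 and fix_order are invented for this
example; they are not Berkeley DB or bmf functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: report the host byte order using the
 * 1234 (little endian) / 4321 (big endian) convention. */
static int host_lorder(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe);
    if (bytes[0] == 0x01)
        return 4321;              /* most significant byte first */
    if (bytes[0] == 0x04)
        return 1234;              /* least significant byte first */
    return 0;                     /* something exotic, not handled */
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Hypothetical helper: convert a raw value that was written on a
 * machine with byte order file_lorder into host order. */
static uint32_t fix_order(uint32_t raw, int file_lorder)
{
    assert(file_lorder == 1234 || file_lorder == 4321);
    return file_lorder == host_lorder() ? raw : swap32(raw);
}
```

This assumes the writer's byte order is recorded somewhere the reader
can find it; always writing network byte order avoids that
bookkeeping.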
> 2. The other file formats do not have endian problems. If
> you need to transport data between machines, you may
> export to text format, transport the text file, and re-import
> to libdb format.
This is not an option, as I use an NFS-mounted home
directory, so I actually use the _same_ database from all
workstations.
I agree that this is probably not a very widely used
configuration, but it's not altogether nonsensical either.
Logged In: YES
user_id=614531
Thanks for the information. I will look into the issue
further. I think it's time to start versioning the file
format.
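One possible shape for such versioning, purely as a sketch: a small
header record holding a magic number and a format version, both in
network byte order, so that a reader can tell a new-style file from a
legacy host-order one. The magic value, the version number and the
function names below are all assumptions made up for this example,
not anything bmf actually defines.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FMT_MAGIC   0x626d6631u  /* hypothetical: ASCII "bmf1" */
#define FMT_VERSION 2u           /* hypothetical version number */

enum { FMT_HEADER_LEN = 8 };     /* 4 bytes magic + 4 bytes version */

/* Hypothetical helper: fill in the header bytes for a new file. */
static void fmt_header_write(unsigned char out[FMT_HEADER_LEN])
{
    uint32_t magic = htonl(FMT_MAGIC);
    uint32_t version = htonl(FMT_VERSION);

    assert(out != NULL);
    memcpy(out, &magic, 4);
    memcpy(out + 4, &version, 4);
}

/* Hypothetical helper: return the format version found in buf, or 0
 * when there is no valid header (treated as a legacy host-order file). */
static uint32_t fmt_header_read(const unsigned char *buf, size_t len)
{
    uint32_t magic, version;

    if (buf == NULL || len < FMT_HEADER_LEN)
        return 0;
    memcpy(&magic, buf, 4);
    memcpy(&version, buf + 4, 4);
    if (ntohl(magic) != FMT_MAGIC)
        return 0;
    return ntohl(version);
}
```

A return value of 0 could then be the cue to keep reading the data in
host order, or to offer a one-time conversion, so that existing
databases are not silently misread.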
I'm assuming that you have fixed this issue on your
machine(s) by adding the proper htonl() and ntohl() calls?
I've always thought that NFS had file locking issues that
kept this type of usage from being robust.