RE: [Algorithms] Network & byte order.


While it doesn't really matter which order you choose, so long as both sides
use it, network standard byte order is big-endian.

There are a number of C functions (often implemented as macros) that convert
16-bit (short) and 32-bit (long) values to and from this format: htons, htonl,
ntohs and ntohl, from memory.
Floats, as mentioned by Pierre, can be a bit more of a headache. I generally
tend to avoid them if at all possible. When I have needed decimal values they
have usually been of fixed precision, so I've simply sent them as integers
using a common divisor on both sides.
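
To make the fixed-precision idea concrete, here is a minimal sketch. The
names kScale, to_fixed, from_fixed, put_be32 and get_be32 are made up for
illustration and do not come from any library; the divisor of 1000 is just
an assumed example. On a POSIX system, htonl/ntohl should give the same byte
layout as the shift-based helpers shown here.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical shared constant: both sides must agree on this divisor.
constexpr std::int32_t kScale = 1000;  // i.e. three decimal places

// Decimal value -> scaled integer, rounded to the nearest step.
std::int32_t to_fixed(double value) {
    return static_cast<std::int32_t>(std::lround(value * kScale));
}

// Scaled integer -> decimal value.
double from_fixed(std::int32_t fixed) {
    return static_cast<double>(fixed) / kScale;
}

// Store a 32-bit value most-significant byte first (network order),
// independent of the host's own byte order.
void put_be32(std::uint8_t* out, std::uint32_t v) {
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Load a 32-bit value that was stored most-significant byte first.
std::uint32_t get_be32(const std::uint8_t* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

The sender would call put_be32(buf, to_fixed(x)) and the receiver
from_fixed(get_be32(buf)); the range and precision are limited by whatever
divisor the two sides agree on.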

> -----Original Message-----
> From: gda...@li...
> [mailto:gda...@li...]On Behalf Of Kent
> Quirk
> Sent: Friday, August 18, 2000 11:50 PM
> To: gda...@li...
> Subject: Re: [Algorithms] Network & byte order.
>
>
>
> Lionel Fumery wrote:
> > We are designing our network libraries for our next games. We would like
> > to produce cross-platform games, targeting various processors.
> >
> > For a multi-platform network game, we wonder whether we have to consider
> > the byte ordering of the platforms... Intel is little-endian, whereas
> > Apple (Motorola) is big-endian.
> > Can anybody tell us which platforms are little-endian and which are
> > big-endian?
> >
> > If all our target platforms are little-endian, we could avoid this
> > byte-swapping and save some CPU time for something else...
>
> Compared to the time spent on the network, the amount of time you'll
> spend byte-swapping is so microscopic as to be invisible.
>
> General rule of thumb:
> Don't expect to write a multibyte value as a stream of binary bytes on
> one platform and expect to read it in on another and have it work.
> Define your formats in a way that's independent of byte order. Either
> use a text value (XML, for example) or if you need to keep the data at
> minimal size (and in modem-based networking you usually do) then define
> your data formats at the byte level.
>
> Don't say:
>
> "The header consists of a 4-byte unsigned int packet ID."
>
> say:
>
> "The first 4 bytes of the header are a packet ID, sent as a four byte
> integer, least-significant byte first."
>
> Then it's unambiguous what you're doing.
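
A byte-level definition like that maps directly onto code. The following is
only a sketch under that wording; write_packet_id and read_packet_id are
hypothetical names, not part of any existing library.

```cpp
#include <cstdint>

// Hypothetical helper: store a packet ID in the first 4 header bytes,
// least-significant byte first, whatever the host byte order is.
void write_packet_id(std::uint8_t* header, std::uint32_t id) {
    header[0] = static_cast<std::uint8_t>(id);
    header[1] = static_cast<std::uint8_t>(id >> 8);
    header[2] = static_cast<std::uint8_t>(id >> 16);
    header[3] = static_cast<std::uint8_t>(id >> 24);
}

// Hypothetical helper: rebuild the packet ID from the first 4 header bytes.
std::uint32_t read_packet_id(const std::uint8_t* header) {
    return  static_cast<std::uint32_t>(header[0]) |
           (static_cast<std::uint32_t>(header[1]) << 8) |
           (static_cast<std::uint32_t>(header[2]) << 16) |
           (static_cast<std::uint32_t>(header[3]) << 24);
}
```

Because the code only ever shifts values and indexes bytes, it behaves the
same on little-endian and big-endian machines with no conditional
compilation.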
>
> With that said, I just found some comments in the header of one of the
> files on our file format (called CHUFF) in MindRover. They were written
> by Nat Goodspeed, who works here:
>
> ------------------------------------
> WARNING!  For efficiency reasons, the read/write implementations for types
> such as 'bin4' are implemented by directly examining the storage used for
> the native-type variable.  This is fast, but is inherently
> platform-sensitive.  CHUFF data types are little-endian by definition (so
> that we can have some hope of exchanging files between different
> platforms).  Therefore, when you port this implementation to a big-endian
> machine, make SURE you define 'HIBYTE1ST' as one of the compiler's
> command-line switches!
>
> Our byte-swapping big-endian implementations assume that it's still cheaper
> to make a single I/O method call for the full size of the value, exchanging
> bytes using temporary variables in memory, than it is to break out separate
> I/O operations for each byte.  That may not be entirely true.  But one
> advantage of this scheme is that on input, we can still test for EOF on a
> single call, rather than having to test separately for each byte.
>
> There are two different philosophical approaches to implementing a
> cross-platform binary format, that is, one such as ours, in which (for
> instance) bin4 must be read and written as little-endian, regardless of the
> byte order in which the platform on which we're running normally stores its
> binary integers.
>
> Convert on Use
> --------------
> One approach is to implement a family of classes that literally define the
> way the storage will be used.  For instance, bin4 could be defined as a
> class which always contains a little-endian binary integer value.  We would
> then define conversions to and from ordinary binary integers, arithmetic
> and logical operations, etc., so that any operation on a bin4 object
> results in a little-endian value in memory.
>
> The advantage of this strategy is that such fields can apparently be
> composed into structs that describe the actual byte stream.  In theory you
> can then instantiate such a struct, populate some or all of its fields and
> just write it out -- or, conversely, read the struct in its entirety (or
> even just map the struct onto part of a previously-read buffer) and then
> just reference some of its fields.
>
> In practice, this is complicated considerably by the need to worry about
> platform-dependent struct alignment requirements.  But you can still build
> it, even though you sometimes end up having to define the actual data as an
> array of bytes to bypass automatic compiler alignment.
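
For illustration, a convert-on-use field backed by an array of bytes could
be sketched as follows. LittleEndian32 and PacketHeader are hypothetical
names chosen for this example (this is not the MindRover implementation),
and only one arithmetic operator is shown; a real field type would need the
full set.

```cpp
#include <cstdint>

// Hypothetical convert-on-use field: the stored bytes are always
// little-endian, and conversion happens on every read or write.
class LittleEndian32 {
public:
    LittleEndian32() : bytes_{0, 0, 0, 0} {}
    LittleEndian32(std::uint32_t v) { *this = v; }

    // Native integer -> little-endian storage.
    LittleEndian32& operator=(std::uint32_t v) {
        bytes_[0] = static_cast<std::uint8_t>(v);
        bytes_[1] = static_cast<std::uint8_t>(v >> 8);
        bytes_[2] = static_cast<std::uint8_t>(v >> 16);
        bytes_[3] = static_cast<std::uint8_t>(v >> 24);
        return *this;
    }

    // Little-endian storage -> native integer.
    operator std::uint32_t() const {
        return  static_cast<std::uint32_t>(bytes_[0]) |
               (static_cast<std::uint32_t>(bytes_[1]) << 8) |
               (static_cast<std::uint32_t>(bytes_[2]) << 16) |
               (static_cast<std::uint32_t>(bytes_[3]) << 24);
    }

    // One example operator; each use pays for a conversion both ways.
    LittleEndian32& operator+=(std::uint32_t rhs) {
        return *this = static_cast<std::uint32_t>(*this) + rhs;
    }

private:
    std::uint8_t bytes_[4];  // byte array, so the compiler adds no alignment
};

// Fields compose into a struct that mirrors the byte stream.
struct PacketHeader {
    LittleEndian32 packet_id;
    LittleEndian32 struct_count;
};

static_assert(sizeof(PacketHeader) == 8, "expected no padding");
```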
>
> With this approach, you need to spend considerable development time on each
> individual field type; it must support the full suite of arithmetic and
> logical operations you intend to use.  Those operations are, of course,
> somewhat more expensive than operations on the corresponding native type.
> But this can still be a win if:
>
> (a) there are very many more cross-platform structs than there are field
> types.  The whole rationale for this approach is that you do NOT need to
> implement read/write methods for each different struct; composing such
> fields into structs should then permit the structs to be transparently used
> on a byte stream.
>
> (b) there are very many more fields in a typical cross-platform struct than
> you actually use.  (In such a case, you might consider redesigning your
> protocol, since it appears to be wasteful of space!)  But if you have to
> live with a protocol definition like that, the tradeoff might work in your
> favor: with these fields, you pay for the conversion each time you use
> them, but you don't have to pay for converting fields that you don't use at
> all.
>
> (c) for some reason, you need random access to parts of a buffer.  For
> instance, you are filling a transmission buffer with such structs, but the
> protocol requires a header struct that describes how many other structs
> follow it, and it would be expensive or impossible to determine that number
> in advance.  If you have a pointer to the header struct in the buffer, you
> can simply patch the count field on the fly.
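
The header fix-up in case (c) can be sketched like this. The packet layout
(a 2-byte little-endian count followed by one byte per record) and the names
patch_count and build_packet are assumptions made up for the example, and an
offset into the buffer stands in for the pointer to the header struct.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Store a 16-bit count at `offset`, least-significant byte first.
void patch_count(std::vector<std::uint8_t>& buf, std::size_t offset,
                 std::uint16_t count) {
    buf[offset]     = static_cast<std::uint8_t>(count);
    buf[offset + 1] = static_cast<std::uint8_t>(count >> 8);
}

// Hypothetical packet: 2-byte record count, then one byte per record.
// Records equal to zero are skipped, standing in for any decision that
// is only made while the buffer is being filled.
std::vector<std::uint8_t> build_packet(const std::vector<std::uint8_t>& records) {
    std::vector<std::uint8_t> buf(2, 0);  // placeholder header, count unknown
    std::uint16_t count = 0;
    for (std::uint8_t r : records) {
        if (r == 0) continue;
        buf.push_back(r);
        ++count;
    }
    patch_count(buf, 0, count);           // go back and fix up the header
    return buf;
}
```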
>
> Convert on I/O
> --------------
> The other approach is to define fields that store values much like native
> C++ types, so that it's reasonably easy and cheap to perform arithmetic and
> logical operations on them, but each field knows how to serialize and
> deserialize itself to a data stream.
>
> Since the conversion of each field to and from a byte stream is explicit,
> you have explicit control over such things as alignment, rather than
> worrying about what the compiler might be doing behind your back.  This
> approach also allows you to use C++ classes with virtual functions, which
> you can't do with a convert-on-use mechanism since the VFT pointer is part
> of the storage occupied by each class object.
>
> The drawback is that for each struct or class you intend to write to, or
> receive from, a cross-platform data stream, you must implement specific
> read/write methods that enumerate all the (persistent) fields in that
> struct or class.  These methods must be maintained every time you change
> the set of fields in the struct/class.
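
A convert-on-I/O struct might be sketched as below: the fields are plain
native integers, and only the read/write methods know the wire format. The
PlayerState message and the put_le32/get_le32 helpers are hypothetical, and
a std::vector of bytes is assumed in place of a real data stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit value, least-significant byte first.
void put_le32(Bytes& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read 4 bytes at `pos` and advance it; false if the buffer is too short.
bool get_le32(const Bytes& in, std::size_t& pos, std::uint32_t& v) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return true;
}

// Hypothetical message.  Every persistent field must be listed in both
// methods, and both must be updated whenever the field set changes.
struct PlayerState {
    std::uint32_t id = 0;
    std::uint32_t score = 0;

    void write(Bytes& out) const {
        put_le32(out, id);
        put_le32(out, score);
    }

    bool read(const Bytes& in, std::size_t& pos) {
        return get_le32(in, pos, id) && get_le32(in, pos, score);
    }
};
```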
>
> This can be a win if:
>
> (a) there are relatively few predefined structs in the protocol.
> Implementing a small set of read/write methods can be easier than
> implementing all the support methods for each convert-on-use field type.
>
> (b) you access the fields in your structs much more often than you
> de/serialize them from/to the data stream.  You only pay for conversion at
> the time you actually read or write the fields, rather than every time you
> touch one of them.
>
> (c) your protocol allows you to write header information and proceed,
> rather than needing to go back and revisit the header to fix up one or more
> of its fields.  That is, either protocol headers don't need to make
> assertions about the data that follows, or it's relatively easy to derive
> that forward-looking information.
>
> I was going to say something about dynamic composition -- the case when you
> want to read or write individual fields in an order determined at runtime
> rather than at compile time -- but actually, I think that would probably
> work out equally well either way.
>
> In any case, we use the convert-on-I/O approach.  bin4 and friends store
> data very much like C long int, etc., but they know how to read and write
> themselves from/to a data stream.
>
> However, for internal purposes, we find it useful to borrow a
> convert-on-use notion:  within this file, we implement a LittleEndian type
> that always maintains data in little-endian form.
>
> --------------------------
>
> Hope this helps.
>
> 	Kent
>
> --
> -----------------------------------------------------------------------
> Kent Quirk                   | CogniToy: Intelligent toys...
> Game Architect               |           for intelligent minds.
> ken...@co...      | http://www.cognitoy.com/
> _____________________________|_________________________________________