Thread: [Algorithms] Network & byte order.
From: Lionel F. <li...@mi...> - 2000-08-18 10:29:36
|
Hello,

(I hope my question is not too off-topic.)

We are designing the network libraries for our next games. We would like to produce cross-platform games targeting a range of processors.

For a multi-platform network game, we wonder whether we have to take the byte ordering of the platforms into account. Intel is little-endian, whereas Apple (Motorola) is big-endian. Can anybody tell us which platforms are little-endian and which are big-endian?

If all our target platforms are little-endian, we could skip the byte swapping and keep some CPU time for something else.

Thank you for any advice!

Lionel. |
From: Kent Q. <ken...@co...> - 2000-08-18 13:46:31
|
Lionel Fumery wrote:
> We are designing our network libraries, for our next games. We would like to
> produce cross-platform games, with misc processors targets.
>
> In the case of multi-platform network game, we wonder if we have to consider
> the byte-ordering of the platforms... Intel is little-endian, whereas Apple
> (Motorola) is big-endian.
> Anybody can tell us what platforms are little-endian, or big-endian?
>
> If all our target-platforms are little-endian, we could avoid this
> byte-swapping and then keep some CPU time for something else...

Compared to the time spent on the network, the amount of time you'll spend byte-swapping is so microscopic as to be invisible.

General rule of thumb: Don't expect to write a multibyte value as a stream of binary bytes on one platform and expect to read it in on another and have it work. Define your formats in a way that's independent of byte order. Either use a text value (XML, for example) or, if you need to keep the data at minimal size (and in modem-based networking you usually do), define your data formats at the byte level.

Don't say:

"The header consists of a 4-byte unsigned int packet ID."

say:

"The first 4 bytes of the header are a packet ID, sent as a four byte integer, least-significant byte first."

Then it's unambiguous what you're doing.

With that said, I just found some comments in the header of one of the files on our file format (called CHUFF) in MindRover. They were written by Nat Goodspeed, who works here:

------------------------------------
WARNING! For efficiency reasons, the read/write implementations for types such as 'bin4' are implemented by directly examining the storage used for the native-type variable. This is fast, but is inherently platform-sensitive. CHUFF data types are little-endian by definition (so that we can have some hope of exchanging files between different platforms).
Therefore, when you port this implementation to a big-endian machine, make SURE you define 'HIBYTE1ST' as one of the compiler's command-line switches!

Our byte-swapping big-endian implementations assume that it's still cheaper to make a single I/O method call for the full size of the value, exchanging bytes using temporary variables in memory, than it is to break out separate I/O operations for each byte. That may not be entirely true. But one advantage of this scheme is that on input, we can still test for EOF on a single call, rather than having to test separately for each byte.

There are two different philosophical approaches to implementing a cross-platform binary format, that is, one such as ours, in which (for instance) bin4 must be read and written as little-endian, regardless of the byte order in which the platform on which we're running normally stores its binary integers.

Convert on Use
--------------
One approach is to implement a family of classes that literally define the way the storage will be used. For instance, bin4 could be defined as a class which always contains a little-endian binary integer value. We would then define conversions to and from ordinary binary integers, arithmetic and logical operations, etc., so that any operation on a bin4 object results in a little-endian value in memory.

The advantage of this strategy is that such fields can apparently be composed into structs that describe the actual byte stream. In theory you can then instantiate such a struct, populate some or all of its fields and just write it out -- or, conversely, read the struct in its entirety (or even just map the struct onto part of a previously-read buffer) and then just reference some of its fields.

In practice, this is complicated considerably by the need to worry about platform-dependent struct alignment requirements.
But you can still build it, even though you sometimes end up having to define the actual data as an array of bytes to bypass automatic compiler alignment.

With this approach, you need to spend considerable development time on each individual field type; it must support the full suite of arithmetic and logical operations you intend to use. Those operations are, of course, somewhat more expensive than operations on the corresponding native type. But this can still be a win if:

(a) there are very many more cross-platform structs than there are field types. The whole rationale for this approach is that you do NOT need to implement read/write methods for each different struct; composing such fields into structs should then permit the structs to be transparently used on a byte stream.

(b) there are very many more fields in a typical cross-platform struct than you actually use. (In such a case, you might consider redesigning your protocol, since it appears to be wasteful of space!) But if you have to live with a protocol definition like that, the tradeoff might work in your favor: with these fields, you pay for the conversion each time you use them, but you don't have to pay for converting fields that you don't use at all.

(c) for some reason, you need random access to parts of a buffer. For instance, you are filling a transmission buffer with such structs, but the protocol requires a header struct that describes how many other structs follow it, and it would be expensive or impossible to determine that number in advance. If you have a pointer to the header struct in the buffer, you can simply patch the count field on the fly.

Convert on I/O
--------------
The other approach is to define fields that store values much like native C++ types, so that it's reasonably easy and cheap to perform arithmetic and logical operations on them, but each field knows how to serialize and deserialize itself to a data stream.
Since the conversion of each field to and from a byte stream is explicit, you have explicit control over such things as alignment, rather than worrying about what the compiler might be doing behind your back. This approach also allows you to use C++ classes with virtual functions, which you can't do with a convert-on-use mechanism since the VFT pointer is part of the storage occupied by each class object.

The drawback is that for each struct or class you intend to write to, or receive from, a cross-platform data stream, you must implement specific read/write methods that enumerate all the (persistent) fields in that struct or class. These methods must be maintained every time you change the set of fields in the struct/class.

This can be a win if:

(a) there are relatively few predefined structs in the protocol. Implementing a small set of read/write methods can be easier than implementing all the support methods for each convert-on-use field type.

(b) you access the fields in your structs much more often than you de/serialize them from/to the data stream. You only pay for conversion at the time you actually read or write the fields, rather than every time you touch one of them.

(c) your protocol allows you to write header information and proceed, rather than needing to go back and revisit the header to fix up one or more of its fields. That is, either protocol headers don't need to make assertions about the data that follows, or it's relatively easy to derive that forward-looking information.

I was going to say something about dynamic composition -- the case when you want to read or write individual fields in an order determined at runtime rather than at compile time -- but actually, I think that would probably work out equally well either way.

In any case, we use the convert-on-I/O approach. bin4 and friends store data very much like C long int, etc., but they know how to read and write themselves from/to a data stream.
However, for internal purposes, we find it useful to borrow a convert-on-use notion: within this file, we implement a LittleEndian type that always maintains data in little-endian form.

--------------------------

Hope this helps.

Kent

--
-----------------------------------------------------------------------
Kent Quirk                   | CogniToy: Intelligent toys...
Game Architect               | for intelligent minds.
ken...@co...                 | http://www.cognitoy.com/
_____________________________|_________________________________________
|
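As a rough illustration of the two approaches described in the CHUFF comments above, here is a minimal C++ sketch. It is not CogniToy's code: the names put_u32_le, get_u32_le, LittleEndian32 and PacketHeader are hypothetical, and the sketch assumes 8-bit bytes and a compiler that adds no padding to a class holding four bytes (the static_assert checks the latter).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert on I/O: values live in memory as native integers and become
// little-endian bytes only when they are written or read.
// (Hypothetical helper names, not part of CHUFF.)
inline void put_u32_le(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFFu));          // least-significant byte first
    out.push_back(static_cast<std::uint8_t>((v >> 8) & 0xFFu));
    out.push_back(static_cast<std::uint8_t>((v >> 16) & 0xFFu));
    out.push_back(static_cast<std::uint8_t>((v >> 24) & 0xFFu));
}

inline std::uint32_t get_u32_le(const std::uint8_t* p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Convert on use: the object itself is the wire representation (four
// bytes, little-endian, alignment 1) and converts on every access.
class LittleEndian32 {
public:
    LittleEndian32(std::uint32_t v = 0) { store(v); }
    LittleEndian32& operator=(std::uint32_t v) { store(v); return *this; }
    operator std::uint32_t() const { return get_u32_le(b_); }
    const std::uint8_t* bytes() const { return b_; }
private:
    void store(std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            b_[i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu);
    }
    std::uint8_t b_[4];
};

// A header composed of such fields mirrors the byte stream directly,
// because every member is just an array of bytes. Hypothetical layout.
struct PacketHeader {
    LittleEndian32 packet_id;
    LittleEndian32 count;
};
static_assert(sizeof(PacketHeader) == 8, "header must match the wire layout");
```

With the convert-on-use type, patching a count in place is just `header.count = header.count + 1;`, at the cost of a conversion on every access. With the convert-on-I/O helpers, arithmetic stays native and the byte shuffling happens once per read or write.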
From: Pierre T. <p.t...@wa...> - 2000-08-18 14:00:10
|
On a related note... I once had a very bad surprise with floats as well; this is not a problem limited to byte order. I was saving floating-point values to a binary file on a PC and reading that file on a DEC Alpha. I always got immediate, violent crashes until I figured out the Alpha was not using IEEE floats, but a nasty hybrid format where the last 16 bits of the mantissa are actually the 16 most significant bits (for historical compatibility reasons, from when they switched from 16-bit to 32-bit floats). In other words, this was not a byte-order problem but a word-order problem with floating-point values :) Pierre |
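One way to guard against the surprise Pierre describes is to define floats on the wire as a fixed bit pattern and refuse to build on platforms that do not match. The sketch below is only an illustration under stated assumptions: it assumes the wire format is the 32-bit IEEE 754 pattern sent least-significant byte first, that the host stores floats and 32-bit integers in the same byte order, and the function names put_f32_le and get_f32_le are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Fail at compile time on hosts whose float is not 32-bit IEEE 754,
// instead of crashing at run time on the receiving machine.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes 32-bit IEEE 754 floats");

// Write a float as its IEEE 754 bit pattern, least-significant byte first.
// Assumption: float and uint32_t share the same byte order on this host.
inline void put_f32_le(std::uint8_t out[4], float f) {
    assert(out != nullptr);
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // copy the bit pattern, no pointer casts
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::uint8_t>((bits >> (8 * i)) & 0xFFu);
}

// Rebuild the float from four little-endian bytes.
inline float get_f32_le(const std::uint8_t in[4]) {
    assert(in != nullptr);
    std::uint32_t bits = 0;
    for (int i = 0; i < 4; ++i)
        bits |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

A host with a different native float format would need an explicit conversion routine in place of the memcpy; the static_assert at least makes that requirement visible when porting.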
From: Aaron D. <ri...@ho...> - 2000-08-18 21:38:39
|
While it doesn't really matter which order you choose so long as both sides use it, network standard byte order is big-endian. There are a number of C macro functions that convert short and long values to this format (htons, htonl, ntohs, ntohl, from memory).

Floats, as mentioned by Pierre, can be a bit more of a headache. I generally tend to avoid them if at all possible. If I require decimal values, they've usually been of fixed precision, so I've simply sent them as integers using a common divisor on both sides. |
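Aaron's fixed-precision trick can be sketched as follows. This is an illustration, not his code: the divisor kScale and the function names are hypothetical, and the bytes are written most-significant first by hand, which is the same layout htonl produces for a 32-bit value on systems that provide it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical protocol constant: both sides agree that values are sent
// in thousandths, i.e. with three decimal places.
const std::int32_t kScale = 1000;

// Encode a decimal value as a scaled 32-bit integer in network
// (big-endian) byte order.
inline void put_fixed_be(std::uint8_t out[4], double value) {
    assert(std::fabs(value) < 2000000.0);  // keeps value * kScale inside int32 range
    const std::int32_t scaled =
        static_cast<std::int32_t>(std::lround(value * kScale));
    const std::uint32_t u = static_cast<std::uint32_t>(scaled);  // two's-complement pattern
    out[0] = static_cast<std::uint8_t>((u >> 24) & 0xFFu);       // most-significant byte first
    out[1] = static_cast<std::uint8_t>((u >> 16) & 0xFFu);
    out[2] = static_cast<std::uint8_t>((u >> 8) & 0xFFu);
    out[3] = static_cast<std::uint8_t>(u & 0xFFu);
}

// Decode four big-endian bytes back into a decimal value.
inline double get_fixed_be(const std::uint8_t in[4]) {
    const std::uint32_t u = (static_cast<std::uint32_t>(in[0]) << 24)
                          | (static_cast<std::uint32_t>(in[1]) << 16)
                          | (static_cast<std::uint32_t>(in[2]) << 8)
                          |  static_cast<std::uint32_t>(in[3]);
    std::int32_t scaled;
    std::memcpy(&scaled, &u, sizeof scaled);  // reinterpret as signed
    return static_cast<double>(scaled) / kScale;
}
```

The receiver never sees a float on the wire, so differences in floating-point formats between platforms stop mattering; the price is a fixed range and precision chosen up front.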
From: Akbar A. <sye...@ea...> - 2000-08-18 13:58:10
|
>Anybody can tell us what platforms are little-endian, or big-endian?

it's kind of hard targeting all platforms but here they are, for what it's worth. intel and dec processors are little-endian; sun, sgi and motorola processors are big-endian. no secret. for impl details see "network programming for microsoft windows" by ohlund and jones. i know the name is kind of odd for a book that discusses portability but it does.

peace.
akbar A.

"We want technology for the sake of the story, not for its own sake. When you look back, say 10 years from now, current technology will seem quaint" Pixar's Edwin Catmull. |
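Rather than relying on a list of processors, a program can also check the byte order of the machine it is running on. The following is a small sketch of that idea; the names ByteOrder, host_byte_order, swap32 and host_to_le32 are hypothetical and not taken from the book or any library mentioned in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class ByteOrder { Little, Big, Other };

// Inspect how a known 32-bit value is laid out in memory on this host.
inline ByteOrder host_byte_order() {
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    if (bytes[0] == 0x04 && bytes[1] == 0x03 && bytes[2] == 0x02 && bytes[3] == 0x01)
        return ByteOrder::Little;
    if (bytes[0] == 0x01 && bytes[1] == 0x02 && bytes[2] == 0x03 && bytes[3] == 0x04)
        return ByteOrder::Big;
    return ByteOrder::Other;  // mixed-endian layouts are not handled here
}

// Reverse the four bytes of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}

// Return a value whose in-memory bytes are little-endian on this host,
// swapping only when the host is big-endian.
inline std::uint32_t host_to_le32(std::uint32_t v) {
    const ByteOrder order = host_byte_order();
    assert(order != ByteOrder::Other);
    return order == ByteOrder::Little ? v : swap32(v);
}
```

On a little-endian target the conversion is a no-op, so a little-endian wire format costs nothing there, while big-endian targets pay for one swap per value.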