Thread: RE: [Algorithms] In-place loaded data structures. (Page 2)
From: Scott S. <ssh...@na...> - 2005-11-29 05:45:13
We use this technique for data that's going to be loaded in game, although you do have to account for endianness and alignment issues, as well as being able to fix up various pointer types.

The biggest advantage of a 'memory-ready' format is speed. It beats anything else out there by a significant margin, both in minimal disk reads (since you can blast large contiguous blocks off of disk) and in requiring very little processing at load time. Naughty Dog has been using memory-ready formats for in-game data since the Crash Bandicoot days, and it's one of those critical features that makes our 'apparently seamless' worlds possible.

On the other hand, this format isn't really good for passing around your tool pipeline. Here's a smattering of issues:

1) Can only serialize POD structures, not more complex types like lists, arrays, STL container types, or even anything with a vtable.
2) Very hard to debug, since there's no loading code and no validation - you just get bad data on the other side. This is especially problematic when dealing with alignment issues. If you mess something up, you just get a crash.
3) No support for versioning, backwards compatibility, or missing data (filling in default values).
4) It's actually a bit of a chore to write out, especially if your serialization interface isn't well thought out. For example, when dealing with nested pointers to structures, you need a way to write to two different streams and merge them later on, or else it becomes pretty awkward.

We actually have a completely distinct intermediate format that we pass around the tools pipeline that's far more robust. Unfortunately, designing the intermediate format wasn't a trivial task, either; we actually ended up tossing out our entire first (fully working) version because it was far too difficult to figure out what was going on under the hood.

Scott Shumaker
Lead Programmer, Naughty Dog
ssh...@na...

-----Original Message-----
From: Ben Garney
Sent: Monday, November 28, 2005 9:05 PM
Subject: Re: [Algorithms] In-place loaded data structures.

Charles Nicholson wrote:
> Damn. :)
I found it interesting... Thanks for sharing, even if inadvertently. :)

On a more general note, is this sort of technique something that a lot of people are using in their engines? What are the downsides to this sort of data storage method? Does endian-ness mess it up horribly? It seems like with the addition of a little versioning info, you'd have a really sweet general purpose data format...

Ben
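For concreteness, a minimal sketch of the technique Scott describes - one contiguous read, then in-place pointer fixup. All names are illustrative, and a real implementation also deals with the alignment and endianness issues mentioned above:

#include <cstdio>
#include <cstdint>
#include <cstdlib>

struct Mesh;

// POD structure exactly as it lives on disk and in memory.
struct Level
{
    uint32_t meshCount;
    Mesh*    meshes;   // on disk: a byte offset from the start of the block
};

// Hypothetical loader: one big read, no per-object parsing.
Level* LoadLevel(FILE* f, size_t size)
{
    char* block = static_cast<char*>(std::malloc(size));
    std::fread(block, 1, size, f);   // blast one contiguous block off disk

    Level* level = reinterpret_cast<Level*>(block);
    // Turn the stored offset back into a real pointer (assumes offsets
    // were written at the target's pointer size and alignment).
    level->meshes = reinterpret_cast<Mesh*>(
        block + reinterpret_cast<uintptr_t>(level->meshes));
    return level;   // memory-ready: no further deserialization
}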
From: Charles N. <cha...@gm...> - 2005-11-29 08:05:20
Interesting, I hadn't found exporting containers and complex types a problem. Containers look like this in the schema:

<class name="entity"> ... </class>
<class name="layout">
    <field name="entities" type="entity" container="true" />
</class>

The generated code looks like this:

struct EntityDataBlock { ... };
struct LayoutDataBlock
{
    EntityDataBlock** entities;
    int entitiesCount;
};

and the intermediate-format XML looks like this:

<Entity name="Foo"> ... </Entity>
<Entity name="Bar"> ... </Entity>
<Layout name="layout">
    <Entities>
        <element>Foo</element>
        <element>Bar</element>
    </Entities>
</Layout>

Admittedly, you don't end up with an STL vector, but you end up with something simpler: a fixed-size array.

As far as hierarchies go, I have data inheritance up and running - the schema looks like this:

<class name="base">
    <field name="x" type="int" />
</class>
<class name="child1" parent="base">
    <field name="y" type="char" />
</class>
<class name="child2" parent="base">
    <field name="z" type="float" />
</class>
<class name="BaseHolder">
    <field name="baz" type="base"/>
</class>

The generated code looks like this:

struct BaseDataBlock
{
    enum ChildTypes { child1 = 0, child2 = 1 };
    int typeId;
    int x;
};

struct Child1DataBlock : public BaseDataBlock
{
    char y;
};

struct Child2DataBlock : public BaseDataBlock
{
    float z;
};

void Fixup(Child1DataBlock*) {}
void Fixup(Child2DataBlock*) {}

void Fixup(BaseDataBlock* block)
{
    switch (block->typeId)
    {
    case BaseDataBlock::child1: Fixup((Child1DataBlock*)block); break;
    case BaseDataBlock::child2: Fixup((Child2DataBlock*)block); break;
    }
}

Children 1 & 2 aren't interesting enough to merit non-trivial fixup functions, but it's just an example; the real thing recurses. You do end up needing a factory on the runtime side that makes decisions (specifically, objects) based on the type (sort of a gnarly home-brew RTTI), but I haven't figured a better way around that just yet.

The main point here is that you can represent non-trivial relationships between data as data. I never represent actual runtime objects in my serialization, just immutable prototype data. This prototype data is either very heavyweight (textures, model/anim data) or very lightweight (designer-customized layout data), but either way, when it's time to instantiate a chunk of gameplay, this data is sent to various game object constructors and goes from there.

Note that this makes 'quick reset' simple enough because it draws a very clear distinction between instances and definitions. The instances can go up and down as often as the player wants (or dies), but the prototype data is just one big memory block that stays up and serves as construction information for everything that's about to be instantiated.

chas

On 11/28/05, Scott Shumaker <ssh...@na...> wrote:
> We use this technique for data that's going to be loaded in game,
> although you do have to account for endianness and alignment issues,
> as well as being able to fix up various pointer types. <snip>
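One plausible shape for the runtime-side factory Charles mentions - a sketch only, with the object types invented for illustration (the *DataBlock structs are the generated ones shown above):

// Hypothetical game-object types, not from the original post.
struct GameObject { virtual ~GameObject() {} };
struct Child1Object : GameObject { explicit Child1Object(const Child1DataBlock*) {} };
struct Child2Object : GameObject { explicit Child2Object(const Child2DataBlock*) {} };

// Factory keyed on the serialized type id - the 'gnarly home-brew RTTI'.
GameObject* CreateFromPrototype(const BaseDataBlock* block)
{
    switch (block->typeId)
    {
    case BaseDataBlock::child1:
        return new Child1Object(static_cast<const Child1DataBlock*>(block));
    case BaseDataBlock::child2:
        return new Child2Object(static_cast<const Child2DataBlock*>(block));
    default:
        return 0;   // unknown type id: bad data or a stale schema
    }
}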
From: Charles N. <cha...@gm...> - 2005-11-29 06:11:17
Well, now that it's out there I suppose I may as well go on. :)

=== Does endian-ness mess it up horribly?

Endianness is pretty simple to manage with this scheme. The XML is ASCII-encoded (i.e. endian-free), the tool runtime uses the native endianness, and the binary data compiler/linker simply respects a platform flag and writes the data for each field out with the correct endianness. If you have a stream layer (one that simply throws byte arrays at data sinks), that can be a good place to handle endian issues - see the sketch below.

=== It seems like with the addition of a little versioning info, you'd have a really sweet general purpose data format...

Versioning info can be maintained in the schema - whenever you change a class (in the schema), you bump the version number on that class. If you have a GUI tool for editing the schema, this can be handled transparently and behind the scenes. If you add the name of the user who bumped the version to the metadata on the same line as the version, like

<class name="Foo" version="2" last-update-author="cnicholson" last-update-time="15:42:30" >
    ...
</class>

then your source control management software will give you a merge conflict on checkin, forcing you to deal with it (this is a good thing).

We've talked about the following proposal: any content creator can edit and save data with an old version of the schema, but a plugin to the source control will disallow any data commits unless the schema is at the head revision. This allows designers to pull down and keep working, but not commit until they've fully conformed their data to the latest schema (which would hopefully happen in an automated way whenever possible).

Additive schema updates (adding fields) could default to values specified in the metadata, like

<class name="Foo">
    <field name="z" type="float" default="3.141" />
    ...
</class>

so that the data editing tools could automatically update old data. Subtractive schema updates could flag fields as 'deleted' for some predetermined amount of time, after which they would finally be obliterated, like

<class name="Foo">
    <field name="x" type="int" deleted="true" />
    ...
</class>

This way the editor would know not to allow designers to meaningfully manipulate deprecated data (maybe show the field, uneditable, in red).

I should mention explicitly that neither I personally nor 'we' at work have this system up and running fully with a production team, especially the versioning stuff - it's pretty hypothetical but seems reasonable enough. The nature of the asset pipeline beast, as I've experienced it, is that there are always lots of pesky details that you don't encounter until you have 20+ content creators flailing in agony against it and making voodoo dolls that look unsettlingly like you.

=== What are the downsides to this sort of data storage method?

The main downside I've seen personally so far is the extra step it takes to get non-game-object data into the metadata-ready format. I'm using 3ds Max at home for my prototype, and I'm finding myself exporting to Collada and then running an offline tool that parses the XML and spits out _my_ platform-independent XML (the same data, mind you - vertex/index buffers, material info, etc.) that's then ready for data compilation/linking.
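A sketch of the stream-layer idea from the endianness answer above (illustrative; TargetStream is an invented name): the writer takes a target-endianness flag and emits each field's bytes in the right order regardless of the tool's native endianness.

#include <cstdint>
#include <vector>

// Hypothetical data sink: accumulates bytes for the target platform.
class TargetStream
{
public:
    explicit TargetStream(bool targetBigEndian) : mBig(targetBigEndian) {}

    // Write a 32-bit field in the target's byte order, whatever the
    // tool's native endianness happens to be.
    void WriteU32(uint32_t v)
    {
        for (int i = 0; i < 4; ++i)
        {
            int shift = mBig ? (24 - 8 * i) : (8 * i);
            mBytes.push_back(static_cast<uint8_t>(v >> shift));
        }
    }

    const std::vector<uint8_t>& Bytes() const { return mBytes; }

private:
    bool mBig;
    std::vector<uint8_t> mBytes;
};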
Also, if you want dynamic linking at runtime for streaming and asset hot-loading (**shameless GPG6 plug!**), things get more complicated. Say you have the case where data object A has a pointer to data object B. In the platform-ready binary data, this is most easily stored in-place (i.e. where the pointer's target will live at runtime) as a relative offset from that memory address, such that fixup looks like 'a = &a + a;'. If you want to send a new version of B up at runtime (say, a new texture that an artist has touched up), then you can no longer simply have A hold the relative location of B with this scheme. The best I've come up with is that A needs to hold a handle to B that it can use as a key for a real pointer. Any time A needs to access B, it hands its B-handle to some sort of TOC for the 'real' address of B. These TOCs can come up as part of the binary data, though unlike most of the rest of the data, they're mutable. When new assets come up, these TOCs can change to hold the addresses of the new data. The old data can be freed or even simply (and shamefully) left on-board until memory exhaustion or explicit unloading (devkits generally have more memory than retail units).

I mentioned asset hot-loading and streaming in the same breath there because they're similar problems. Say you have a "Stranger's Wrath"-style streaming game (hubs with many linear paths leading out and back in) and the progression goes A -> B -> C -> D -> A (there's a teleporter in D that takes you back to A). Now say that the designers want a large-memory-footprint vehicle, but _only_ in levels B and C. Since B and C have to be in memory simultaneously, it's wasteful to have two copies of that vehicle in memory at once (one loaded by each level), so you need some sort of shared overlay in which the resource exists and is in memory for the duration of both B and C (let's call it BC). It's entirely conceivable that both B and C could refer to this vehicle in BC (perhaps other instances of it are lying around, or perhaps the avatar has to switch vehicles, etc.), so there's going to have to be some sort of dynamic linking/fixup going on when B and C come into memory.

A necessary conclusion of this is that assets in BC need some sort of unique ID that both B and C can refer to. Since all of the assets for the game live in a database of some form or another (hopefully?!), it would be nice if GUIDs could come from there. I haven't looked into this part yet, but it didn't seem on the surface that Perforce had any such features (trade a filename for a unique ID).

An offline bundling tool that had global visibility over the entire game (i.e. across all level layout files) would be able to optimally organize these shared and unique packfiles.

I think that's about enough for round 2. Apologies for prattling on again.

chas

On 11/28/05, Ben Garney <be...@ga...> wrote:
> Charles Nicholson wrote:
> > Damn. :)
> I found it interesting... <snip>
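A minimal sketch of the handle-through-TOC indirection described above (illustrative names): A stores a stable handle rather than a pointer, and the mutable TOC maps that handle to whatever the current address of B is, so hot-loaded data only has to repoint the TOC.

#include <cstdint>

// Hypothetical mutable table of contents, itself loaded as data.
struct TOC
{
    uint32_t count;
    void**   addresses;   // fixed up at load; repointed on hot-load

    void* Resolve(uint32_t handle) const
    {
        return handle < count ? addresses[handle] : 0;
    }
};

struct TextureData;

struct MaterialData   // "A" in the example above
{
    uint32_t textureHandle;   // key for "B", stable across reloads

    const TextureData* Texture(const TOC& toc) const
    {
        return static_cast<const TextureData*>(toc.Resolve(textureHandle));
    }
};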
From: Robert B. <r....@gm...> - 2005-12-04 15:35:28
> I should mention explicitly that neither I personally nor 'we' at work
> have this system up and running fully with a production team, especially
> the versioning stuff - it's pretty hypothetical but seems reasonable
> enough.

It's working quite nicely for us. All the updating voodoo happens behind the scenes - as far as most designers are concerned, they don't notice it much.

> The nature of the asset pipeline beast, as I've experienced it, is that
> there are always lots of pesky details that you don't encounter until
> you have 20+ content creators flailing in agony against it and making
> voodoo dolls that look unsettlingly like you.

Unfortunately, the voodoo dolls still pop up... for lots of interesting reasons. I marvel at the capability of content creators to easily destroy anything ;)

> The main downside I've seen personally so far is the extra step it takes
> to get non-game-object data into the metadata-ready format.

Ah - we've only used this for game-object data. I'd like to hear how your approach works when you hit production.

> Also, if you want dynamic linking at runtime for streaming and
> asset hot-loading <snip>
> relative location of B with this scheme. The best I've come up
> with is that A needs to hold a handle to B that it can use as a key
> for a real pointer.

Don't know if it'll work for you, but for us, all assets are referenced by a GUID. No pointers.

> A necessary conclusion of this is that assets in BC need some sort
> of unique ID that both B and C can refer to. Since all of the
> assets for the game live in a database of some form or another
> (hopefully?!), it would be nice if GUIDs could come from there. I
> haven't looked into this part yet, but it didn't seem on the
> surface that Perforce had any such features (trade a filename for a
> unique ID).

That feature would be called a hash ;) Running a CRC on the filename is enough, if you guarantee unique filenames.

If you can store metadata in the file, just assign a GUID when you create the file, and store it with the file. We do that for game-objects, since we really didn't want the designers to name every single tree in the game ;)

(Allocating 32-bit GUIDs happens with Perforce's help. We've got an allocation map file where you can just carve out 1K GUIDs, and then you check it back in. That lets you use up plenty of GUIDs before you need to hit Perforce.)

> An offline bundling tool that had global visibility over the entire
> game (i.e. across all level layout files) would be able to
> optimally organize these shared and unique packfiles.

Any info on doing such an optimizer for a streaming game? Papers? Anything?

- Robert
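A sketch of what such an allocation map might boil down to on the client side (illustrative; the real map file lives in Perforce): each user carves out a contiguous range of IDs once, then draws new GUIDs from the local range without touching source control.

#include <cstdint>
#include <cassert>

// Hypothetical local allocator over a range carved out of the shared map.
struct GuidRange
{
    uint32_t next;   // first unused ID in this user's carved-out block
    uint32_t end;    // one past the last ID in the block

    uint32_t Allocate()
    {
        assert(next < end && "range exhausted: carve out a new block");
        return next++;
    }
};

// Usage: after carving out, say, IDs [41984, 43008) from the map file,
// hand them out locally: GuidRange r = { 41984, 43008 }; r.Allocate();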
From: Charles N. <cha...@gm...> - 2005-12-04 16:03:43
A 32-bit CRC is not enough to give you a unique key per file if you have more than a few thousand files. I thought the point of a CRC was to guarantee that the file in question has integrity.

A lead programmer I worked with made the assumption that a 32-bit CRC was always a unique key for a giant bucket of files, but that's incorrect. Check out http://mathworld.wolfram.com/BirthdayProblem.html for the details. The 'people' in this case are the files in source control, and the 'birthdays' are their CRCs. As soon as you hit some critical mass of files, you're almost guaranteed a collision. The power spectrum of the CRC32 algorithm does make it less likely, but collisions do happen.

I think we ended up going to CRC64 and storing the key both backwards and forwards or some such - just enough to bump the birthday space up to the point where we didn't see collisions anymore.

chas

On 12/4/05, Robert Blum <r....@gm...> wrote:
> That feature would be called a hash ;) Running a CRC on the filename
> is enough, if you guarantee unique filenames.
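For concreteness, the usual birthday-problem approximation is P(collision) ~= 1 - exp(-n(n-1)/2^(b+1)) for n files and a b-bit hash; a sketch (exact figures depend on how uniformly the hash actually distributes):

#include <cmath>
#include <cstdio>

// Approximate probability that n uniformly distributed b-bit hashes
// contain at least one collision (birthday problem).
double CollisionProbability(double n, int bits)
{
    double buckets = std::ldexp(1.0, bits);   // 2^bits
    return 1.0 - std::exp(-n * (n - 1.0) / (2.0 * buckets));
}

int main()
{
    // ~10,000 assets against a 32-bit CRC: already a ~1% collision chance.
    std::printf("%f\n", CollisionProbability(10000.0, 32));
    // ~77,000 assets: roughly even odds of at least one collision.
    std::printf("%f\n", CollisionProbability(77000.0, 32));
    return 0;
}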
From: Robert B. <r....@gm...> - 2005-12-04 16:44:16
> A 32-bit CRC is not enough to give you a unique key per file if you
> have more than a few thousand files.

We're aware of that, but thanks for pointing it out ;) There might be collisions - the key is watching for them. In the previous game, IIRC, we had exactly one collision. (Which could be solved by a rename.)

> I thought the point of a CRC was to guarantee that the file in
> question has integrity.

It's a hash like anything else. You can use it for integrity checks, or you can just hash the name for a GUID.

> A lead programmer I worked with made the assumption that a 32-bit CRC
> was always a unique key for a giant bucket of files, but that's
> incorrect.

It's close enough to be usually usable. The key is having a plan B, and actually noticing when you need it ;)

> I think we ended up going to CRC64 and storing the key both
> backwards and forwards or some such - just enough to bump the
> birthday space up to the point where we didn't see collisions anymore.

So that's a total of 128 bits? At that point, I'd rather switch to assigned numbers - that's a lot of space. 12 extra bytes times too many assets... Then again, I could probably get away with it. (The beauty of tools development - you can just add extra RAM :) I *am* worried about the games side, though. For a large world, you easily have several tens of thousands of assets.

Another way to extend the life of your CRCs is to partition your assets: one set of CRCs for models, one for textures, one for sounds...

(OK, I am not prepared to let go of them, I think ;)

- Robert
From: Mick W. <de...@mi...> - 2005-12-04 17:21:25
Robert Blum wrote:
>> A 32-bit CRC is not enough to give you a unique key per file if you
>> have more than a few thousand files.
>
> We're aware of that, but thanks for pointing it out ;) There might be
> collisions - the key is watching for them. In the previous game,
> IIRC, we had exactly one collision. (Which could be solved by a rename.)

I wrote a column on this topic, just out in the December issue of Game Developer Magazine (Practical Hash IDs, page 33).

You will get collisions, but very few (I got 7 collisions in a set of 216,000 strings), and renaming a few assets is pretty painless.

Mick.
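A sketch of the offline check this approach implies (illustrative, not Mick's actual tool): hash every asset name once in the pipeline and report any two distinct names that share a CRC, so a human can rename one of them.

#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Standard reflected CRC-32 (polynomial 0xEDB88320), bit-at-a-time.
uint32_t Crc32(const std::string& s)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < s.size(); ++i)
    {
        crc ^= static_cast<uint8_t>(s[i]);
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Report colliding asset names so one of them can be renamed.
void FindCollisions(const std::vector<std::string>& names)
{
    std::map<uint32_t, std::string> seen;
    for (size_t i = 0; i < names.size(); ++i)
    {
        uint32_t key = Crc32(names[i]);
        std::map<uint32_t, std::string>::iterator it = seen.find(key);
        if (it != seen.end() && it->second != names[i])
            std::printf("collision: '%s' vs '%s' (0x%08x)\n",
                        it->second.c_str(), names[i].c_str(), key);
        else
            seen[key] = names[i];
    }
}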
From: Tom F. <tom...@ee...> - 2005-12-04 20:58:28
As long as you do a secondary check on CRC match (e.g. actually check the strings are equal), it's fine. You just use the CRCs as a very good hash, basically.

TomF.

> -----Original Message-----
> From: Mick West
> Sent: 04 December 2005 09:21
> Subject: Re: [Algorithms] In-place loaded data structures.
> <snip>
From: Mick W. <de...@mi...> - 2005-12-04 21:12:16
Tom Forsyth wrote:
> As long as you do a secondary check on CRC match (e.g. actually check
> the strings are equal), it's fine. You just use the CRCs as a very
> good hash, basically.

I think you are talking about something else (a hash index to speed up the lookup process). I was referring to the use of the hash value of the filename (or object name) as a GUID for use when fixing up references between assets. For that you don't need (or want) the string, so you just ensure your CRCs are unique beforehand.

Mick
From: Tom F. <tom...@ee...> - 2005-12-04 22:31:00
I always used a 32-bit CRC _and_ a pointer to a string. So you can very quickly fix up references using the CRC, but when you get a match, you then ensure it's correct with the string.

We did try the alternative you suggest, which is to make sure there are no CRC collisions in the entire project, but it was a huge hassle to keep it all in sync. The extra string data is small, and since you never actually use it _except_ when you have a CRC match - and are therefore 99% sure it's a match anyway - it's very quick.

TomF.

> -----Original Message-----
> From: Mick West
> Sent: 04 December 2005 13:12
> Subject: Re: [Algorithms] In-place loaded data structures.
> <snip>
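A sketch of the reference scheme Tom describes, with illustrative names: compare the cheap CRC first, and only touch the string on the rare CRC match to confirm.

#include <cstdint>
#include <cstring>

// Hypothetical asset reference: fast key plus a backing string.
struct AssetRef
{
    uint32_t    crc;    // compared first; almost always a mismatch
    const char* name;   // consulted only when the CRCs match
};

bool SameAsset(const AssetRef& a, const AssetRef& b)
{
    if (a.crc != b.crc)
        return false;                         // the common, fast path
    return std::strcmp(a.name, b.name) == 0;  // confirm rare CRC matches
}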
From: Robert B. <r....@gm...> - 2005-12-05 01:28:21
> We did try the alternative you suggest, which is to make sure there
> are no CRC collisions in the entire project, but it was a huge hassle
> to keep it all in sync.

How so? We let the final packing stage tell us. (We still pack, though, even with duplicates.)

> The extra string data is small, and since you never actually use it
> _except_ when you have a CRC match, and are therefore 99% sure it's a
> match anyway, it's very quick.

Yes, it's more the size aspect of it. Since our filenames carry meta information, they tend to be a bit long - I'd say an average of 20 characters. Add another 4 bytes for all the places where you carry the CRC and the string, and you start talking about noticeable chunks of memory.

It'd probably be fine in a debug build - but we've never had a lot of problems with the CRCs. It helps that the packer generates a searchable list of all CRC->string mappings. That way, looking up strings in a debug situation is minimal hassle.

- Robert
From: Tom F. <tom...@ee...> - 2005-12-05 01:54:34
> How so? We let the final packing stage tell us. (We still pack,
> though, even with duplicates.)

Final packing is a hassle to keep up to date all the time in a big project. You need it for making DVD burns, but either you have to do DVD burns constantly to make sure name CRCs are unique, or you might be changing names right at the last minute (just before gold master) to avoid collisions - test/QA is not going to like that!

It's a bottleneck, and in general I don't like to have any more bottlenecks in the process of building assets than I absolutely have to have - there are too many that are necessary to begin adding more. We never found it a speed or memory cost, and it was totally robust.

TomF.

> -----Original Message-----
> From: Robert Blum
> Sent: 04 December 2005 17:28
> Subject: Re: [Algorithms] In-place loaded data structures.
> <snip>
From: Mick W. <de...@mi...> - 2005-12-05 02:40:16
Tom Forsyth wrote:
> Final packing is a hassle to keep up to date all the time in a big
> project. You need it for making DVD burns, but either you have to do
> DVD burns constantly to make sure name CRCs are unique, or you might
> be changing names right at the last minute (just before gold master)
> to avoid collisions - test/QA is not going to like that!

If you are adding newly named assets just before gold master, then I think a 1 in 100,000 chance that you might have to rename that file is the least of your worries.

Mick.
From: Tom F. <tom...@ee...> - 2005-11-29 07:19:05
Yes, yes you would. And you'd call it a GR2 format and use it in some fabulous middleware :-)

Granny does exactly this. Data is loaded, and pointers fixed up in-place, and you're ready to rock. Endianness is a problem when you cross the big/little boundary, but you can just fix it up and save it back out, and then future loads aren't a problem.

For Granny, the way we did endianness was to have different marshalling types - a "32bit" section of data, a "16bit" section, and an "8bit" section (the latter requires no endianness marshalling, of course). Then you classify each member of each structure according to its size. Note that there is a fourth type - "any" - this is used for things like pointers that need touching by the CPU anyway, so it doesn't matter how you marshall them; the CPU can compensate when it fixes up the address. As long as each structure is of the same marshalling type, you can put the entire thing into one of those three sections. So at load time, the three sections are just marshalled (i.e. byte-swapped) in one big chunk (using SSE/MMX/Altivec/whatever). Then the pointers are fixed, and you're ready.

There's one last section - the "mixed marshalling" section. This is where structures that have mixtures of 32-bit and 16-bit and 8-bit data go. Because they're of mixed type, you can't do a dumb big-block marshaller; you have to parse each one individually and do each bit of it.

But that's OK, because we also store complete type information about the file in the file. This allows you to traverse the data tree without knowing anything about the contents beforehand. Which allows you to do mixed marshalling, forwards and backwards compatibility, and all sorts of cute stuff like that.

Before you try to do this yourself: it's cool, but pretty complex. Might be better off buying it :-)

TomF.

> -----Original Message-----
> From: Ben Garney
> Sent: 28 November 2005 21:05
> Subject: Re: [Algorithms] In-place loaded data structures.
> <snip>
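A sketch of the big-block marshalling idea, in plain C++ rather than the SIMD Granny actually uses: because every member in the "32bit" section is the same width, the whole section can be byte-swapped in one dumb pass.

#include <cstdint>
#include <cstddef>

// Byte-swap an entire section of uniformly 32-bit data in place.
// A "16bit" section gets the analogous 2-byte swap; "8bit" sections
// need no swapping, and "any" (pointer) data is handled during fixup.
void Marshal32BitSection(void* data, size_t bytes)
{
    uint32_t* p = static_cast<uint32_t*>(data);
    for (size_t i = 0; i < bytes / 4; ++i)
    {
        uint32_t v = p[i];
        p[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u)
             | ((v << 8) & 0x00FF0000u) | (v << 24);
    }
}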
From: Jamie F. <ja...@qu...> - 2005-11-29 11:26:16
Sorry, turned out longer than I expected! We've spent a lot of our R&D over the last 8 years in this sort of area.

Charles Nicholson wrote:

> If I were better at C# or could take the time, I think another fine way
> to take my approach would be to make an assembly that has the data
> schema, like
>
> [MetaSerialize]
> class Foo
> {
>     [min(0), max(30)]
>     int x;
>
>     char c;
> };
>
> Then tools would load this assembly and use introspection/reflection to
> generate UIs and data. A C# tool could easily turn one of these
> metadata-annotated classes into the C++ runtime stuff.

Our Q 1.x engine uses class metadata generated from the C++ header files to describe everything. There are a number of problems with it:

- The metadata (even for core classes) isn't small. The smallest possible data file is about 191KB. It was more like 500KB before we optimized the metadata for size (this meant throwing out some functionality that might have been useful in some circumstances, but wasn't in practice; as we already knew we were throwing the engine away, it wasn't a problem).
- Versioning kills you. If you intend to support anyone using existing data, you can't change a class at all. Unless you can afford big flag days for everybody using your existing data format, avoid!

In our Q 2.0 engine, we now use these tools only for generating API metadata to allow script language binding, etc.

Scott Shumaker wrote:

> The biggest advantage of a 'memory-ready' format is speed. It beats
> anything else out there by a significant margin, both in minimal disk
> reads (since you can blast large contiguous blocks off of disk) and in
> requiring very little processing at load time.

Minimal disk reads are totally orthogonal to memory-ready data formats; all you need to do is avoid seeking. In Q 2.0, everything is built around abstracted data streams, so all objects are loaded in a single burst, and databases can be optimized so that many objects are loaded in a single read. This does still require some CPU work, of course; but our experience is that loading times are still the killer, and if you're touching the memory anyway to fix up pointers, it won't hurt if you do a little bit more work. And of course versioning is then easy.

Charles Nicholson wrote:

> Well, now that it's out there I suppose I may as well go on. :)
>
> === Does endian-ness mess it up horribly?
>
> Endianness is pretty simple to manage with this scheme. The XML is
> ASCII-encoded (i.e. endian-free), the tool runtime uses the native
> endianness, and the binary data compiler/linker simply respects a
> platform flag and writes the data for each field out with the correct
> endianness. If you have a stream layer (one that simply throws byte
> arrays at data sinks), that can be a good place to handle endian issues.

We have such a stream layer in Q 2.0; not only does it handle endianness for you (while still allowing optimization for a particular endianness if you want), it also means that we can simply change the database codec to switch from an XML database format to a binary format to any custom format the user cares for.

> The best I've come up with is that A needs to hold a handle to B that
> it can use as a key for a real pointer. Any time A needs to access B,
> it hands its B-handle to some sort of TOC for the 'real' address of B.
> These TOCs can come up as part of the binary data, though unlike most
> of the rest of the data, they're mutable. When new assets come up,
> these TOCs can change to hold the addresses of the new data.

We have a similar system in place in Q 1.x, carried over to Q 2.0; you can hold handles to textures which don't exist in your current databases. If a new database is opened with them in, they get patched up and used appropriately. We find it more convenient to handle this at the object level than with a larger TOC.

> Say you have a "Stranger's Wrath"-style streaming game (hubs with many
> linear paths leading out and back in) and the progression goes
> A -> B -> C -> D -> A (there's a teleporter in D that takes you back
> to A). <snip> so there's going to have to be some sort of dynamic
> linking/fixup going on when B and C come into memory.
>
> A necessary conclusion of this is that assets in BC need some sort of
> unique ID that both B and C can refer to. Since all of the assets for
> the game live in a database of some form or another (hopefully?!), it
> would be nice if GUIDs could come from there. I haven't looked into
> this part yet, but it didn't seem on the surface that Perforce had any
> such features (trade a filename for a unique ID).

Streaming has always been one of our main technology targets. The situation you describe actually isn't that hard (you can read our explanation of what we do in Q 1.x at http://qdn.qubesoft.com/docs/1.1.1/doc/qserver/streaming.html); the much more difficult problem is what happens when a resource which starts in a particular location moves far away from its start, then remains there while you wander away and then come back. Anyway, unique IDs are the way we went for Q 1.x, and we've kept them (with some modifications) for Q 2.0. On the other hand, I don't think I'd want to tie it to some other database's GUID.

> An offline bundling tool that had global visibility over the entire
> game (i.e. across all level layout files) would be able to optimally
> organize these shared and unique packfiles.

Exactly what we did for Q 1.x (this is the phase I referred to earlier which optimizes multiple object loads and for particular platforms).

Jamie
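A sketch of the object-level patching Jamie describes (illustrative names): a handle that failed to resolve stays pending, and each newly opened database gets a chance to satisfy it.

#include <cstdint>

struct Texture;

// Hypothetical database interface: may or may not contain a given asset.
struct Database
{
    virtual Texture* Find(uint32_t id) = 0;
    virtual ~Database() {}
};

// Object-level handle: resolves lazily as databases come and go.
struct TextureHandle
{
    uint32_t id;
    Texture* resolved;   // null until some database supplies the asset

    // Called for each database that gets opened.
    void TryPatch(Database& db)
    {
        if (!resolved)
            resolved = db.Find(id);
    }
};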
From: Conor S. <bor...@ya...> - 2005-11-29 14:35:37
Coming from a different perspective on this: the last place I worked generated in-place data for many (10+) different embedded device types. It generated the data on a regular as well as an on-demand basis (and distributed it out automatically, but that's a different story).

As you can imagine, these device types all used different alignment and endianness combinations. They all had to be readable from flash memory directly - no deserialisation or marshalling - as there was no time or memory to do so.

We had a higher-tier generation solution for this data, which changed regularly and had multiple data source points (although it all came through a single database schema). Of course, the generation tier needs magic knowledge of the types and device abilities to serialise out to an "in place" target, including funky orderings (pre-sorts), endianness, alignment, etc.

The solution chosen was to model everything once in a CASE tool and write some funky code generation: one generator dumped out the XML metadata, the other dumped out headers. The metadata included autocalculation functions for sorting, creating indexes/offsets, repeats based on counts for lists, switches for type variants, etc.

The metadata schema was loaded, then the data was yanked from the data source and shoved through the parsed XML schema structure (using abstracted XPath targets) into an abstract serialisation layer which followed the rules for the particular device.

Most of the offsets ended up just being from a base pointer at the start of the entire structure... but due to the abstract serialisation layer, the size of the offset (or the type switch, for that matter ;-) could be the size of the pointer for the device you're rendering data to... so in that case you can easily pre-add in memory (you don't have to deal with rather randomly address-mapped flash memory :-).

Needless to say, some of this stuff is indeed overkill unless you have a lot of compile dependencies for a lot of different platforms that need to consume the same resources quickly with a minimum of fuss (especially if you need wide-area automatic updating).

However, a centralised configuration data management tier is a good idea if you want to really produce content quickly, consistently, and with minimal loading impact on devices.

Cheers,
Conor

PS Yes, it makes no sense!

--- Jamie Fowlston <ja...@qu...> wrote:
> Sorry, turned out longer than I expected! We've spent a lot of our R&D
> over the last 8 years in this sort of area.
> <snip>
From: Jamie F. <ja...@qu...> - 2005-11-29 15:05:28
It's an interesting idea, to use mangled data completely in place. You say the data had to be readable from flash memory directly by many devices; could any or all of the devices write to it as well?

Jamie

Conor Stokes wrote:
> Coming from a different perspective on this: the last place I worked
> generated in-place data for many (10+) different embedded device types.
> <snip>
From: Conor S. <bor...@ya...> - 2005-11-29 16:12:00
|
They could - but the latency for lots of random access writes was very
high... and the devices were high availability. The operational time
constraints meant that for this kind of data it was impractical. Once the
stuff was downloaded, it had to be used (you couldn't have a minute or two
of "please wait" once the download was finished).

It wouldn't be too hard to transform the offsets into pointers if there
was an option of a long loading phase (see the earlier remarks about using
offsets that follow pointer size/alignment rules).

Usually only data persisted to go the other way was written (this was
written sequentially, in batches).

One neat trick to make mangled in-place data very "friendly" to deal with
is to move to a file/payload structure (even if it all comes from one
"block") and have each file/payload represent one collection of structures
that will expand completely in place (this can include fixed/padded
arrays).

Then you can use your set of collection pointers, plus the fact that you
know where all your flat offsets are (or indices into collections) and
what payload they target (easy enough to make "static" information), to
transform them into pointers with a minimum of fuss. Of course... you can
have your collections in presorted order (sometimes you need to search for
things).

Then you get into the wonders of which payloads are dependent on which, so
you can keep some payloads in scope (i.e. one payload lasts the whole
time) while loading others on demand (one payload may be level-specific in
scope... just run its load sequence again). Circular dependencies in
payloads are bad mojo :-).

If you have a content distribution system/content patches, you can imagine
how this structure could make your life a hell of a lot easier.

Truth be told, I would recommend using C++ classes + a server/client class
hierarchy over a metadata based solution. In the end it is easier to write
a nice abstracted serialisation layer and not have to wrap that up in YAL
(yet another language :-). The nice thing about the server/client class
system is that you only need the client classes on the actual running box;
the server code stays completely in the back end.

Cheers,
Conor

--- Jamie Fowlston <ja...@qu...> wrote:

> It's an interesting idea, to use mangled data completely in place. You
> say the data had to be readable from flash memory directly by many
> devices; could any or all of the devices write to it as well?
>
> Jamie
>
> Conor Stokes wrote:
> > Coming from a different perspective on this, the last place I worked
> > generated in-place data for many (10+) different embedded device
> > types. It generated the data on a regular as well as an on-demand
> > basis (and distributed it out automatically, but that's a different
> > story).
> >
> > As you can imagine, these device types all used different alignment
> > and endianness combinations. They all had to be readable from flash
> > memory directly - no deserialisation or marshalling - as there was no
> > time or memory to do so.
> >
> > We had a higher-tier generation solution for this data which changed
> > regularly and had multiple data source points (although it all came
> > through a single database schema). Of course, the generation tier
> > needs magic knowledge of the types and device abilities to serialise
> > out to an "in place" target, including funky orderings (pre-sorts),
> > endianness, alignment, etc.
> >
> > The solution chosen was to model everything once in a case tool and
> > write some funky code generation: one generator dumped out the XML
> > metadata, the other dumped out headers. The metadata included
> > autocalculation functions for sorting, creating indexes/offsets,
> > repeats based on counts for lists, switches for type variants, etc.
> >
> > The metadata schema was loaded, then the data yanked from the data
> > source, shoved through the parsed XML schema structure (using
> > abstracted xpath targets) into an abstract serialisation layer which
> > followed the rules for the particular device.
> >
> > Most of the offsets ended up just being from a base pointer at the
> > start of the entire structure... but due to the abstract
> > serialisation layer, the size of the offset (or the type switch, for
> > that matter ;-) could be the size of the pointer for the device
> > you're rendering data to... so in that case you can easily pre-add in
> > memory (you don't have to deal with rather randomly address-mapped
> > flash memory :-).
> >
> > Needless to say, some of this stuff is indeed overkill unless you
> > have a lot of compile dependencies for a lot of different platforms
> > that need to consume the same resources quickly with a minimum of
> > fuss (especially if you need wide-area automatic updating).
> >
> > However, a centralised configuration data management tier is a good
> > idea if you want to really produce content quickly, consistently and
> > with minimal loading impact on devices.
> >
> > Cheers,
> > Conor
> >
> > PS Yes, it makes no sense!
> >
> > --- Jamie Fowlston <ja...@qu...> wrote:
> >
> >> Sorry, turned out longer than I expected! We've spent a lot of our
> >> R&D over the last 8 years in this sort of area.
> >>
> >> Charles Nicholson wrote:
> >>
> >>> If I were better at C# or could take the time, I think another fine
> >>> way to take my approach would be to make an assembly that has the
> >>> data schema, like
> >>>
> >>> [MetaSerialize]
> >>> class Foo
> >>> {
> >>>     [min(0), max(30)]
> >>>     int x;
> >>>
> >>>     char c;
> >>> };
> >>>
> >>> Then tools would load this assembly and use
> >>> introspection/reflection to generate UIs and data. A C# tool could
> >>> easily use one of these metadata-annotated classes to generate the
> >>> C++ runtime stuff.
> >>
> >> Our Q 1.x engine uses class metadata generated from the C++ header
> >> files to describe everything. There are a number of problems with
> >> it:
> >>
> >> - The metadata (even for core classes) isn't small. The smallest
> >> possible data file is about 191KB. It was more like 500KB before we
> >> optimized the metadata for size (this meant throwing out some
> >> functionality that might have been useful in some circumstances, but
> >> wasn't in practice; as we already knew we were throwing the engine
> >> away, it wasn't a problem).
> >> - Versioning kills you. If you intend to support anyone using
> >> existing data, you can't change a class at all. Unless you can
> >> afford big flag days for everybody using your existing data format,
> >> avoid!
> >>
> >> In our Q 2.0 engine, we now use these tools only for generating API
> >> metadata to allow script language binding, etc.
> >>
> >> Scott Shumaker wrote:
> >>
> >>> The biggest advantage of a 'memory-ready' format is speed.
> >>> It beats anything else out there by a significant margin, both in
> >>> minimal disk reads (since you can blast large contiguous blocks off
> >>> of disk), and without requiring much processing at load-time.
> >>
> >> Minimal disk reads are totally orthogonal to memory-ready data
> >> formats; all you need to do is avoid seeking. In Q 2.0, everything
> >> is built around abstracted data streams, so all objects are loaded
> >> in a single burst, and databases can be optimized so that many

=== message truncated ===
|
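A sketch of the flat-offset fixup Conor outlines in the message above:
each payload is one contiguous block, internal references are stored as
offsets from the payload base, and a static table records where those
offset slots sit, so a single pass turns them all into pointers. The
layout and names are illustrative assumptions.

#include <cstddef>
#include <cstdint>

struct FixupTable {
    const std::uint32_t* offsetLocations; // byte positions of offset slots
    std::size_t          count;           // number of slots in the payload
};

// Convert every stored offset in 'payload' into a real pointer, in place.
// As noted above, the slots must be written at pointer size/alignment for
// this in-situ conversion to be legal.
inline void FixupPayload(void* payload, const FixupTable& table) {
    auto* base = static_cast<unsigned char*>(payload);
    for (std::size_t i = 0; i < table.count; ++i) {
        auto* slot = reinterpret_cast<std::uintptr_t*>(
            base + table.offsetLocations[i]);
        *slot = reinterpret_cast<std::uintptr_t>(base + *slot); // offset -> ptr
    }
}

Because the table is static per payload type, it can be generated offline
and shipped alongside the data, keeping the load-time pass a tight loop.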
From: Jamie F. <ja...@qu...> - 2005-11-29 17:25:48
|
Conor Stokes wrote:
> It wouldn't be too hard to transform the offsets into pointers if there
> was an option of a long loading phase (see the earlier remarks about
> using offsets that follow pointer size/alignment rules).
>
> Usually only data persisted to go the other way was written (this was
> written sequentially, in batches).
>
> One neat trick to make mangled in-place data very "friendly" to deal
> with is to move to a file/payload structure (even if it all comes from
> one "block") and have each file/payload represent one collection of
> structures that will expand completely in place (this can include
> fixed/padded arrays).
>
> Then you can use your set of collection pointers, plus the fact that
> you know where all your flat offsets are (or indices into collections)
> and what payload they target (easy enough to make "static"
> information), to transform them into pointers with a minimum of fuss.
> Of course... you can have your collections in presorted order
> (sometimes you need to search for things).
>
> Then you get into the wonders of which payloads are dependent on which,
> so you can keep some payloads in scope (i.e. one payload lasts the
> whole time) while loading others on demand (one payload may be
> level-specific in scope... just run its load sequence again). Circular
> dependencies in payloads are bad mojo :-).
>
> If you have a content distribution system/content patches, you can
> imagine how this structure could make your life a hell of a lot easier.

Yup, but this is all back into standard streaming land. I like the idea
(from a geek point of view :) of using a seriously mangled data format,
but can't see any real practical use for it unless it's static data that
you only (usually) need to read, or performance isn't an issue.

> Truth be told, I would recommend using C++ classes + a server/client
> class hierarchy over a metadata based solution. In the end it is easier
> to write a nice abstracted serialisation layer and not have to wrap
> that up in YAL (yet another language :-).

I certainly agree with you here. Things have got so much saner since we
moved away from metadata.

Jamie
|
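A sketch of the kind of abstracted serialisation layer being recommended
here: each class implements a single Serialise method that both reads and
writes through a stream abstraction, so the load and save paths can never
drift apart. Stream and WeaponParams are hypothetical names, not any
engine's real types.

#include <cstddef>
#include <cstdint>
#include <cstdio>

class Stream {
public:
    Stream(std::FILE* f, bool reading) : f_(f), reading_(reading) {}
    bool IsReading() const { return reading_; }

    // One entry point per primitive type; endian swaps would live here.
    void Serialise(std::int32_t& v) { Raw(&v, sizeof v); }
    void Serialise(float& v)        { Raw(&v, sizeof v); }

private:
    void Raw(void* p, std::size_t n) {
        if (reading_) { if (std::fread(p, 1, n, f_) != n) { /* handle EOF */ } }
        else          { std::fwrite(p, 1, n, f_); }
    }
    std::FILE* f_;
    bool reading_;
};

// One function describes the format for both directions.
struct WeaponParams {
    std::int32_t clipSize   = 0;
    float        reloadTime = 0.0f;

    void Serialise(Stream& s) {
        s.Serialise(clipSize);
        s.Serialise(reloadTime);
    }
};

The server/client split Conor mentions would then put editor-only fields
and logic in a server-side subclass, leaving only this lean client class
on the running box.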
From: Alen L. <ale...@cr...> - 2005-11-29 16:41:52
|
> - The metadata (even for core classes) isn't small. The smallest
> possible data file is about 191KB. It was more like 500KB before we
> optimized the metadata for size (this meant throwing out some
> functionality that might have been useful in some circumstances, but
> wasn't in practice; as we already knew we were throwing the engine
> away, it wasn't a problem).

Hm, I'm not sure why you are having any problems with this? We're using
metadata for almost all content, from small config files to meshes,
textures, animations and levels. (Sounds and video files are standard
formats - wav, ogg....). Saved file overhead scales with the number of
different types, and the smallest files are <1kb.

> - Versioning kills you. If you intend to support anyone using existing
> data, you can't change a class at all. Unless you can afford big flag
> days for everybody using your existing data format, avoid!

We maintain enough info to be able to load old versions, do little/big
endian conversions, etc. You can add/remove/rename members of classes or
add/remove classes in the inheritance lineage and it can still load
correctly.

Also, I'd argue that the load-time impact is only on CPU usage during
loading, not on the actual disk transfer itself. We maintain loading rates
very similar to the highest theoretical when loading from DVDs. For things
that have large blobs of data (vertex arrays or textures), the system
automatically detects when it is loading an array of PODTs and just slaps
the raw data directly into memory.

Just my 0.02c,
Alen
|
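A sketch of the POD-array fast path Alen describes, under the assumption
that the endianness check has already been made against the file's header.
The Stream type and LoadArray are hypothetical; where this sketch throws,
a real system would fall back to a per-field load with endian conversion.

#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

struct Stream {
    const unsigned char* data = nullptr;
    std::size_t pos = 0;
    bool endianMatches = true; // decided once from the file's header

    void ReadRaw(void* dst, std::size_t n) {
        std::memcpy(dst, data + pos, n);
        pos += n;
    }
};

template <typename T>
void LoadArray(Stream& s, std::vector<T>& out, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "fast path is only valid for POD-like types");
    if (!s.endianMatches)
        throw std::runtime_error("mismatched file: needs per-field load");
    out.resize(count);
    s.ReadRaw(out.data(), count * sizeof(T)); // one memcpy, no per-field work
}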
From: Jamie F. <ja...@qu...> - 2005-11-29 17:01:00
|
Alen Ladavac wrote:
>> - The metadata (even for core classes) isn't small. The smallest
>> possible data file is about 191KB. It was more like 500KB before we
>> optimized the metadata for size (this meant throwing out some
>> functionality that might have been useful in some circumstances, but
>> wasn't in practice; as we already knew we were throwing the engine
>> away, it wasn't a problem).
>
> Hm, I'm not sure why you are having any problems with this? We're using
> metadata for almost all content, from small config files to meshes,
> textures, animations and levels. (Sounds and video files are standard
> formats - wav, ogg....). Saved file overhead scales with the number of
> different types, and the smallest files are <1kb.

Well, smallest size for us means including the metadata for all the
objects. Sure, you could trim some out for limited applications, but in
practice for any meaningful application you'd be using almost all the
data types.

>> - Versioning kills you. If you intend to support anyone using existing
>> data, you can't change a class at all. Unless you can afford big flag
>> days for everybody using your existing data format, avoid!
>
> We maintain enough info to be able to load old versions, do little/big
> endian conversions, etc. You can add/remove/rename members of classes
> or add/remove classes in the inheritance lineage and it can still load
> correctly.

You certainly can do this; I deliberately picked a particular
implementation in the OP which looked like it was tying the metadata
storage tightly to the runtime implementation, which is a big mistake.
Even with that tight binding, there's still room to deal with endianness,
but not class member changes.

> Also, I'd argue that the load-time impact is only on CPU usage during
> loading, not on the actual disk transfer itself. We maintain loading
> rates very similar to the highest theoretical when loading from DVDs.

Yes, if you're not achieving this something is broken.

> For things that have large blobs of data (vertex arrays or textures),
> the system automatically detects when it is loading an array of PODTs
> and just slaps the raw data directly into memory.

Absolutely; it's all about having the appropriate primitive data types
supported in your loading code, and a big blob of data has to be one of
them :)

Jamie
|
From: Alen L. <ale...@cr...> - 2005-11-30 08:57:19
|
>> Hm, I'm not sure why you are having any problems with this? We're
>> using metadata for almost all content, from small config files to
>> meshes, textures, animations and levels. (Sounds and video files are
>> standard formats - wav, ogg....). Saved file overhead scales with the
>> number of different types, and the smallest files are <1kb.
>
> Well, smallest size for us means including the metadata for all the
> objects. Sure, you could trim some out for limited applications, but in
> practice for any meaningful application you'd be using almost all the
> data types.

Yes, that's what I said. The smallest file I was able to find is 960
bytes, together with all metadata for all objects contained in that file.
That's just a set of parameters for a weapon.

Perhaps we are talking about slightly different things. We do not store
metadata for each object in the file, but only once for each _datatype_
that has an object in this file. So e.g. if you have a model that has
materials in it, the CMaterial datatype is stored only once. All this
info is stored in the file's header, and the rest of the file is binary
data just as if you were using some fixed-format data file. It's the
header that describes the objects.

>> Even with that tight binding, there's still room to deal with
>> endianness, but not class member changes.
>
> With the above types-in-the-header approach, there is.

>> For things that have large blobs of data (vertex arrays or textures),
>> the system automatically detects when it is loading an array of PODTs
>> and just slaps the raw data directly into memory.
>
> Absolutely; it's all about having the appropriate primitive data types
> supported in your loading code, and a big blob of data has to be one of
> them :)

Actually, you don't need to have specific BLOB datatypes. Any "simple
type" (int, float, enum...) is considered "raw" if the file's endianness
matches that of the machine that loads the file. Then the "raw" property
propagates up the type hierarchy and you get things like vertex arrays
automatically detected as blobs. This is all done at load time, but is
pretty low overhead.

HTH,
Alen
|
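One plausible layout for the types-in-the-header approach described above;
the format Alen describes isn't spelled out in detail, so every structure
here is a guess for illustration. The header carries one descriptor per
datatype used in the file, and the body that follows is plain packed
binary, decoded by walking those descriptors.

#include <cstdint>
#include <string>
#include <vector>

enum class FieldKind : std::uint8_t { Int32, Float, Enum, Struct };

struct FieldDesc {
    std::string   name;      // matched against the runtime class by name
    FieldKind     kind;
    std::uint32_t typeIndex; // for Struct fields: index into the type table
};

struct TypeDesc {
    std::string            name;   // e.g. "CMaterial", stored once per file
    std::vector<FieldDesc> fields; // declaration order = on-disk layout
};

struct FileHeader {
    std::uint8_t          endianness; // checked before any raw fast path
    std::vector<TypeDesc> types;      // one entry per datatype in the file
};

Because each datatype appears once no matter how many objects use it, the
overhead scales with the number of types, which is how even a small
weapon-parameters file stays under a kilobyte.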
From: Jamie F. <ja...@qu...> - 2005-11-30 10:29:27
|
Alen Ladavac wrote:
>>> Hm, I'm not sure why you are having any problems with this? We're
>>> using metadata for almost all content, from small config files to
>>> meshes, textures, animations and levels. (Sounds and video files are
>>> standard formats - wav, ogg....). Saved file overhead scales with the
>>> number of different types, and the smallest files are <1kb.
>>
>> Well, smallest size for us means including the metadata for all the
>> objects. Sure, you could trim some out for limited applications, but
>> in practice for any meaningful application you'd be using almost all
>> the data types.
>
> Yes, that's what I said. The smallest file I was able to find is 960
> bytes, together with all metadata for all objects contained in that
> file. That's just a set of parameters for a weapon.
>
> Perhaps we are talking about slightly different things. We do not store
> metadata for each object in the file, but only once for each _datatype_
> that has an object in this file. So e.g. if you have a model that has
> materials in it, the CMaterial datatype is stored only once. All this
> info is stored in the file's header, and the rest of the file is binary
> data just as if you were using some fixed-format data file. It's the
> header that describes the objects.

I think we must be talking about different things. We also only store
metadata once for each datatype in Q 1.x, but I'm talking about the
smallest file size which contains all the metadata for all the objects
that might be stored in the file, not just all the objects that are
stored in it, i.e. it's the total size of all metadata for all objects.
Typically, that's the overhead you'll have, as you will be using most
object types.

>> Even with that tight binding, there's still room to deal with
>> endianness, but not class member changes.
>
> With the above types-in-the-header approach, there is.

I think you've missed my point; of course you can use a metadata system
with versioning. I'm saying it's a bad idea to have tight coupling
between the in-memory data class structure and the in-file data class
structure that doesn't allow versioning. It looked like that was
happening in the specific part of the OP that I was referring to.

>>> For things that have large blobs of data (vertex arrays or textures),
>>> the system automatically detects when it is loading an array of PODTs
>>> and just slaps the raw data directly into memory.
>>
>> Absolutely; it's all about having the appropriate primitive data types
>> supported in your loading code, and a big blob of data has to be one
>> of them :)
>
> Actually, you don't need to have specific BLOB datatypes. Any "simple
> type" (int, float, enum...) is considered "raw" if the file's
> endianness matches that of the machine that loads the file. Then the
> "raw" property propagates up the type hierarchy and you get things like
> vertex arrays automatically detected as blobs. This is all done at load
> time, but is pretty low overhead.

Yes, you could do that. Once you've got to this stage of things, it's
pretty easy to pick basic data types that are appropriate for you. A
vertex array would be unlikely to be stored as a blob anyway (unless
compressed into a binary stream), as the coordinates will likely suffer
from endianness issues across platforms.

Jamie
|
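A sketch of the decoupling Jamie argues for: stored fields are matched to
current class members by name rather than by offset, so adding or removing
a member doesn't invalidate old files. Unknown stored fields are skipped
and missing ones keep their defaults. All names here are hypothetical.

#include <cstring>
#include <string>
#include <vector>

struct StoredField {
    std::string                name;  // as recorded in the file's metadata
    std::vector<unsigned char> bytes; // raw field payload
};

struct Material {
    float roughness  = 0.5f; // default survives files that predate the field
    int   layerCount = 1;

    void LoadFrom(const std::vector<StoredField>& stored) {
        for (const StoredField& f : stored) {
            if (f.name == "roughness" && f.bytes.size() == sizeof roughness)
                std::memcpy(&roughness, f.bytes.data(), sizeof roughness);
            else if (f.name == "layerCount" &&
                     f.bytes.size() == sizeof layerCount)
                std::memcpy(&layerCount, f.bytes.data(), sizeof layerCount);
            // unknown stored fields are skipped: removals are harmless too
        }
    }
};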
From: Alen L. <ale...@cr...> - 2005-11-30 11:02:53
|
> A vertex array would be unlikely to be stored as a blob anyway (unless
> compressed into a binary stream), as the coordinates will likely suffer
> from endianness issues across platforms.

But I said...

>> Any "simple type" (int, float, enum...) is considered "raw" if the
>> file's endianness matches that of the machine that loads the file.

If the file has the proper endianness (and passes other architecture
checks), then a vertex array can be considered a blob. If not, then it is
loaded normally. But the finalization process during automated builds
makes sure each file is converted to the proper endianness (and other
adjustments are done if needed) for the desired target platform(s).

Re binary streams, that's the only thing we use anyway. All files are
binary, with proper metadata headers. Text formats are only used for
debugging.

Alen
|
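A sketch of the "raw" propagation Alen outlines: a composite type is raw
exactly when all of its constituent types are raw, computed once at load
time after the endianness check. Type and Field are hypothetical stand-ins
for the engine's metadata records.

#include <vector>

struct Type;

struct Field {
    const Type* type; // metadata record for the field's own type
};

struct Type {
    bool               isSimple = false; // int, float, enum...
    std::vector<Field> fields;           // empty for simple types
};

// A type is "raw" when it is a simple type, or when every field's type is
// itself raw; once the file's endianness has been checked against the
// machine's, arrays of raw types can be loaded as single blobs.
inline bool IsRaw(const Type& t) {
    if (t.isSimple) return true;
    for (const Field& f : t.fields)
        if (!IsRaw(*f.type)) return false;
    return true;
}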
From: Jamie F. <ja...@qu...> - 2005-11-30 13:12:11
|
Alen Ladavac wrote:
>> A vertex array would be unlikely to be stored as a blob anyway (unless
>> compressed into a binary stream), as the coordinates will likely
>> suffer from endianness issues across platforms.
>
> But I said...
>
>>> Any "simple type" (int, float, enum...) is considered "raw" if the
>>> file's endianness matches that of the machine that loads the file.
>
> If the file has the proper endianness (and passes other architecture
> checks), then a vertex array can be considered a blob. If not, then it
> is loaded normally.

Absolutely, I understood that. That still means that the data is not
stored in the file as a blob; you just know that you can load it all as
if it were a blob if the endianness and size criteria are met. I'm simply
pointing out that you typically _can't_ store a vertex array as a blob
because of the endianness issues.

> But the finalization process during automated builds makes sure each
> file is converted to the proper endianness (and other adjustments are
> done if needed) for the desired target platform(s).

Yup, this is also important.

> Re binary streams, that's the only thing we use anyway. All files are
> binary, with proper metadata headers. Text formats are only used for
> debugging.

We typically only use the text formats for debugging too, but they're
there if people want them. It does happen :)

Jamie
|