Re: [Algorithms] In-place loaded data structures.

It sounds like you should only need to do that with memory-ready data. 
For our system, it Just Works, as we can reconstruct a pointer at 
whatever size is necessary, independently of the size used to store the 
offset.

Jamie

James Milne wrote:
> This is something that I remember discussing with my colleagues when we
> were considering how to handle using our existing 32bit data files on
> 64bit Windows machines without having to ship two copies of all our data
> files.
> 
> The problem is passing around enough context information in your code so
> that you can use the offsets. For the offsets to work, you have to pass
> around the pointer that the offsets are relative to. 
> 
> Take the example of a model file. If you submit a sub-model or other
> element of that model file to your renderer, you have to pass the base
> pointer to the model in with every sub-model, so that the renderer code
> can calculate the absolute address of any data that the sub-model refers
> to.
> 
> It's probably only an extra pointer to store in your display list, and
> on architectures where you need to worry about 64bit addressing you've
> got the memory to store it anyway.
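
A minimal sketch of the base-pointer idea described above, assuming a made-up file layout: `SubModel`, `DisplayListEntry`, `resolve` and the demo blob are hypothetical names for illustration, not anything from the engines discussed in this thread. The stored offset stays 32 bits wide; the pointer is rebuilt at whatever width the platform uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical on-disk layout: every reference is a 32-bit byte offset
// from the start of the model file, whatever the target's pointer size.
struct SubModel {
    std::uint32_t vertexOffset;  // offset of the first float, from the model base
    std::uint32_t vertexCount;   // number of floats
};

// What gets submitted to the renderer: the sub-model header plus the
// base pointer its offsets are relative to.
struct DisplayListEntry {
    const unsigned char* modelBase;
    SubModel             subModel;
};

// Rebuild a native-width pointer from the stored 32-bit offset.
inline const unsigned char* resolve(const unsigned char* base, std::uint32_t offset) {
    return base + offset;
}

// Sum the floats a sub-model refers to; stands in for real renderer work.
inline float sumVertices(const DisplayListEntry& e) {
    const unsigned char* p = resolve(e.modelBase, e.subModel.vertexOffset);
    float total = 0.0f;
    for (std::uint32_t i = 0; i < e.subModel.vertexCount; ++i) {
        float v;
        std::memcpy(&v, p + i * sizeof(float), sizeof(float));  // no alignment assumptions
        total += v;
    }
    return total;
}

// Build a tiny in-memory "file": one SubModel header followed by three floats.
inline std::vector<unsigned char> makeDemoModel() {
    const float verts[3] = {1.0f, 2.0f, 3.0f};
    const SubModel sm{static_cast<std::uint32_t>(sizeof(SubModel)), 3};
    std::vector<unsigned char> blob(sizeof(SubModel) + sizeof(verts));
    std::memcpy(blob.data(), &sm, sizeof(sm));
    std::memcpy(blob.data() + sizeof(sm), verts, sizeof(verts));
    return blob;
}
```

The cost is the one extra pointer per display-list entry mentioned above; the file itself is identical on 32-bit and 64-bit builds.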
> 
> --
> Kind regards,
> James Milne
> 
> 
>>-----Original Message-----
>>From: gda...@li... [mailto:gdalgorithms-lis...@li...] On Behalf Of Jamie Fowlston
>>Sent: 29 November 2005 15:05
>>To: gda...@li...
>>Subject: Re: [Algorithms] In-place loaded data structures.
>>
>>It's an interesting idea, to use mangled data completely in place. You
>>say the data had to be readable from flash memory directly by many
>>devices; could any or all of the devices write to it as well?
>>
>>Jamie
>>
>>
>>Conor Stokes wrote:
>>
>>>Coming from a different perspective on this, the last
>>>place I worked generated in place data for many (10+)
>>>different embedded device types. It generated the data
>>>on a regular as well as on demand basis (and
>>>distributed out automatically, but that's a different
>>>story).
>>>
>>>As you can imagine, these device types all used
>>>different alignment and endianness combinations. They
>>>all had to be readable from flash memory directly - no
>>>deserialisation or marshalling - as there was no time
>>>or memory to do so.
>>>
>>>We had a higher tier generation solution for this data
>>>which changed regularly and had multiple data source
>>>points (although it all came through a single database
>>>schema). Of course, the generation tier needs magic
>>>knowledge of the types and device abilities to
>>>serialise out to an "in place" target, including funky
>>>orderings (pre-sorts), endianness, alignment etc.
>>>
>>>The solution chosen was to model everything once in a
>>>CASE tool and write two funky code generators: one that
>>>dumped out the XML metadata, the other which dumped
>>>out headers. The metadata included autocalculation
>>>functions for sorting, creating indexes/offsets,
>>>repeats based on counts for lists, switches for type
>>>variants etc.
>>>
>>>The metadata schema was loaded, then the data yanked
>>>from the datasource, shoved through the parsed XML
>>>schema structure (using abstracted xpath targets) into
>>>an abstract serialisation layer which followed the
>>>rules for the particular device.
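
As a rough illustration of a serialisation layer that follows per-device rules, here is a sketch under simplifying assumptions: `TargetRules` and `TargetWriter` are hypothetical names, and real devices would need more rules than one byte order and one alignment (packing, pre-sorts, type switches and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Endian { Little, Big };

// Hypothetical per-device rules.
struct TargetRules {
    Endian      endian;
    std::size_t alignment;  // each field starts on a multiple of this; must be > 0
};

class TargetWriter {
public:
    explicit TargetWriter(TargetRules rules) : rules_(rules) {
        assert(rules_.alignment > 0);
    }

    // Append an unsigned integer of `size` bytes (1..8) in the target's byte order,
    // padding with zeros first so the field starts on an aligned offset.
    void writeUint(std::uint64_t value, std::size_t size) {
        assert(size >= 1 && size <= 8);
        while (bytes_.size() % rules_.alignment != 0) bytes_.push_back(0);
        for (std::size_t i = 0; i < size; ++i) {
            const std::size_t shift =
                (rules_.endian == Endian::Little) ? i : (size - 1 - i);
            bytes_.push_back(static_cast<unsigned char>((value >> (8 * shift)) & 0xFF));
        }
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    TargetRules rules_;
    std::vector<unsigned char> bytes_;
};
```

Because the writer works from shifts rather than from the host's memory layout, the same generator code produces correct output for any target regardless of the machine it runs on.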
>>>
>>>Most of the offsets ended up just being from a base
>>>pointer from the start of the entire structure... but
>>>due to the abstract serialisation layer, the size of
>>>the offset (or the type switch for that matter ;-)
>>>could be the size of the pointer for the device you're
>>>rendering data to... so in that case you can easily
>>>pre-add in memory (you don't have to deal with rather
>>>randomly address mapped flash memory :-).
>>>
>>>Needless to say, some of this stuff is indeed overkill
>>>unless you have a lot of compile dependencies for a
>>>lot of different platforms that need to consume the
>>>same resources quickly with a minimum of fuss
>>>(especially if you need wide-area automatic updating).
>>>
>>>
>>>However, a centralised configuration data management
>>>tier is a good idea if you want to really produce
>>>content quickly, consistently and with minimal loading
>>>impact on devices.
>>>
>>>Cheers,
>>>Conor
>>>
>>>PS Yes, it makes no sense!
>>>
>>>--- Jamie Fowlston <ja...@qu...> wrote:
>>>
>>>
>>>
>>>>Sorry, turned out longer than i expected! We've
>>>>spent a lot of our R&D
>>>>over the last 8 years in this sort of area.
>>>>
>>>>Charles Nicholson wrote:
>>>>
>>>>
>>>>
>>>>>If i were better at C# or could take the time, i think another fine way
>>>>>to take my approach would be to make an assembly that has the data
>>>>>schema, like
>>>>>
>>>>>[MetaSerialize]
>>>>>class Foo
>>>>>{
>>>>>  [min(0), max(30)]
>>>>>  int x;
>>>>>
>>>>>  char c;
>>>>>};
>>>>>
>>>>>Then tools would load this assembly and use introspection/reflection to
>>>>>generate UIs and data.  A C# tool could easily turn one of these
>>>>>metadata-annotated classes to generate the C++ runtime stuff.
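
For a sense of what the generated C++ side of such a schema might look like, here is a hand-written sketch. `IntFieldMeta`, `kFooX` and `setIntField` are hypothetical; the thread proposes generating this kind of table from the annotated class and does not specify its shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Runtime counterpart of the annotated class above.
struct Foo {
    int  x;
    char c;
};

// Hypothetical metadata record for one integer field.
struct IntFieldMeta {
    const char* name;
    std::size_t offset;  // byte offset of the field inside the object
    int         min;     // inclusive lower bound, from [min(0)]
    int         max;     // inclusive upper bound, from [max(30)]
};

constexpr IntFieldMeta kFooX = {"x", offsetof(Foo, x), 0, 30};

inline bool inRange(const IntFieldMeta& meta, int value) {
    return value >= meta.min && value <= meta.max;
}

// Writes a field through its metadata, rejecting values outside the declared range.
inline bool setIntField(void* object, const IntFieldMeta& meta, int value) {
    if (!inRange(meta, value)) return false;
    std::memcpy(static_cast<unsigned char*>(object) + meta.offset, &value, sizeof(value));
    return true;
}
```

A tool UI could drive a slider from `min` and `max`, while the data compiler uses `offset` to place the value.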
>>>>
>>>>Our Q 1.x engine uses class metadata generated from
>>>>the C++ header files
>>>>to describe everything. There are a number of
>>>>problems with it:
>>>>
>>>>- The metadata (even for core classes) isn't small.
>>>>The smallest
>>>>possible data file is about 191KB. It was more like
>>>>500KB before we
>>>>optimized the metadata for size (this meant throwing
>>>>out some
>>>>functionality that might have been useful in some
>>>>circumstances, but
>>>>wasn't in practice; as we already knew we were
>>>>throwing the engine away,
>>>>it wasn't a problem).
>>>>- Versioning kills you. If you intend to support
>>>>anyone using existing
>>>>data, you can't change a class at all. Unless you
>>>>can afford big flag
>>>>days for everybody using your existing data format,
>>>>avoid!
>>>>
>>>>In our Q 2.0 engine, we now use these tools only for
>>>>generating API
>>>>metadata to allow script language binding, etc.
>>>>
>>>>Scott Shumaker wrote:
>>>>
>>>>
>>>>
>>>>>The biggest advantage of a 'memory-ready' format is speed.  It beats
>>>>>anything else out there by a significant margin, both in minimal disk reads
>>>>>(since you can blast large contiguous blocks off of disk), and without
>>>>>requiring much processing at load-time.
>>>>
>>>>Minimal disk reads are totally orthogonal to memory
>>>>ready data formats;
>>>>all you need to do is avoid seeking. In Q 2.0,
>>>>everything is built
>>>>around abstracted data streams, so all objects are
>>>>loaded in a single
>>>>burst, and databases can be optimized so that many
>>>>objects are loaded in
>>>>a single read. This does still require some CPU
>>>>work, of course; but our
>>>>experience is that loading times are still the
>>>>killer, and if you're
>>>>touching the memory anyway to fix up pointers, it
>>>>won't hurt if you do a
>>>>little bit more work. And of course versioning is
>>>>then easy.
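
To make the trade-off concrete, here is a small sketch of stream-style loading, assuming the whole object has already arrived in one contiguous read. `ByteStream`, `Texture` and the version rule are invented for illustration and are not the Q 2.0 API. The extra CPU work is the field-by-field decode; in exchange, old files keep loading after the class grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decodes little-endian integers from a buffer filled by one large read.
// The caller must keep the buffer alive while the stream is in use.
class ByteStream {
public:
    explicit ByteStream(const std::vector<unsigned char>& data) : data_(data) {}

    std::uint32_t readU32() {
        if (data_.size() - pos_ < 4) throw std::runtime_error("unexpected end of stream");
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(data_[pos_ + i]) << (8 * i);
        pos_ += 4;
        return v;
    }

private:
    const std::vector<unsigned char>& data_;
    std::size_t pos_ = 0;
};

// Hypothetical object: version 1 stored only `width`; version 2 added `height`.
struct Texture {
    std::uint32_t width  = 0;
    std::uint32_t height = 0;
};

inline Texture loadTexture(ByteStream& in) {
    Texture t;
    const std::uint32_t version = in.readU32();
    t.width  = in.readU32();
    t.height = (version >= 2) ? in.readU32() : t.width;  // old files: assume square
    return t;
}
```

Decoding by shifts also makes the reader independent of the host's byte order.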
>>>>
>>>>Charles Nicholson wrote:
>>>>
>>>>
>>>>>Well, now that it's out there i suppose I may as well go on.  :)
>>>>>
>>>>>=== Does endian-ness mess it up horribly?
>>>>>
>>>>>Endianness is pretty simple to manage with this scheme.  The xml is
>>>>>ascii-encoded (i.e. endian-free), the tool runtime uses the native
>>>>>endianness, and the binary data compiler/linker simply respects a
>>>>>platform flag and writes the data for each field out with the correct
>>>>>endianness.  If you have a stream layer (that simply throws byte arrays
>>>>>at data sinks), that can be a good place to handle endian issues.
>>>>
>>>>We have such a stream layer in Q 2.0; not only does
>>>>it handle endianness
>>>>for you (while still allowing optimization for a
>>>>particular endianness
>>>>if you want), it also means that we can simply
>>>>change the database codec
>>>>to switch from an XML database format to a binary
>>>>format to any custom
>>>>format the user cares for.
>>>>
>>>>
>>>>
>>>>>The best i've come up with is that A needs to hold a handle to B that
>>>>>it can use as a key for a real pointer.  Any time A needs to access B
>>>>>it hands its B-handle to some sort of TOC for the 'real' address of B.
>>>>>These TOCs can come up as part of the binary data though, unlike most
>>>>>of the rest of the data, they're mutable.  When new assets come up,
>>>>>these TOCs can change to hold the addresses of the new data.
>>>>
>>>>We have a similar system in place in Q 1.x and
>>>>carried over to Q 2.0;
>>>>you can hold handles to textures which don't exist
>>>>in your current
>>>>databases. If a new database is opened with them in,
>>>>they get patched up
>>>>and used appropriately. We find it more convenient
>>>>to handle this at the
>>>>object level than with a larger TOC.
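
The handle-plus-table scheme can be sketched roughly as follows. This is an illustration only: `AssetId`, `AssetTable` and `Material` are hypothetical names, and it shows the table-of-contents variant rather than the per-object patching used in Q.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Texture { int width; };

using AssetId = std::uint32_t;  // hypothetical handle type

// Mutable table of contents: maps handles to the current address of each asset.
class AssetTable {
public:
    // Called when a database containing the asset is opened, or the asset moves.
    void bind(AssetId id, Texture* address) { table_[id] = address; }
    void unbind(AssetId id) { table_.erase(id); }

    // Returns nullptr while the asset is not loaded.
    Texture* resolve(AssetId id) const {
        auto it = table_.find(id);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<AssetId, Texture*> table_;
};

// "A" stores only the handle, so its own (immutable) data never needs rewriting.
struct Material {
    AssetId textureId;

    int textureWidthOr(const AssetTable& toc, int fallback) const {
        const Texture* t = toc.resolve(textureId);
        return t ? t->width : fallback;
    }
};
```

A handle to a texture that is not in any open database simply resolves to null until a later `bind` patches it in.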
>>>>
>>>>
>>>>
>>>>>>Say you have a "Stranger's Wrath"-style streaming game (hubs with many
>>>>>>linear paths leading out and back in) and the progression goes
>>>>>>A -> B -> C -> D -> A (there's a teleporter in D that takes you back
>>>>>>to A).  Now say that the designers want a large-memory-footprint vehicle
>>>>>>but _only_ in levels B and C.  Since B and C have to be in memory
>>>>>>simultaneously, it's wasteful to have 2 copies of that vehicle in memory
>>>>>>at once (one loaded by each level), so you need some sort of shared
>>>>>>overlay in which the resource exists and is in memory for the duration
>>>>>>of both B and C (let's call it BC).  It's entirely conceivable that both
>>>>>>B and C could refer to this vehicle in BC (perhaps other instances of it
>>>>>>are lying around, or perhaps the avatar has to switch vehicles, etc...),
>>>>>>so there's going to have to be some sort of dynamic linking/fixup going
>>>>>>on when B and C come into memory.
>>>>>>
>>>>>>A necessary conclusion of this is that assets in BC need some sort of
>>>>>>unique ID that both B and C can refer to.  Since all of the assets for
>>>>>>the game live in a database of some form or another (hopefully?!), it
>>>>>>would be nice if GUIDs could come from there.  I haven't looked into
>>>>>>this part yet, but it didn't seem on the surface that Perforce had any
>>>>>>such features (trade a filename for a unique id).
>>>>
>>>>Streaming has always been one of our main technology
>>>>targets. The
>>>>situation you describe actually isn't that hard (you
>>>>can read our
>>>>explanation of what we do in Q 1.x at
>>>>http://qdn.qubesoft.com/docs/1.1.1/doc/qserver/streaming.html);
>>>>the much
>>>>more difficult problem is what happens when a
>>>>resource which starts in a
>>>>particular location moves far away from its start,
>>>>then remains there
>>>>while you wander away and then come back. Anyway,
>>>>unique IDs are the way
>>>>we went for Q 1.x, and we've kept it (with some
>>>>modifications) for Q
>>>>2.0. On the other hand, I don't think I'd want to
>>>>tie it to some other
>>>>database's GUID.
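
One simple way to picture the shared B/C case is reference counting keyed by unique ID. The sketch below is an assumption-laden illustration (`SharedAssetCache` and its methods are made up here) and does not describe how Q resolves streaming; it also ignores the harder case of resources that wander away from where they started.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using AssetId = std::uint64_t;  // hypothetical unique ID

// Tracks how many loaded levels currently refer to each shared asset.
class SharedAssetCache {
public:
    // Returns true if this call is the first reference, i.e. the asset
    // would actually be loaded now.
    bool acquire(AssetId id) { return ++refCounts_[id] == 1; }

    // Returns true if this call dropped the last reference, i.e. the asset
    // would be unloaded now. Releasing an unknown ID is ignored.
    bool release(AssetId id) {
        auto it = refCounts_.find(id);
        if (it == refCounts_.end()) return false;
        if (--it->second > 0) return false;
        refCounts_.erase(it);
        return true;
    }

    bool isResident(AssetId id) const { return refCounts_.count(id) != 0; }

private:
    std::map<AssetId, int> refCounts_;
};
```

With this, level B acquiring the vehicle loads it, level C acquiring it is free, and it stays resident until both have released it.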
>>>>
>>>>
>>>>
>>>>>An offline bundling tool that had global visibility over the entire game
>>>>>(i.e. across all level layout files) would be able to optimally organize
>>>>>these shared and unique packfiles.
>>>>
>>>>Exactly what we did for Q 1.x (this is the phase i
>>>>referred to earlier
>>>>which optimizes multiple object loads and for
>>>>particular platforms).
>>>>
>>>
>>>=== message truncated ===
>>>
>>>_______________________________________________
>>>GDAlgorithms-list mailing list
>>>GDA...@li...
>>>https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
>>>Archives:
>>>http://sourceforge.net/mailarchive/forum.php?forum_id=6188