I think I see - "globally consistent" ids are assigned then both the mesh and solution vectors are written out together... then those ids are "swapped" back.

Unfortunately - that "swap" is WRONG... and can in fact change the ids for your nodes and elements in the middle of your simulation just by writing an XDA file!

If I have nodes that are numbered 500-1000 and I write an XDA file... after the mesh.write() has completed my nodes will now be numbered 0-500!  Even if my ids are zero indexed and compact (no holes in the numbering)... if I didn't add the nodes in the order of their ids then writing an XDA file will completely scramble the numbering!
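To make the two failure cases above concrete, here is a minimal sketch in plain Python. It does not use libMesh at all; `renumber_in_storage_order` is a hypothetical stand-in for any writer that assigns ids by position in the container rather than preserving the ids the objects already carry.

```python
def renumber_in_storage_order(node_ids):
    """Map each existing id to its position in the container.

    Hypothetical stand-in for a writer that infers ids from storage
    order instead of keeping the ids the nodes already have.
    """
    return {old: new for new, old in enumerate(node_ids)}


# Case 1: ids 500..1000 collapse to 0..500 after the "write".
offset_ids = renumber_in_storage_order(range(500, 1001))

# Case 2: ids are zero indexed and compact, but the nodes were not
# added in id order, so every id changes.
insertion_order = [2, 0, 3, 1]
scrambled_ids = renumber_in_storage_order(insertion_order)
```

In the second case the set of ids is unchanged, but none of the nodes keeps its own id, which is enough to break anything keyed on node id across the write.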

Here's another issue: Why are we doing so much parallel communication of meshes in the case of Serial mesh?  Why doesn't processor zero just go through and write the mesh out exactly as it is to XDA?  Instead - there is a complicated routine that does a lot of parallel communication to push all of the pieces through processor 0 from all the other processors....

I tried to turn off the renumbering and un-renumbering - but it doesn't help in parallel because the parallel communication changes the order of things....

I'm halfway wondering if we should just invent a new format for restart files.  I don't see any way to fix all the issues with XDA and maintain backwards compatibility.

Is it time to just have a design discussion and lay out a new format all together?

Unfortunately this has thrown an enormous monkey wrench into my current project (doing perfect restart in MOOSE).  I thought I was 99% done (it all works in serial with compact meshes that are zero indexed... but everything is scrambled in parallel)... but I was counting on the XDA formats in libMesh working in such a way that a perfect (as in an exact replica) mesh and EQSys showed up on the other side of a write and a read... and it turns out not to be the case...


On Wed, Nov 6, 2013 at 5:34 PM, Derek Gaston <friedmud@gmail.com> wrote:
I don't even understand how the current stuff works at all.

Since the node and element numbers change after running the mesh through an XDA read and write - how do the dof ids match up with what was written out with EquationSystems::write()??


On Wed, Nov 6, 2013 at 4:41 PM, Derek Gaston <friedmud@gmail.com> wrote:
For Parallel Mesh I was just thinking that each processor would write its own file... so that you could perfectly recreate the exact Mesh data structure on read (not to mention being more amenable to parallel filesystems like Panasas)


On Wed, Nov 6, 2013 at 4:33 PM, Kirk, Benjamin (JSC-EG311) <benjamin.kirk@nasa.gov> wrote:
On Nov 6, 2013, at 4:18 PM, Derek Gaston <friedmud@gmail.com> wrote:

> Let me be a bit more clear:
> After writing an XDA file and reading it back in - I want _exactly_ the same Mesh structure that I had to start with.... same numbering, same everything…

That should be possible…  The parallel format is loosely thought out and open to extension.  From a serial file the global ids are inferred; for the parallel case I don't see a reason we couldn't include a unique global id too.
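As a rough illustration of what "include a unique global id" could buy, here is a sketch in plain Python. The one-record-per-line text layout and the names `write_nodes` and `read_nodes` are assumptions made up for this example; they are not the XDA format or any libMesh API. Each record carries its own id, so the reader never has to infer ids from file position.

```python
import io


def write_nodes(stream, nodes):
    """Write one 'id x y z' record per node, in storage order."""
    for node_id, (x, y, z) in nodes:
        # repr() of a Python float round-trips exactly through float().
        stream.write(f"{node_id} {x!r} {y!r} {z!r}\n")


def read_nodes(stream):
    """Read the records back, taking each id from the record itself."""
    nodes = []
    for line in stream:
        node_id, x, y, z = line.split()
        nodes.append((int(node_id), (float(x), float(y), float(z))))
    return nodes


# Non-compact ids, stored out of id order.
original = [(700, (0.1, 0.2, 0.3)), (500, (1.0, 0.0, 0.0))]
buffer = io.StringIO()
write_nodes(buffer, original)
buffer.seek(0)
restored = read_nodes(buffer)
```

With explicit ids in the file, both the numbering and the storage order survive the round trip, at the cost of one extra integer per record.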

> It should go:
> Meta Data
> Nodes
> Elements
> BCs

The idea here is we can optionally support a partition file, which defines element ownership.  This could allow the elements to be shipped off first in the case of a serial read, or read only on the processors that need them.  The important subset of nodes can then be determined.

Reading the nodes first would require caching all of them until you know which ones you can discard.  Or closing the buffer and doing some seek business.
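The elements-first ordering argued for above can be sketched as follows, again in plain Python rather than libMesh. The function `read_local_mesh`, the `partition` dictionary (element id to owning rank), and the tuple layouts are all hypothetical. Because the elements are seen first, the node stream can be filtered in a single pass with no caching and no seeking.

```python
def read_local_mesh(elements, nodes, partition, rank):
    """Keep only the elements owned by `rank` and the nodes they touch.

    elements:  iterable of (elem_id, [node_ids])
    nodes:     iterable of (node_id, coords), consumed exactly once
    partition: dict mapping elem_id -> owning rank
    """
    local_elems = {eid: conn for eid, conn in elements
                   if partition[eid] == rank}
    needed = {nid for conn in local_elems.values() for nid in conn}
    # Single pass over the node stream; unneeded nodes are dropped
    # as soon as they are read.
    local_nodes = {nid: xyz for nid, xyz in nodes if nid in needed}
    return local_elems, local_nodes


elements = [(0, [0, 1]), (1, [1, 2]), (2, [2, 3])]
partition = {0: 0, 1: 0, 2: 1}
node_stream = ((i, (float(i), 0.0, 0.0)) for i in range(4))

elems_on_1, nodes_on_1 = read_local_mesh(elements, node_stream, partition, rank=1)
```

If the nodes came first in the file, the same filter would need either the whole node list held in memory or a second pass over the file, which is the trade-off described above.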