On Thu, 19 Nov 2009, Kirk, Benjamin (JSC-EG311) wrote:
> To be clear, ex10 on two processors will do it?
I was testing on 4, but from what I saw in Marro's test case, *one*
processor will do it. It just takes one or two adaptive steps to
trigger it too; I think I changed n_timesteps from 25 to 5 in ex10.
Like I said, something is *seriously* wrong with our .xda I/O now...
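
For anyone who picks up the revision bisection mentioned in the quoted message below, here is a minimal sketch of the search itself. It is only an illustration: the function name, the revision numbers, and the `is_bad` callback are hypothetical. In practice the callback would have to check out a revision, rebuild, and run ex10, none of which is shown here. It also assumes the regression stays broken in every revision after the one that introduced it.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the earliest revision in (good, bad] for which is_bad is True.

    Assumptions (not taken from the thread): `good` is a revision known to
    work, `bad` is a revision known to fail, and once the regression
    appears it stays in every later revision.
    """
    if good >= bad:
        raise ValueError("good must be an earlier revision than bad")
    # Invariant: `good` passes and `bad` fails; shrink the gap until adjacent.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # regression is at or before mid
        else:
            good = mid  # regression is after mid
    return bad


if __name__ == "__main__":
    # Stand-in predicate with a made-up breaking revision, for demonstration only.
    print(first_bad_revision(100, 200, lambda rev: rev >= 137))
```

Each step halves the remaining range, so even a few hundred candidate revisions need fewer than ten build-and-run cycles.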
> ----- Original Message -----
> From: Roy Stogner <roystgnr@...>
> To: libmesh-devel@... <libmesh-devel@...>
> Sent: Thu Nov 19 19:01:59 2009
> Subject: Re: [Libmesh-devel] ex4 Parallel run error.
> On Thu, 19 Nov 2009, Kirk, Benjamin (JSC-EG311) wrote:
>> Are you looking back by date to find the last successful build?
> Unfortunately I'm now at home with errands to do; I didn't discover
> until 6:00 that the I/O problem was so serious.
> Playing binary search through svn revision numbers is definitely the
> next line of attack, but I may not be able to get to it until ~8am
> tomorrow. If anyone has a chance to try that first I'd appreciate it.
>> ----- Original Message -----
>> From: Roy Stogner <roystgnr@...>
>> To: Vijay S. Mahadevan <vijay.m@...>
>> Cc: libmesh-devel <libmesh-devel@...>
>> Sent: Thu Nov 19 18:06:32 2009
>> Subject: Re: [Libmesh-devel] ex4 Parallel run error.
>> On Thu, 19 Nov 2009, Vijay S. Mahadevan wrote:
>>> yes, that fixed the problem in ex4. I'll compile my code and test it
>>> later today and will let you know if there are any other problems.
>> There are definitely other problems. There appears to be an I/O
>> regression that's unrelated to the libHilbert change. I'm still
>> trying to track that down, but it's nasty - whereas the libHilbert bug
>> was only affecting a couple of codes, this I/O regression even triggers
>> on ex10...
>>> On Thu, Nov 19, 2009 at 5:25 PM, Roy Stogner <roystgnr@...> wrote:
>>>> On Thu, 19 Nov 2009, Vijay S. Mahadevan wrote:
>>>>> Ah I see. I do not use AMR extensively these days and so most of my
>>>>> test problems have been working smoothly without encountering the
>>>>> error you mentioned. I assumed this was a very specific bug that
>>>>> occurred only after several refinements. If this is the case, can I
>>>>> fall back to enabling the Hilbert library? Or would it be safer to wait
>>>>> for your implementation to be checked in before I start running in
>>>>> parallel?
>>>> My implementation's checked in now. Whether it's safer to trust code
>>>> that hasn't been thoroughly tested or code that has been found to fail
>>>> in a few hard-to-find cases is a matter of opinion. I'd appreciate it
>>>> if you'd run with the new implementation, though, so that we start
>>>> getting that more thoroughly tested ASAP. That seems like the fastest
>>>> way to get back to "safe". The libHilbert issue appeared to be a
>>>> problem in their code, which either means that we have to fix their
>>>> code or that we've misunderstood the problem; either way it may take
>>>> us a while to make it safe to turn that back on.