From: Jan B. <bie...@tu...> - 2009-09-17 16:46:20
|
Dear all,

I have been using an old libmesh version until now because I didn't want to bother changing all my application code. But the problem size has now become so big that I cannot avoid the latest version (because I think you changed the mesh class for parallel computing so that the entire mesh is no longer stored on every processor, right?).

Now that I want to install the latest version, I ran into some bugs where the code didn't want to compile for complex numbers. After fixing those, I finally get an error that I don't understand:

  ....
  ----------------------------------------------
  ----- Done Building Contributed Packages -----
  ----------------------------------------------
  make[1]: Leaving directory `/usertemp/mubjb/new_libmesh/contrib'
  Building bin/amr
  /usertemp/mubjb/new_libmesh/lib/x86_64-unknown-linux-gnu_opt/libmesh.a(distributed_vector.x86_64-unknown-linux-gnu.opt.o): In function `void Parallel::allgather<std::complex<double> >(std::vector<std::complex<double>, std::allocator<std::complex<double> > >&, bool)':
  distributed_vector.C:(.text._ZN8Parallel9allgatherISt7complexIdEEEvRSt6vectorIT_SaIS4_EEb[void Parallel::allgather<std::complex<double> >(std::vector<std::complex<double>, std::allocator<std::complex<double> > >&, bool)]+0x322): undefined reference to `int Parallel::datatype<std::complex<double> >()'
  distributed_vector.C:(.text._ZN8Parallel9allgatherISt7complexIdEEEvRSt6vectorIT_SaIS4_EEb[void Parallel::allgather<std::complex<double> >(std::vector<std::complex<double>, std::allocator<std::complex<double> > >&, bool)]+0x346): undefined reference to `int Parallel::datatype<std::complex<double> >()'
  distributed_vector.C:(.text._ZN8Parallel9allgatherISt7complexIdEEEvRSt6vectorIT_SaIS4_EEb[void Parallel::allgather<std::complex<double> >(std::vector<std::complex<double>, std::allocator<std::complex<double> > >&, bool)]+0x49c): undefined reference to `int Parallel::datatype<std::complex<double> >()'
  distributed_vector.C:(.text._ZN8Parallel9allgatherISt7complexIdEEEvRSt6vectorIT_SaIS4_EEb[void Parallel::allgather<std::complex<double> >(std::vector<std::complex<double>, std::allocator<std::complex<double> > >&, bool)]+0x4c3): undefined reference to `int Parallel::datatype<std::complex<double> >()'
  collect2: ld returned 1 exit status
  make: *** [bin/amr] Error 1

Maybe you can give me a hint where to start the troubleshooting.

Thanks,
Jan
|
From: Roy S. <roy...@ic...> - 2009-09-17 17:52:17
|
On Thu, 17 Sep 2009, Jan Biermann wrote:

> I have been using an old libmesh version until now because I didn't want to
> bother changing all my application code. But now the problem size becomes
> this big that I cannot get around the latest version (because I think you
> changed the mesh class for parallel computing, so not storing the entire mesh
> on every processor,right?)

No permanent change to the mesh class yet - the ParallelMesh does what you're talking about, but it still has some bugs with adaptive coarsening, and there is no schedule for when they might be fixed. If you're not doing any adaptive coarsening then you can --enable-parmesh at configure time to use it.

There is a permanent change in the SVN head to use "ghosted" instead of global PETSc vectors for many applications, which should save you a little RAM.

> But now that I want to install the latest version, I ran into some bugs where
> the code didn't want to compile for complex numbers but after fixing it,

Thanks for letting us know! I haven't tried --enable-complex on the SVN head in a long while.

I've added a libmesh_not_implemented() to the Ensight output if people try to write with LIBMESH_USE_COMPLEX_NUMBERS, and made a few changes to make sure it compiles. EnsightIO is a relatively new class from Ben Kirk; I'm not familiar with the format and I'm not sure when/if we'll be able to add actual complex-number support to it.

I've changed fabs() to std::abs() in Derek's new NumericVector::abs() functions, and v*v to libmesh_norm(v) in NumericVector::subset_l2_norm().

Trilinos seems pretty incompatible with complex numbers unless there's an EpetraVector option I'm missing, so I've changed our configure.in to disable Trilinos support whenever complex-number support is requested.

> I finally get an error that I don't understand:

We use Parallel::datatype<> to wrap a bunch of MPI calls into a much nicer interface. Unfortunately, it looks like MPI_DOUBLE_COMPLEX is a Fortran-only datatype, and MPI::DOUBLE_COMPLEX requires a C++ MPI2 standard whereas we've been trying to stay C MPI1 compatible, so we have to write special template definitions for Parallel operations on std::complex data. If allgather code ever tried to use a non-special template definition, this is the error you'd see.

Ben added a new Parallel::allgather() argument a while back, but he did it properly on both the regular and complex versions - I'm not sure what the problem here could be. I'll keep looking.
---
Roy
|
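The mechanism behind the `undefined reference` can be illustrated with a small sketch. It is not libMesh's actual code: the namespace `sketch`, the function `datatype_name`, and the string return values are hypothetical stand-ins for a type-to-MPI-handle mapping. The primary template is only declared, and definitions exist only for types with a native mapping, so any use with another type is caught when the program is linked.

```cpp
#include <string>

namespace sketch {

// Primary template: declared but deliberately never defined.
// (Hypothetical stand-in for a type-to-MPI-datatype mapping.)
template <typename T>
std::string datatype_name();

// Explicit specializations exist only for types with a native mapping.
template <>
inline std::string datatype_name<int>() { return "MPI_INT"; }

template <>
inline std::string datatype_name<double>() { return "MPI_DOUBLE"; }

// A call such as datatype_name<std::complex<double> >() would still
// compile, because the declaration above is visible, but the linker
// would then report an undefined reference, much like the log above.

}  // namespace sketch
```

With this layout, `sketch::datatype_name<double>()` returns `"MPI_DOUBLE"`, while an unsupported type never reaches run time.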
From: Roy S. <roy...@ic...> - 2009-09-17 19:09:53
|
On Thu, 17 Sep 2009, Roy Stogner wrote:

> We use Parallel::datatype<> to wrap a bunch of MPI calls into a much
> nicer interface. Unfortunately, it looks like MPI_DOUBLE_COMPLEX is a
> Fortran-only datatype, and MPI::DOUBLE_COMPLEX requires a C++ MPI2
> standard whereas we've been trying to stay C MPI1 compatible, so we
> have to write special template definitions for Parallel operations
> on std::complex data. If allgather code ever tried to use a
> non-special template definition, this is the error you'd see.
>
> Ben added a new Parallel::allgather() argument a while back, but he
> did it properly on both the regular and complex versions - I'm not
> sure what the problem here could be. I'll keep looking.

Okay, I've found the problem. Ben cleaned up the API a bit, so that Parallel::send and Parallel::receive (and nonblocking_send, etc.) all take an optional DataType argument; if a wrapper function is called where that argument isn't present, it is assumed to be datatype<T> for the data of type T being sent. But datatype<std::complex<T> > doesn't exist, and somehow during those API changes we neglected to notice the new discrepancy between the regular and complex send/receive call versions.

Fix A would be to add the DataType argument to the complex call implementations, and add a "phony" datatype<std::complex<T> > that gets ignored by those implementations. The trouble with this is that I *like* having datatype<std::complex<T> > be undefined, because then if someone really does try to use it, the error gets flagged at compile time instead of run time. And, as in this case, compile-time errors are usually much easier to reproduce and fix.

Fix B would be to create complex send/receive call versions for the wrapper functions. This one requires more code, but it's all simple code. I'll start on that now, unless Ben chimes in to disagree.
---
Roy
|
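A rough sketch of what Fix B amounts to, under stated assumptions: `DataType`, `datatype`, and `send` here are simplified, hypothetical stand-ins in a `sketch` namespace that return a description string instead of calling MPI. The point is only that the datatype-free wrapper gets its own complex overload, so `datatype<std::complex<T> >` is never needed and can stay undefined.

```cpp
#include <complex>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for an MPI datatype handle.
struct DataType { std::string name; };

// Declared for every T, defined only for natively supported types.
template <typename T> DataType datatype();
template <> inline DataType datatype<int>() { return {"int"}; }
template <> inline DataType datatype<double>() { return {"double"}; }

// Core call: the caller supplies the datatype explicitly.
template <typename T>
std::string send(const std::vector<T>& buf, const DataType& type) {
  return std::to_string(buf.size()) + " x " + type.name;
}

// Wrapper without a datatype: assumes datatype<T> for the payload.
template <typename T>
std::string send(const std::vector<T>& buf) {
  return send(buf, datatype<T>());
}

// "Fix B": a complex version of the wrapper. It treats N complex
// values as 2N reals, so only datatype<T> is ever requested.
template <typename T>
std::string send(const std::vector<std::complex<T>>& buf) {
  return std::to_string(2 * buf.size()) + " x " + datatype<T>().name;
}

}  // namespace sketch
```

For a `std::vector<std::complex<double>>` of three entries, `sketch::send(z)` selects the complex overload and describes the payload as `6 x double`; a `std::vector<int>` still goes through the generic wrapper.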
From: Roy S. <roy...@ic...> - 2009-09-17 20:07:17
|
On Thu, 17 Sep 2009, Roy Stogner wrote:

> On Thu, 17 Sep 2009, Roy Stogner wrote:
>
>> We use Parallel::datatype<> to wrap a bunch of MPI calls into a much
>> nicer interface. Unfortunately, it looks like MPI_DOUBLE_COMPLEX is a
>> Fortran-only datatype, and MPI::DOUBLE_COMPLEX requires a C++ MPI2
>> standard whereas we've been trying to stay C MPI1 compatible, so we
>> have to write special template definitions for Parallel operations
>> on std::complex data. If allgather code ever tried to use a
>> non-special template definition, this is the error you'd see.
>>
>> Ben added a new Parallel::allgather() argument a while back, but he
>> did it properly on both the regular and complex versions - I'm not
>> sure what the problem here could be. I'll keep looking.
>
> Okay, I've found the problem.

One of the two problems, anyway. Problem two: for some reason the DistributedVector call to Parallel::allgather(std::vector<T>&) with T=std::complex<double> isn't getting resolved to the Parallel::allgather(std::vector<std::complex<T> >&) specialization. I'm not sure how to fix this one...
---
Roy
|
From: Roy S. <roy...@ic...> - 2009-09-17 20:51:46
|
On Thu, 17 Sep 2009, Roy Stogner wrote:

> One of the two problems, anyway. Problem two: for some reason the
> DistributedVector call to Parallel::allgather(std::vector<T>&) with
> T=std::complex<double> isn't getting resolved to the
> Parallel::allgather(std::vector<std::complex<T> >&) specialization.
> I'm not sure how to fix this one...

You can't partially specialize a function template. You can overload a function with a different function template that looks exactly *like* a partial specialization, but that still doesn't make it a partial specialization. For instance, if your general function template takes a default argument, then you'd better declare the specialized template to take the same default argument, not just assume that it's going to be affected by the first declaration.

C++ is perhaps not the most intuitive language in the world.
---
Roy
|
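The default-argument pitfall can be reproduced in isolation. The sketch below is not the libMesh source: the names `gather_a` and `gather_b` are hypothetical, the functions only report which overload was chosen, and C++11 syntax is assumed. In the first pair the complex overload omits the default argument, so a one-argument call cannot use it and falls back to the general template; in the second pair the default is repeated and the complex overload wins.

```cpp
#include <complex>
#include <string>
#include <vector>

namespace sketch {

// Pair A: general template with a default argument ...
template <typename T>
std::string gather_a(std::vector<T>&, bool /*identical_sizes*/ = false) {
  return "generic";
}

// ... and a complex overload declared WITHOUT the default. It is a
// separate function template, so it does not inherit the default and
// is not viable for one-argument calls.
template <typename T>
std::string gather_a(std::vector<std::complex<T>>&, bool /*identical_sizes*/) {
  return "complex";
}

// Pair B: the same two templates, with the default repeated.
template <typename T>
std::string gather_b(std::vector<T>&, bool /*identical_sizes*/ = false) {
  return "generic";
}

template <typename T>
std::string gather_b(std::vector<std::complex<T>>&, bool /*identical_sizes*/ = false) {
  return "complex";
}

}  // namespace sketch
```

Given `std::vector<std::complex<double>> z`, `gather_a(z)` yields `"generic"` while `gather_a(z, false)` and `gather_b(z)` yield `"complex"`. In the situation described above, landing in the general template is what drags in the undefined `datatype<std::complex<double> >`.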
From: John P. <pet...@cf...> - 2009-09-17 20:54:19
|
On Thu, Sep 17, 2009 at 3:51 PM, Roy Stogner <roy...@ic...> wrote:

> On Thu, 17 Sep 2009, Roy Stogner wrote:
>
>> One of the two problems, anyway. Problem two: for some reason the
>> DistributedVector call to Parallel::allgather(std::vector<T>&) with
>> T=std::complex<double> isn't getting resolved to the
>> Parallel::allgather(std::vector<std::complex<T> >&) specialization.
>> I'm not sure how to fix this one...
>
> You can't partially specialize a function template. You can overload
> a function with a different function template that looks exactly
> *like* a partial specialization, but that still doesn't make it a
> partial specialization. For instance, if your general function
> template takes a default argument, then you'd better declare the
> specialized template to take the same default argument, not just
> assume that it's going to be affected by the first declaration.
>
> C++ is perhaps not the most intuitive language in the world.

Does that mean you got libmesh compiling with complex enabled? Nice work!
--
John
|
From: Roy S. <roy...@ic...> - 2009-09-17 21:03:57
|
On Thu, 17 Sep 2009, John Peterson wrote:

> On Thu, Sep 17, 2009 at 3:51 PM, Roy Stogner <roy...@ic...> wrote:
>
>> On Thu, 17 Sep 2009, Roy Stogner wrote:
>>
>>> One of the two problems, anyway. Problem two: for some reason the
>>> DistributedVector call to Parallel::allgather(std::vector<T>&) with
>>> T=std::complex<double> isn't getting resolved to the
>>> Parallel::allgather(std::vector<std::complex<T> >&) specialization.
>>> I'm not sure how to fix this one...
>>
>> You can't partially specialize a function template. You can overload
>> a function with a different function template that looks exactly
>> *like* a partial specialization, but that still doesn't make it a
>> partial specialization. For instance, if your general function
>> template takes a default argument, then you'd better declare the
>> specialized template to take the same default argument, not just
>> assume that it's going to be affected by the first declaration.
>>
>> C++ is perhaps not the most intuitive language in the world.
>
> Does that mean you got libmesh compiling with complex enabled? Nice work!

Still working on it. We've basically got the same problem in every Parallel:: method that takes a default argument. There's a comment in there where someone (me? Ben?) fixed one of them by adding a second declaration (and a comment thinking that it was gcc complaining), but I need to go through this whole file and make sure we're not missing anything. I'll be done before I head home for tonight.

Part of me thinks we ought to just instantiate a user-defined MPI_Datatype for std::complex<float/double/long double>. Keep it in a global, put it right after MPI_Init. Then we can get rid of all these std::complex<T> operator overloadings. But I'll do things the less risky, more tedious way for now.
---
Roy
|
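A user-defined datatype for complex values would rest on one layout fact: a std::complex<double> occupies two adjacent doubles (real part, then imaginary part), which the C++11 standard guarantees. With MPI, such a type would presumably be built with MPI_Type_contiguous(2, MPI_DOUBLE, ...) followed by MPI_Type_commit, but that is an assumption about an approach not taken in this thread. The sketch below uses no MPI at all and only checks the layout assumption; the names `as_reals` and `real_count` are hypothetical.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

namespace sketch {

// The layout a "two contiguous doubles" datatype would rely on.
static_assert(sizeof(std::complex<double>) == 2 * sizeof(double),
              "std::complex<double> must be exactly two doubles");

// View N complex values as 2N reals without copying. Since C++11 the
// standard permits this array-style access to std::complex storage.
inline const double* as_reals(const std::vector<std::complex<double>>& z) {
  return reinterpret_cast<const double*>(z.data());
}

// Number of reals a communication buffer for z would contain.
inline std::size_t real_count(const std::vector<std::complex<double>>& z) {
  return 2 * z.size();
}

}  // namespace sketch
```

For a vector holding (1,2) and (3,4), the real view reads 1, 2, 3, 4 in that order, which is the interleaving a contiguous two-double datatype would transmit.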
From: Roy S. <roy...@ic...> - 2009-09-17 21:06:46
|
On Thu, 17 Sep 2009, Roy Stogner wrote:

> Then we can get rid of all these std::complex<T> operator
> overloadings.

Get rid of all these function overloadings, I mean. It may have taken me too long to work out the difference between overloading and template specialization, but I do know what operator overloading is, I swear...
---
Roy
|
From: Roy S. <roy...@ic...> - 2009-09-17 23:30:34
|
On Thu, 17 Sep 2009, Roy Stogner wrote:

> Still working on it. We've basically got the same problem in every
> Parallel:: method that takes a default argument. There's a comment in
> there where someone (me? Ben?) fixed one of them by adding a second
> declaration (and a comment thinking that it was gcc complaining) but I
> need to go through this whole file and make sure we're not missing
> anything. I'll be done before I head home for tonight.

Done, and with a parallel coarsening bugfix thrown in for good measure. I haven't tested it with --enable-complex yet (don't have a PETSc build around with PetscScalar==std::complex<double>) but it compiles that way again, at least.
---
Roy
|
From: Derek G. <fri...@gm...> - 2009-09-17 23:32:49
|
On Sep 17, 2009, at 5:30 PM, Roy Stogner wrote:

> Done, and with a parallel coarsening bugfix thrown in for good
> measure. I haven't tested it with --enable-complex yet (don't have a
> PETSc build around with PetscScalar==std::complex<double>) but it
> compiles that way again, at least.

PARALLEL COARSENING FOR PARALLEL MESH???!???!!!!

I thought it would never happen! ;-)

j/k... thanks for working on all of this!

Derek
|
From: Roy S. <roy...@ic...> - 2009-09-17 23:42:13
|
On Thu, 17 Sep 2009, Derek Gaston wrote:

> On Sep 17, 2009, at 5:30 PM, Roy Stogner wrote:
>> Done, and with a parallel coarsening bugfix thrown in for good
>> measure. I haven't tested it with --enable-complex yet (don't have a
>> PETSc build around with PetscScalar==std::complex<double>) but it
>> compiles that way again, at least.
>
> PARALLEL COARSENING FOR PARALLEL MESH???!???!!!!
>
> I thought it would never happen! ;-)

Ah, damn, now I feel awful...

No, this was a parallel coarsening bug with SerialMesh - I fixed a subtle but serious (could corrupt your data, given just the right combination of adaptivity and partitioning) send_list bug a little while ago, and inadvertently introduced a gross but relatively minor (slowed your program down a little and might crash it with ghosted vectors enabled) bug. That latter bug is what's now fixed.

A fixed ParallelMesh is still on my TODO list, but I won't be looking at it again for a month at least.
---
Roy
|
From: Roy S. <roy...@ic...> - 2009-09-21 14:32:40
|
On Mon, 21 Sep 2009, Jan Biermann wrote:

> Thanks for getting that fixed so quick!

No problem - but if you could do a bit of 0.6.3->SVN head regression testing, I'd appreciate it. I got everything compiling again, sure, but I don't currently have a complex-valued PETSc build available to test everything with, nor any real complex-dependent applications to use.

The compile-time problems shouldn't be repeated, at least. I'll be sure to add a --enable-complex test run when we expand our BuildBot configuration next month. That way, the next time someone breaks complex arithmetic support in the SVN head, we'll catch it and fix it right away, not just when we prepare for a release or when one of its users runs svn update.
---
Roy
|