From: Rahul S. <rah...@gm...> - 2009-12-16 02:38:42
|
Hi: I just wanted to bring this to your attention: In the following example code: Mesh * mesh = new Mesh(3); delete mesh; The destructor of mesh is called, which through a bunch of nested calls to parent destructors ends up calling BoundaryInfo::clear(). This calls _boundary_node_id.clear(). In the above example, _boundary_node_id happens to be empty. According to GCC 4.3.2, clearing an empty container is not safe: The following is taken from the Multimap.h file in the Std C++ library (GCC 4.3.2) void 00190 erase(iterator __first, iterator __last) 00191 { 00192 // _GLIBCXX_RESOLVE_LIB_DEFECTS 00193 // 151. can't currently clear() empty container 00194 __glibcxx_check_erase_range(__first, __last); 00195 while (__first != __last) 00196 this->erase(__first++); 00197 } 00198 void 00207 clear() 00208 { this->erase(begin(), end()); } Regards, Rahul |
From: Roy S. <roy...@ic...> - 2009-12-16 04:09:34
|
On Tue, 15 Dec 2009, Rahul Sampath wrote: > According to GCC 4.3.2, clearing an empty container is not safe: Interesting. Apparently someone noticed this flaw in the C++ standard in 1999, but it wasn't officially fixed in the standard until 2003: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#151 Looks like in the C++98 standard, clear() was still defined as erase(begin(), end()), but the standard for erase(q1,q2) said that q1 had to be a dereferenceable iterator. The resolution (to merely require that [q1, q2) be a valid, which includes empty, range) is in "Technical Corrigendum 1", which came out in 2003. gcc started adding support for c++0x draft features in the 4.3 series; I'm surpised they hadn't finished TC1 support yet! If 4.3.2 is actually segfaulting then we definitely want to fix it. But I hate adding an extra if(!empty()) test for people whose compilers are up to date. I'll wrap everything in an inline libmesh_clear() function, to make re-optimizing easier later; eventually it would be nice to detect compiler versions that support empty.clear() and automatically get rid of the redundant if(). Thanks for catching this! --- Roy |
From: Roy S. <roy...@ic...> - 2009-12-16 17:34:07
|
On Tue, 15 Dec 2009, Rahul Sampath wrote: > In the following example code: > > Mesh * mesh = new Mesh(3); > delete mesh; > > The destructor of mesh is called, which through a bunch of nested > calls to parent destructors ends up calling BoundaryInfo::clear(). > This calls > _boundary_node_id.clear(). In the above example, _boundary_node_id > happens to be empty. According to GCC 4.3.2, clearing an empty > container is not safe: Could you give us a full stack trace of the segfault? I'm trying to replicate this bug myself, but the code above is working fine on my gcc 4.3.1 libMesh build. A coworker has been experimenting with deleting empty containers with gcc 4.3.2, and (even when defining the _GLIBCXX_DEBUG flag needed to get down to the multimap.h snippet you pasted) he hasn't be able to get a segfault out of it. > void > 00190 erase(iterator __first, iterator __last) > 00191 { > 00192 // _GLIBCXX_RESOLVE_LIB_DEFECTS > 00193 // 151. can't currently clear() empty container > 00194 __glibcxx_check_erase_range(__first, __last); > 00195 while (__first != __last) > 00196 this->erase(__first++); > 00197 } > 00198 > void > 00207 clear() > 00208 { this->erase(begin(), end()); } Looking at that snippet, in fact, I can't see how it would result in a segfault. If that range check fails, it should trigger a SIGABRT, not SIGSEGV. And if the range check doesn't fail, begin() and end() should be equal and so the while loop should be a do-nothing. Question for Rahul: is it possible that you've got a segfault elsewhere in your code, or improperly linked code? e.g. mixing libMesh versions, compilers, or compiler flag settings like _GLIBCXX_DEBUG between object files can sometimes result in spurious segfaults at runtime. And question for libMesh developers: if it turns out that empty_container.clear() is supported by old gcc versions, do we want to use a libmesh_clear() workaround anyway, just in case other compilers exhibit the problem? I'd say that's overkill, but on the other hand, we've still got our AutoPtr and OStringStream classes around to handle some non-C++98 compatible compilers; running into a non-2003-compatible STL wouldn't be any more unlikely. --- Roy |
From: Rahul S. <rah...@gm...> - 2009-12-16 18:21:56
|
Hi Roy: Yes. The problem apparently was not inside the clear as it should have thrown a SIGABRT. The problem was in the call _boundary_node_id.clear() because _boundary_node_id was cleared from the stack inside the constructor for Mesh. This was because Libmesh was compiled with the _GLIBCXX_DEBUG flag and my other libraries did not use this flag. If I use the flag _GLIBCXX_DEBUG to compile all the libraries, the segfault goes away. Regards, Rahul On Wed, Dec 16, 2009 at 12:33 PM, Roy Stogner <roy...@ic...> wrote: > > > On Tue, 15 Dec 2009, Rahul Sampath wrote: > >> In the following example code: >> >> Mesh * mesh = new Mesh(3); >> delete mesh; >> >> The destructor of mesh is called, which through a bunch of nested >> calls to parent destructors ends up calling BoundaryInfo::clear(). >> This calls >> _boundary_node_id.clear(). In the above example, _boundary_node_id >> happens to be empty. According to GCC 4.3.2, clearing an empty >> container is not safe: > > Could you give us a full stack trace of the segfault? I'm trying to > replicate this bug myself, but the code above is working fine on my > gcc 4.3.1 libMesh build. A coworker has been experimenting with > deleting empty containers with gcc 4.3.2, and (even when defining the > _GLIBCXX_DEBUG flag needed to get down to the multimap.h snippet you > pasted) he hasn't be able to get a segfault out of it. > >> void >> 00190 erase(iterator __first, iterator __last) >> 00191 { >> 00192 // _GLIBCXX_RESOLVE_LIB_DEFECTS >> 00193 // 151. can't currently clear() empty container >> 00194 __glibcxx_check_erase_range(__first, __last); >> 00195 while (__first != __last) >> 00196 this->erase(__first++); >> 00197 } >> 00198 >> void >> 00207 clear() >> 00208 { this->erase(begin(), end()); } > > Looking at that snippet, in fact, I can't see how it would result in a > segfault. If that range check fails, it should trigger a SIGABRT, not > SIGSEGV. And if the range check doesn't fail, begin() and end() > should be equal and so the while loop should be a do-nothing. > > Question for Rahul: is it possible that you've got a segfault > elsewhere in your code, or improperly linked code? e.g. mixing > libMesh versions, compilers, or compiler flag settings like > _GLIBCXX_DEBUG between object files can sometimes result in spurious > segfaults at runtime. > > And question for libMesh developers: if it turns out that > empty_container.clear() is supported by old gcc versions, do we want > to use a libmesh_clear() workaround anyway, just in case other > compilers exhibit the problem? I'd say that's overkill, but on the > other hand, we've still got our AutoPtr and OStringStream classes > around to handle some non-C++98 compatible compilers; running into a > non-2003-compatible STL wouldn't be any more unlikely. > --- > Roy > |