From: Derek G. <fri...@gm...> - 2018-05-02 20:49:37

I'll add one more reason to hate std::vector<bool>: threading. std::vector<bool> is not at _all_ thread-safe: because neighboring elements are packed into the same underlying word, multiple processors working on the same vector will clobber each other's values even when they write to different elements. This leads to hair-pulling race conditions that take forever to track down. I personally wish they had made std::vector<bool> take 8 times more memory... and then let you use something else if you needed the optimization... but that's just me.

Derek

On Wed, May 2, 2018 at 2:24 PM Roy Stogner <roy...@ic...> wrote:
>
> I hope you don't mind, but I'm going to Cc: this to libmesh-devel. These are good questions, and my opinions do not match those of all the other developers, so it might be unfair not to let them chime in.
>
> On Wed, 2 May 2018, Tuzun, Robert wrote:
>
> > Greetings. I'm still in the middle of my hiatus, but there was one question I've been meaning to ask. I notice, in several places, the use of vector<bool> in the code. Way back in 2001, Scott Meyers (Effective STL, Item 18) recommended against the use of vector<bool> and recommended using deque<bool> or bitset instead.
>
> "As an STL container, there are really only two things wrong with vector<bool>. First, it's not an STL container. Second, it doesn't hold bools. Other than that, there's not much to object to." - Scott Meyers. I seriously love his writing. It takes something special to make programming language minutiae so entertaining.
>
> This is indeed often annoying, because the inability to take real references to vector entries or to get a pointer to an underlying array means we often can't handle vector<bool> with the same generic code as we use to handle vector<literally_anything_else>. I assume you first encountered vector<bool> in our parallel_implementation.h overloads, written to handle just that problem?
>
> However, the recommended alternatives are even worse in many cases. If a code is always fine using deque<bool>, if using EIGHT TIMES as much memory never fazes the author, then Meyers' hundreds of C++ recommendations should be ignored in favor of "Item 1: Use Java Instead" (or maybe Python or Go these days?). Conversely, if a code is always happy with bitset, and never needs an array that can be resized, then what are you writing, autotranslated Fortran 77?
>
> Personally, I'm mostly with Howard Hinnant:
> https://isocpp.org/blog/2012/11/on-vectorbool
>
> What's abominable about vector<bool> is the name, because that's basically a lie about supporting the whole std::vector interface, but the data structure itself is worthwhile and often useful. And I'm even sympathetic to the name! Once you create a few well-chosen overloads (like the parallel_implementation.h stuff) then the rest of your generic code can *mostly* support vector<bool> as if it were a real std::vector. And when you miss a needed specialization, the resulting breakage is a compile-time failure, so it's a hassle rather than a disaster.
>
> > I don't know if this advice still holds, and my search on the Web yields conflicting answers (just do a Google search on vector<bool> and you'll see what I mean). Is there reason for concern?
>
> Well, despite everything I said above: yes, there is.
>
> First, to Meyers' objections I'd also have added "Third, it requires bit-fiddling for every access", which means if you have a vector small enough to *not* worry about memory savings, maybe you ought to be using vector<unsigned char> instead.
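
(As a minimal standalone illustration of the problems mentioned above -- a sketch, not code from libMesh: the packed representation is what makes concurrent writes to distinct elements a data race and what costs bit-fiddling on every access, while the proxy reference plus the missing data() member are what force the special-case overloads. vector<unsigned char> has none of these quirks.)

  #include <type_traits>
  #include <vector>

  int main()
  {
    std::vector<bool> vb(100, false);
    std::vector<unsigned char> vc(100, 0);

    // vector<bool> packs ~8 flags per byte, so vb[0] and vb[1] live in the
    // same underlying word; concurrent writes from two threads race even
    // though they target "different" elements.  vc stores one flag per
    // byte, so concurrent writes to different elements are safe.

    // operator[] on vector<bool> returns a proxy object, not a bool&,
    // which is why generic code expecting real references needs overloads:
    static_assert(!std::is_same<decltype(vb[0]), bool &>::value,
                  "vector<bool>::operator[] does not return bool&");
    static_assert(std::is_same<decltype(vc[0]), unsigned char &>::value,
                  "vector<unsigned char>::operator[] does");

    // And there's no data() member to hand a contiguous buffer to MPI,
    // memcpy, etc.:
    // bool * pb = vb.data();          // does not compile
    unsigned char * pc = vc.data();    // fine
    (void) pc;
    return 0;
  }
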
> Grepping through the code for short logically-boolean vectors, I see that we're doing the unsigned-char-really-holding-bool trick in dof_map.C and mesh_communication_global_indices.C... but we're using small vector<bool> in about a dozen places?
>
> I'm not going to change those without benchmarking first, because "unsigned char is faster due to less bit fiddling" vs "bool is faster due to fitting more stuff in cache" is the sort of system-dependent question I can't answer a priori, and in the absence of data I think vector<bool> is more self-documenting for something that's logically boolean. If anyone wants to try to prove me wrong in the libMesh context, I'd say try System::boundary_project_solution for the benchmark. That has a few vector<bool>s inside loops over elements, so it looks like the easiest place to expose performance differences.
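
(A rough standalone sketch of that kind of micro-benchmark -- a generic fill-and-count comparison, not the System::boundary_project_solution benchmark itself; the winner is exactly the system-dependent question raised above.)

  #include <chrono>
  #include <cstddef>
  #include <cstdio>
  #include <vector>

  // Mark every third flag, then count the marks, and report the elapsed time.
  template <typename Vec>
  double time_flags(std::size_t n)
  {
    Vec flags(n, 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; i += 3)
      flags[i] = true;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
      count += flags[i] ? 1 : 0;
    const auto stop = std::chrono::steady_clock::now();
    std::printf("  (marked %zu flags)\n", count);  // keep the work observable
    return std::chrono::duration<double>(stop - start).count();
  }

  int main()
  {
    const std::size_t n = 50000000;  // 50M flags: ~6 MB packed vs ~50 MB of bytes
    std::printf("vector<bool>:          %g s\n", time_flags<std::vector<bool>>(n));
    std::printf("vector<unsigned char>: %g s\n", time_flags<std::vector<unsigned char>>(n));
    return 0;
  }
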
> Second and more seriously, in libMesh whenever we find ourselves in a situation where tactically vector<bool> is definitely the right choice, it may mean that strategically we've made a huge mistake!
>
> Our classic use cases for vector<bool> are vectors indexed over nodes or over elements, where we need to support millions or billions of entries and so using 8 times as much space is clearly a bad idea.
>
> *However*, some of that code was a reasonable design only because it dates back to ReplicatedMesh-only days, when we couldn't get away from having data stored for every element on every processor. These days, a DistributedMesh user should be able to avoid ever requiring O(N_elements) memory on a single MPI rank, which means that applications should never allocate such a long vector! So the library should never be using one, at least on a non-debug-mode, non-ReplicatedMesh-only code path.
>
> So let's see...
>
> Some of these uses are OK. E.g. the MeshTools assertion code is all debug-mode-only, where readability is much more important than efficiency.
>
> Some are more dubious. E.g. UnstructuredMesh::all_first_order() only gets used in practice for writing serialized viz output of an error estimate, so making it perfectly scalable would require improving multiple other parts of that code path first, meaning it's suboptimal but not high priority.
>
> Some of these uses are probably outweighed by other distributed mesh incompatibility issues. At first glance I'd guess MeshTools::Modification::Smooth may not even work on a distributed mesh, but we've just never tried to use it that way. It looks like LaplaceMeshSmoother is distributed-mesh-aware, so maybe everyone who needs smoothing post-distribution has just been using that? I don't think we have test coverage of it, though, so I wouldn't trust it.
>
> But a couple of these uses are just way past due to be fixed! We're still relying on big serialized vectors (not just of bool; ErrorVector is a straight-up vector<float> under the hood!) in our error estimation and our mesh refinement flagging code.
>
> > A separate question: in looking at parallel_implementation.h, I note several references to Communicator::get() (e.g., line 298). Can you tell me where this is defined? I could not find it anywhere. For example, the command grep Communicator::get -r * from the main directory did not yield the answer.
>
> And it probably doesn't help that our online Doxygen isn't generated from master/HEAD!
>
> http://libmesh.github.io/doxygen/classlibMesh_1_1Parallel_1_1Communicator.html#a4b6bd36c38eb7de350fcd56510306ced
>
> is still useful, though: even though refactoring has moved that method to include/parallel/communicator.h, it's still the same shim accessor method. We often leave any definition that short in the class declaration itself, where it doesn't need the ClassName:: prefix.
> ---
> Roy
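
(On that last point, a minimal generic sketch -- a hypothetical Wrapper class, not the actual libMesh Parallel::Communicator code -- of why grep for "ClassName::method" comes up empty when the method is defined inside the class declaration.)

  // shim_accessor.h -- hypothetical example
  class Wrapper
  {
  public:
    // Defined in-class, so the token "Wrapper::get" never appears in the
    // source; "grep Wrapper::get -r *" finds only call sites, if anything.
    int & get()             { return _value; }
    const int & get() const { return _value; }

  private:
    int _value = 0;
  };

  // Had it been defined out-of-line instead, grep would have found:
  //
  //   int & Wrapper::get() { return _value; }

Per Roy's note above, the real get() is the same kind of one-line shim, so the place to read it is the class declaration in include/parallel/communicator.h.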