Create new OWL class to encapsulate Windows code page functions
Borland's Object Windows Library for the modern age
Brought to you by:
jogybl,
sebas_ledesma
Nice! I will do a review when time allows.
Last edit: Vidar Hasfjord 2014-12-19
Hi Joe,
Although, like you say, code pages should be obsoleted by Unicode, this functionality may be required for legacy code, and since it is core Windows functionality, encapsulating it is clearly within the remit of OWLNext.
I generally like your approach, especially the support for standard iteration. I recognise the Singleton pattern here, and I would ideally like it expressed in the conventional manner. This would allow you to replace the static file-local CodePageList with private state in the singleton.
Instead of the many indexing member functions, I would prefer to have these functions as members of TCodePage instead. E.g. I want to do
codepages[i].GetName()instead ofcodepages.GetName(i).Kudos for linking to MSDN in the documentation.
Code review:
usingstatements totypedef. Rationale: Modern style.using TCodePageId = uint.intfor indexes and sizes. Rationale: Refer to signed vs unsigned expert advice. Consensus seems to be to use signed types for quantities.GetCounttosize,RemovetoclearandGetCodePageObjecttoat? Rationale: Follow standard container conventions consistently.Last edit: Vidar Hasfjord 2014-12-19
Here is a self-contained version of BuildCodeList_ that eliminates the global variables. It can be used to initialise a private member of TCodePages, rewritten as a conventional singleton.
Hi Vidar,
I've simplifed the class. It is now completely a static class. A user is free to instantiate if they wish, but it does nothing other than to perhaps provide a shorter name to reference the member functions:
A major part of the simplification was getting rid of all that. I now have a member function that returns a const reference to the container. The user can use any of the std::vector routines they wish that are allowed for a const. This eliminates the need for replicating all of those functions.
Okay, its nice to have a proper example. Well, mostly anyway, as it doesn't follow the "rules" completely.
Done.
Done, even though "transfer.h" doesn't always do that.
Done.
Done.
Not done. It would seem that the designers of the standard (e.g., vector and string) agree with me. Indexes and sizes are unsigned quantities; signed values make no sense. Since I've gotten rid of most of that, it no longer matters anyway.
Done, even though "transfer.h" doesn't always do that (e.g., namespace owl {).
Done.
No longer applicable.
Done.
Done (if I understood correctly).
Done (my mistake; I knew that).
Done.
I do not know of any that will for a match return the index of the container. That is crucial. Not done. While I'm not posting the entire updated source yet until you respond, here is the updated Doxygen example:
No longer applicable.
If this source of information is credible, neither one necessarily does:
shrink_to_fit
resize
So I have included them both, in hopes that a given implementation will deallocate for at least one of them.
Done.
While I have implemented most of that, I have not gotten rid of the global variables. Your method causes the code page list to be cleared and recreated every time it is called, which defeated the purpose of my design. It needs to be created only once; otherwise it is memory and performance inefficient. Having the code page list local to that function makes it inaccessible to the other functions that need it. And it would appear that perhaps you intended for the calling functions to take the return value and make a persistent copy of it which would also be memory and performance inefficient. So CodePageList and builtList remain global.
Using builtList instead of just checking CodePageList.empty() was an attempt to thwart off multi-threading issues (albeit an imperfect one), as well as the possibility that Windows simply could (in theory) not return any supported code pages in which it would just keep trying repeatedly.
Here is an example from which I adapted, which claims to be multi-thread-safe, but it was written in MFC and I do not know the equivalent non-MFC to implement:
http://www.codeproject.com/Articles/16638/Enumerating-System-Code-Pages
Perhaps you could enlighten me, and I will implement it.
Last edit: Joe Slater 2014-12-20
Here is a PDF document that might be of interest with respect to multi-threading: C++ and the Perils of Double-Checked Locking.
Hi Joe,
The following is just advice and my opinion. This class does not affect the rest of OWLNext, so I'll leave you in charge of it's design and implementation. Put it together the way you want it! :-)
In other words, you still do not use the conventional Singleton pattern. Any particular reason for that? If not, it would be good to have recognisability and consistency (we use Singleton in a few other places in OWLNext).
For your convenience, here is an outline of the Meyers Singleton:
The private constructor makes the class non-instantiable by clients. CodePageList is now a member (not a global), and the Meyers trick, using a static local variable, ensures that it is created only once, and on demand, i.e. it is not created if not used. The private constructor would initialise CodePageList using something like my suggested factory function BuildCodeList_. Hence no need for your boolean
builtListglobal flag. See rule 10 and 21 of "C++ Coding Standards" (Sutter/Alexandrescu).The linked pages in my previous post has further links to more information about Singletons.
Right, "transfer.h" may itself need some improvements. In this case, I only refer to it as an example of the file description block, though.
I've spent a fair amount of time on eliminating issues due to signed/unsigned mismatches in OWLNext already, most caused by the ill-advised decision by those designers. :-)
Edit: I am unfair on the designers here. They had to work with design constraints, in particular portability to 8-bit and 16-bit platforms where an extra bit really counts. See Alf's answer to the question "Why is size_t unsigned?" at StackOverflow.
Note that Bjarne Stroustrup himself warns about using unsigned integers in "The C++ Programming Language" (section 4.4) and many coding standards (including Google's C++ Style Guide) follow his advice. Here is an article that argues the case against unsigned quantities:
http://www.soundsoftware.ac.uk/c-pitfall-unsigned
In any case, we should try to be consistent. This is difficult in OWLNext since the Windows API uses signed quantities and the C++ containers do not. But, the current practised coding standard in OWLNext is to use signed quantities. So, casts are generally toward signed (e.g.
static_cast<int>(c.size())). I've now included this as a rule in our Coding Standards.You can use
find_ifanddistance:Aside: By the way, this is a nice example of why indexes should be signed. Here
distancenaturally returns a signed value (iterator_traits::difference_type).Any sane implementation of
shrink_to_fitwill deallocate (at least some) excessive capacity. Note thatresizesets the size, which is different. A vector has size and capacity. The latter can be larger than the former.shrink_to_fitwill never change the size (i.e. remove elements). So you need to remove the elements explicitly. But useclear()instead ofresize(0). It is clearer (pun intended).Please do. :-)
Note that in my suggested Singleton solution, BuildCodeList_ is called at most once. You don't need anything extra (i.e. your globals) to arrange for that. It is handled by the Meyers trick.
By using thread-local storage (now standard) for the static
codepagesin my BuildCodeList_, the function becomes fully thread-safe.However, this is not needed. Since this function is only called during the initialisation of a local static (the Meyers Singleton), which is guaranteed to be properly synchronised by the C++ standard, a race condition can never occur. (Note: Initialisation of local statics, such as used by the Meyers Singleton, wasn't thread-safe until C++11. So I assume a compliant compiler in my argument here.)
I've read this aspirin-promoting article before, thank you very much. :-)
For a C++11 perspective on this issue, see:
http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11
PS. If you are interested in multi-threading, see my article "Multi-threading and OWLNext" which I wrote during preparations for 6.32, in which I performed a light review of the OWLNext code in this regard.
Related
Wiki: Coding_Standards
Wiki: Multi-threading_and_OWLNext
Last edit: Vidar Hasfjord 2014-12-20
Hi Vidar,
I appreciate that, but I do wish to comply with the OWLNext standards in place. And again I appreciate your patience with me as I am in new information overload (e.g., multi-threading, multi-byte, Unicode, standard containers and strings, OWLNext, OWLNext Coding Standards, C++11, TurtoiseSVN, SourceForge, …). I will catch up to the 21st century! :-)
I was. I had clear, resize, and shrink_to_fit — in that order. But this is now a mute point (more to come).
Regarding signed vs. unsigned…
Note that there are more up votes for my opinion on this matter. :-)
It is ironic that you'd reference an audio/music source, as I develop such as well. And thanks; I needed a good laugh! :-)
Most likely likewise for signed values: (MIN_INT - 1) tends to give you MAX_INT. But even worse, since doing so is undefined in C, there's no telling what you'll get.
An empty statement.
You don't have to be stupid and inexperienced to make a mistake. Here, the author correctly identifies the source of the logic problem, but procedes to blame it on unsigned and compilers. Put the blame where it belongs.
Spoken like a true politician. For a 32-bit int, only one bit means 2 billion valid values I'm throwing away needlessly if I buy into this argument.
Consider TPoint and TSize… computing TSize from a TPoint(MIN_INT, MIN_INT) and TPoint(MAX_INT, MAX_INT) will no doubt given me invalid values since TSize uses signed values (not OWLNext's fault; Microsoft's).
As a mathematician as well, I believe in proper representation of math principles as best as possible. Call a spade a spade.
Enough of the soap box; back to relevant discussion.
Thanks. That was quite helpful. I have implemented this as-is, with exceptions…
I moved GetInstance to private; I saw no reason it should be made publicly available.
I still have CodePageList implemented in an anonymous namespace, rather than a data member. There are a couple of reasons for this…
One, I was led to believe in my recent OWLNext activity that functions and variables should not be in the class definition unless really needed. And in this case, it isn't.
Two, the lambda Windows callback functions couldn't readily get to it.
For this reason, I have changed GetIndex to return an int. It's quite unlikely there will ever be 2G+ code pages, and this is antiquated technology as well.
Thanks; that was useful and informative. But as a self-proclaimed efficiency expert, I dug around and chose to implement std::equal_range since it performs a binary search and the list is sorted.
I have, because it makes little sense now using the Meyers Singleton method. However, it originally had an additional purpose such that clear (f.k.a., Remove) could be used, and the list would get reloaded as needed. Now that can't be done and be thread-safe. So I have also removed the clear function; I had no need for it, but my efficiency pedanticism wanted me to make it available. Now I fear it is too dangerous to implement, as an application that used it could not thread-safely have the list re-created.
Due to the numerous changes from the original, I have attached the latest files.
Last edit: Joe Slater 2014-12-21
Hi Joe,
Thanks for contributing to the modernising of OWLNext. This is a direction I have championed, and it is good to have support.
With regard to learning modern C++, I recommend one of Stroustrup's latest books, e.g. his short introduction "A tour of C++", which is well regarded by experts, or his teaching text book "Programming - Principles and Practice using C++", which takes the modern approach to teaching C++. For coding standards, I recommend "C++ Coding Standards" (Sutter/Alexandrescu). For detailed knowledge about C++ and pitfalls, see Meyers' and Sutter's books ("Efficient C++" and "Exceptional C++", respectively). For C++ meta-programming, I recommend "Modern C++ Design" (Alexandrescu) and "C++ Template Metaprogramming" (Abrahams/Gurtovoy). And, get the seminal text on Design Patterns (Gamma/Helms/Johnson/Vlissides), if it is not already on your book shelf.
Reading that, at first I thought you found the article well-written and humorous, like I did, but then, as you proceeded to attack its arguments, I understood it was a dismissive comment, and my heart sank. :-)
I think the author argues a very good point, which I have seen argued elsewhere as well, and which is one that convinced me to change (I used to follow the standard library and use
size_teverywhere): We do not think in terms of modulo arithmetic when we write expressions, so we do easily make mistakes when unsigned quantities are involved. Another take on this by some other author went something like this: "With modulo arithmetic there is a huge discontinuity at 0, and it is too close for comfort".Since we are developing a library for Windows Desktop applications, and we have 64-bit integer types readily available, I do not buy range arguments for
unsigned, as far as OWLNext is concerned.Regardless, thanks for falling into line with my stated coding standard for OWLNext.
This is highly unconventional and odd for a Singleton. The point of a Singleton is that it is an ordinary class in every way except that its construction is restricted to allow only one instance (or none) to exist throughout the program. Hence, the constructor is made private and access is only provided through GetInstance.
I guess the reason for your design is that your eyes were clouded by your already existing static member functions that access static state. With a conventional Singleton, these would be ordinary members accessing the Singleton's private state.
Another way of looking at it is that you should be able to convert a Singleton into an ordinary class, just by replacing GetInstance by a public constructor.
Touche. :-)
But note the rationale for that advice: Implementation hiding. There are better ways to achieve implementation hiding than disfiguring a Singleton into the unrecognisable. Use the Pimpl idiom if you must. However, you already expose your internal container in your class interface, so it wouldn't hide much in this case.
So my bottom line recommendation is to make your member functions non-static and follow the conventional outline of the Meyers Singleton. It will be instantly recognisable by any programmer with design patterns experience, hence more maintainable.
See my BuildCodeList_ for how to solve that using a function-local static. Then rewrite your constructor using member initialisation like so:
If you want to fully optimize (and I guess you do :-), you can use std::move to ensure there is no copying of the list involved:
Alternatively, if you are more inclined to think in C, use a static pointer:
See "Return value optimization" on why this doesn't involve copying in modern compilers.
Since you are not really interested in the upper bound of the match, you can simply use std::lower_bound. However, note that, on small lists, a binary search may not actually be faster than a linear search, due to the overhead of the more complex logic, branching and memory access pattern. So this could be a premature optimisation (ref. item 8 in "C++ Coding Standards").
I notice you have put inlined functions in the class definition, rather than out-of-class in the header (which is the convention used by the OWL designers). I actually like your style better. The counter-argument is that it makes the class definition less readable, but this is less relevant today with outlining features in editors (just collapse all member definitions to get a better overview), and with code browsers and Doxygen documentation, we have better ways to get the overview of a class anyway.
Aside: I am a little torn about were we should put our Doxygen documentation. Now it goes with definitions, which means both header and implementation files are affected. There is an argument for putting it all in the implementation files to reduce build dependencies and decrease header size (faster build?). But separating class documentation from the definition means more hassle (in particular, we would need to use the tag
\class <name>, which may get out of sync). I welcome your view on this.Nitpicks:
inlinespecifier within a class definition is redundant.auto SetAnsiCodePageId(/*...*/) -> bool {return OWL_SET_ACP_(idCodePage);}. For Windows CE, this macro will call ::SetACP. Otherwise, it will expand tofalse. However, try to keep such macros to the implementation files, as not to pollute the client's global namespace.Last edit: Vidar Hasfjord 2014-12-22
You are welcome to continue work on this on the trunk.
Last edit: Vidar Hasfjord 2014-12-22
Hi Vidar,
I got sidelined with TListViewCtrl.
I'm okay with that. This is not about creating a Meyers Singleton. This is about creating the code page list thread-safely. If it helps, call it the new Slater-Meyers convention. :-)
I'm not okay with that. All but two of the functions, do not need the code page list. And I do not wish to punish users with creating the code page list when not needed for those functions.
Not clouded, but rather quite clear. These are static functions.
No, I can't do that. That would result in putting the registers and flags on the stack, jumping to a new address emptying the processor instruction cache, only to do the simple operation to do, restoring the registers and flags on the stack, and jumping to the caller address emptying the processor instruction cache. That's just too much of a performance penalty, even with the fast processors we have today. MS perhaps recognized this, as Vista to Windows 7 was mostly just optimizing code.
Yep, I missed that one; thanks. But don't you think upper_bound would be better? :-)
I think it makes it clearer, but also more efficient in that the header file is smaller.
My feelings are this… For class definitions, inline member functions, and class definitions within a class (e.g., enums), they go in the header. But for non-inline member functions defined in a .cpp file, they go there simply for reducing the size of the header file. I'm not as concerned about compiler efficiency as I am ultimate application efficiency. I do believe compilers can handle comment lines rather quickly (if not, they need optimizing!).
Ideally, there should be no reason why users need to examine header or source files. I find myself still in the habit of doing that, because the original OWL not only had errant information in the help files, but missing information (i.e., undocumented features). Unfortunately, OWLNext is still lacking documented functions. Be rest-assured, that any contributions I make, will not be amiss (intentionally) of Doxygen comments.
Well, there is the case of the unlikely One Definition Rule. But I have conceded and removed the inline keyword.
Of course; I knew that. It now does, as well as the local CodePageList.
Done, with ANSI and OEM.
Done.
Done.
Since it applies to the header file with inline functions, I un-define them at the end of the header file. Leaving it here as an inline function allows developers to not have to rebuild a special CE version of the library (unless OWLNext has other CE-specific non-inlines). Only the app that gets generated may have special builds.
Done [r2966]. With the restriction of 8-character filenames having been lifted, I have appropriately pluralized the filenames — with apologies to the Norwegians. :-)
I believe I'm done with this, and changing the status.
Related
Commit: [r2966]
Last edit: Joe Slater 2014-12-24