From: Tim M. <tma...@ne...> - 2012-10-19 18:56:03
|
I'd like to use JudySL for a unique map between zero-terminated strings and a 64-bit pointer. The value, however, is Word_t, defined in Judy.h as only an "unsigned long". (Specifically, I'm using C++, Visual Studio 2010.)

Searching through the mailing list archive, I found this suggestion from John Skaller for how to modify Judy.h so that Word_t will be an "unsigned long long" when compiling for 64-bit systems:
http://sourceforge.net/mailarchive/message.php?msg_id=27014844

I have two questions about using it, and one about the performance.

1.) Am I right in thinking that the necessary goodies are simply the lines inside the "#ifndef _WORD_T"? I was hoping it would be that simple. (Though it looks like the rest of John's modifications include tweaks for making it work with Visual Studio, which I would want to use, too.)

2.) Is that really all that's necessary? Are there any additional tweaks that would optimize JudySL for a long long Word_t Value?

3.) Has anyone done benchmarking on the effect of increasing Word_t to a long long? I imagine there would be hardly any speed impact.

Thanks!

---
Tim Margheim |
From: Alan S. <aj...@fr...> - 2012-10-19 20:11:36
|
Hoping Doug chimes in with better advice, but my recollection is that you build libJudy two ways, for 32-bit or 64-bit. If you want 64-bit pointers (array values), then just use the 64-bit build. Don't try changing any defines... It shouldn't be necessary, and probably won't work anyway due to the very specific bit-handling inside the library. It's all done with "straight C", but correctness and efficiency are tied to some assumptions about sizes and packing.

Now, am I misunderstanding something?

Alan Silverstein |
From: Tim M. <tma...@ne...> - 2012-10-19 20:30:13
|
It just clicked. I'm working on Win64 (LLP64 model), not UNIX (LP64 model). In Win64, an unsigned long is /always/ 32 bits, but in UNIX it scales: 64-bit in a 64-bit compile. In order to get a 64-bit int, I need to define it as "long long" or "__int64". That's why John put in this conditional declaration:

    #ifndef _WORD_T
    #define _WORD_T
    #if defined(_WIN64)
    typedef unsigned long long Word_t, * PWord_t; // expect 32-bit or 64-bit words.
    #else
    typedef unsigned long Word_t, * PWord_t; // expect 32-bit or 64-bit words.
    #endif
    #endif

So I suppose I should go through all the source code, find any other "long" declarations, and change them to similar conditional declarations.

It's good to confirm that Judy *was* written to work with a 64-bit Word_t. I shouldn't have anything to worry about once I add the conditional type declarations. At that point they should be equivalent. Don't you think?

Doug, do you think this would be worthy of a 1.0.6, for us poor Windows developers?

---
Tim Margheim |
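To see the difference Tim describes on a given compiler, a small C sketch like the following can report the data model and show what happens when a 64-bit value is stored in a 32-bit slot, which is what an "unsigned long" Word_t is under LLP64. The helper names data_model and survives_32bit_slot are hypothetical, not part of Judy.

```c
#include <stdint.h>

/* Classify the integer data model of the current compile.
   LP64  (64-bit Unix/Linux): long and pointers are 64 bits.
   LLP64 (64-bit Windows):    long is 32 bits; long long and pointers are 64.
   ILP32 (typical 32-bit):    int, long and pointers are 32 bits. */
static const char *data_model(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 8) return "LP64";
    if (sizeof(void *) == 8 && sizeof(long) == 4) return "LLP64";
    if (sizeof(void *) == 4 && sizeof(long) == 4) return "ILP32";
    return "other";
}

/* Returns 1 if a 64-bit value survives being stored in a 32-bit unsigned
   slot, i.e. in an "unsigned long" Word_t under LLP64. */
static int survives_32bit_slot(uint64_t value)
{
    uint32_t slot = (uint32_t)value;   /* conversion keeps only the low 32 bits */
    return (uint64_t)slot == value;
}
```

On 64-bit Linux, data_model() returns "LP64"; a 64-bit Visual Studio build returns "LLP64". Any pointer value above 4 GB fails the round trip, which is why an unmodified Judy.h cannot hold 64-bit pointers as values on Win64.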
From: john s. <sk...@us...> - 2012-10-19 21:14:06
|
On 20/10/2012, at 7:29 AM, Tim Margheim wrote:
>
> It's good to confirm that Judy *was* written to work with a 64-bit Word_t. I shouldn't have anything to worry about, once I add the conditional type declarations. At that point they should be equivalent. Don't you think?

AFAIK Judy works fine on 64-bit Windows with the single change needed to ensure it actually uses a 64-bit Word_t. I have no proof, and I haven't examined the code, but the Felix garbage collector and some test cases using Judy arrays as ordinary data structures work on Windows 64.

I expect a lot of effort went into making Judy "polymorphic" with respect to the 32/64-bit choice.

The build system is a bit of a mess (it only works smoothly on Unix, and it auto-adapts to the host platform in a way that makes cross-compilation hard). The docs unfortunately emphasise the use of macros, which is archaic (most developers would use C++ these days, and inline functions with references would be better).

But the actual design is pretty much perfect, and the code appears to be bug-free. And of course... the performance is phenomenal.

--
john skaller
sk...@us...
http://felix-lang.org |
From: Alan S. <aj...@fr...> - 2012-10-20 02:58:14
|
John et al,

> I expect a lot of effort went into making Judy "polymorphic" with
> respect to 32/64 bit choice.

Yes, that's my recollection from over 10 years ago.

> The build system is a bit of a mess (only works smoothly on Unix and
> auto-adapts for the host platform in a way that makes
> cross-compilation hard).

Agreed. By the time I realized that keying #ifdefs by OS or other type, rather than by "resource/attribute", was less portable, it was too late to fix it. I take exception to the "mess" comment; the make system is very clean and tidy and well-commented :-), merely complex and not sufficiently portable (in hindsight), which is a shame because libJudy is actually remarkably OS-agnostic. It only cares about malloc()/free() and word sizes, maybe endianness (I forget).

Nowadays I educate people that if you can build for N known architectures or platforms, you are not portable to N+1 without work. It's much smarter to build according to specific OS resources/features, or better yet, defer "binding" from compile/link time to run time if you can do so elegantly and without significant performance loss.

> The docs unfortunately emphasise use of macros, which is archaic (most
> developers would use C++ these days, and inline functions with
> references would be better).

Agreed. Inline functions came along in HPUX too late for us, so I ended up using macros. They really offer about the same functionality, but when they're complex, people aren't as comfortable working with them. Then again, "inline" being merely a suggestion that the compiler can ignore, with macros we KNEW exactly what we were getting, choosing to pay code space for absolutely fast performance. Although Doug'll probably pipe up saying that on modern compilers/computers, function call overhead is negligible, etc.

Cheers,
Alan Silverstein |
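Alan's point about keying #ifdefs on a resource/attribute rather than on the OS can be illustrated with Word_t itself. The sketch below, assuming a C99 <stdint.h> is available, selects the type by pointer width instead of by _WIN64. WORD_BITS and word_roundtrip are hypothetical names, and this is not how Judy.h is actually written; it only keeps the _WORD_T guard seen in the thread.

```c
#include <stdint.h>

/* OS-keyed selection (the style Alan regrets) only works on the platforms
   someone remembered to list:
       #if defined(_WIN64)
       typedef unsigned long long Word_t;
       #else
       typedef unsigned long Word_t;
       #endif
   Attribute-keyed selection asks for the property the code depends on:
   an unsigned integer exactly as wide as a pointer. */
#ifndef _WORD_T
#define _WORD_T
#if UINTPTR_MAX == UINT64_MAX
typedef uint64_t Word_t, *PWord_t;
#define WORD_BITS 64            /* hypothetical helper macro */
#elif UINTPTR_MAX == UINT32_MAX
typedef uint32_t Word_t, *PWord_t;
#define WORD_BITS 32
#else
#error "Word_t: unsupported pointer width"
#endif
#endif

/* Hypothetical helper: store a pointer in a Word_t and read it back. */
static void *word_roundtrip(void *p)
{
    Word_t w = (Word_t)(uintptr_t)p;
    return (void *)(uintptr_t)w;
}
```

The same header then needs no new branch for LP64, LLP64 or any future model, as long as the toolchain provides <stdint.h>. Some pre-C++11 C++ compilers only expose limit macros such as UINTPTR_MAX when __STDC_LIMIT_MACROS is defined before the include.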
From: john s. <sk...@us...> - 2012-10-20 06:29:49
|
On 20/10/2012, at 1:58 PM, Alan Silverstein wrote:

> libJudy is actually remarkably OS-agnostic.

Indeed, since it's primarily about memory operations, where the primary assumption is linear addressing. Actually the performance is highly *processor* sensitive, since it is optimised for Intel/conventional cache structures: it's not clear whether it would perform well on an ARM processor, for example.

> Nowadays I educate people that if you can build for N known
> architectures or platforms, you are not portable to N+1 without work.
> Much smarter to build according to specific OS resources/features, or
> better yet, defer "binding" from compile/link to run-time if you can do
> so elegantly and without significant performance loss.

> Agreed, inline functions came along in HPUX too late for us, so I ended
> up using macros. They really are about the same functionality,

The problem is that the macros "gloss over" the difference between rvalues and lvalues. In C++ with references, inline functions with the same signatures as the macros do not. Consequently, the documentation's failure to indicate clearly which arguments are lvalues and which are rvalues would matter less.

Basically I think reference types in C++ are a bad idea anyhow; in general the whole idea of lvalues is wrong. The right way is to use pointers, which the raw C interface does. Unfortunately the actual C interface isn't documented :)

> Although Doug'll probably pipe up saying in modern compilers/computers,
> function call overhead time is negligible, etc.

This is not correct. In a loop, the cost of repeated subroutine calls (JSR instruction) can be negative, that is, it can be faster than if the code were inlined, because of a combination of branch prediction plus the fact that the smaller amount of code is more likely to squeeze into the cache.
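John's point about macros hiding which arguments are lvalues can be shown with a toy example. Judy's documented macros (JLI(PValue, PJLArray, Index), for instance) write to some of their arguments; the hypothetical TOY_INS below does the same, while the hypothetical toy_ins takes every argument it modifies by pointer. Neither is Judy code; the "array" is just a fixed buffer with a count.

```c
#include <stddef.h>

/* Macro form (hypothetical, in the spirit of Judy's JLI): 'slot' and
   'count' are assigned to, but nothing at the call site says so. */
#define TOY_INS(slot, count, buf, v) \
    ((buf)[(count)] = (v), (slot) = &(buf)[(count)++])

/* Function form: every argument the callee modifies is passed by
   pointer, and the result is returned, so the call site is explicit. */
static long *toy_ins(long *buf, size_t *count, long v)
{
    buf[*count] = v;
    return &buf[(*count)++];
}
```

With the macro, TOY_INS(p, n, buf, 7) silently updates both p and n; with the function, the same operation reads p = toy_ins(buf, &n, 7). Passing an rvalue such as n + 0 as the count only surfaces as an error inside the macro expansion, whereas the function's pointer parameter states the requirement in its prototype.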
But this is only true if:

* there are at least two calls to the routine
* the argument/result protocol fits neatly into register allocation

Generally, inlining offers a host of optimisation opportunities to the compiler that actual calls defeat. For example, invariant code in a subroutine can be lifted out of a loop calling it, but only if the routine is inlined (or the compiler is certain that there are no other calls to it). Similarly, argument/parameter passing conventions can be thrown out for particular calls.

As above, this doesn't actually require inlining: a specialised version of the subroutine could be made, but this is a lot of extra work for a compiler. It can be done (Felix does it), but I doubt most C or C++ compilers are capable of such high-level optimisations, because they're:

* written in bad languages like C and C++
* too focussed on low-level optimisations (register allocation etc.)

Anyhow, the biggest problem with Judy is that the small user community cannot change it. We can't fix or improve it, except by cloning the repository.

--
john skaller
sk...@us...
http://felix-lang.org |
From: Tim M. <tma...@ne...> - 2012-10-22 15:07:01
|
> Although Doug'll probably pipe up saying in modern compilers/computers,
> function call overhead time is negligible, etc.

I /have/ had a situation where the overhead made a difference, though, in a very low-level, high-traffic function. I was optimizing an indexing process recently, and had a low-level function that was called ~34 million times in a 50-second test run. If I recall correctly, changing the function to a macro gave me a ~1% time savings. But that's more a matter of "negligible * infinite = noticeable". :)

---
Tim Margheim |
From: Alan S. <aj...@fr...> - 2012-10-20 02:51:41
|
Tim et al,

> I'm working on Win64 (LLP64 model), not UNIX (LP64 model).

Gotcha. I don't recall hearing those terms before, or maybe I did years ago and forgot. I found this 15-year-old explanation:
http://www.unix.org/version2/whatsnew/lp64_wp.html

> In Win64, an unsigned long is /always/ 32-bits, but in UNIX it scales:
> 64-bit in a 64-bit compile. In order to get a 64-bit int, I need to
> define it as "long long" or "int64".

I see. I wasn't aware of this (portability) difference between the two OS types. Sigh, more "Tower of Babel Syndrome." I guess that's why "they" recommend using types.h aliases instead of base types like int and long.

And I guess we didn't do that in the libJudy header files? Odd, I know we compiled it for 32+64 bits on HPUX, Linux, and WinXP.

> It's good to confirm that Judy *was* written to work with a 64-bit
> Word_t.

Oh, absolutely.

> I shouldn't have anything to worry about, once I add the conditional
> type declarations. At that point they should be equivalent. Don't
> you think?

Yeah, I think that's true. I'm just surprised any post-facto fix is even needed; it seems like #ifdefs in the Judy header files would already have handled it (unless we overlooked something).

Cheers,
Alan Silverstein |
From: john s. <sk...@us...> - 2012-10-20 03:01:01
|
On 20/10/2012, at 1:51 PM, Alan Silverstein wrote:

> I see. I wasn't aware of this (portability) difference between the two
> OS types. Sigh, more "Tower of Babel Syndrome." I guess that's why
> "they" recommend using types.h aliases instead of base types like int
> and long.
>
> And I guess we didn't do that in libJudy header files? Odd, I know we
> compiled it for 32+64 on both HPUX, Linux, and WinXP.

Judy didn't select the right type for Windows, but the choice is in a single place. All the rest of the code is correct AFAIK. Probably when Judy was written, Win64 didn't exist, and long long int didn't exist either.

--
john skaller
sk...@us...
http://felix-lang.org |
From: Alan S. <aj...@fr...> - 2012-10-20 03:04:31
|
> Probably when Judy was written Win64 didn't exist, and long long int
> didn't exist either.

You know, I really don't remember. I do recall we put a little work into compiling on WinXP, I think for both 32- and 64-bit versions, but we never got far enough to "sell" or deliver it for that platform.

Cheers,
Alan Silverstein |
From: Tim M. <tma...@ne...> - 2012-10-22 15:08:28
|
On 10/19/2012 10:00 PM, john skaller wrote:
> Judy didn't select the right type for Windows, but the choice is in a
> single place. All the rest of the code is correct AFAIK.

That may be the case, but I did find some functions and objects that have Word_t variables right next to some signed "long" variables. (The BranchB object in JudyCount.c, for instance.) So in Windows, those compile as just 32-bit ints, while Doug & Alan wrote & tested Judy with them as 64-bit ints.

It may be that a 32-bit int is actually enough even when Word_t is 64-bit, and it's somewhat reassuring that you haven't run into any problems in your use, John. (And memory is a concern for me, so I'd love to avoid doubling the size of the member variable ints.)

But I'll be more comfortable ensuring that every variable is the same size as when Judy was being designed & tested, unless Doug or Alan remember the nitty-gritty well enough to express confidence that leaving the signed ints as 32-bit shouldn't be a problem.

---
Tim Margheim |
From: Alan S. <aj...@fr...> - 2012-10-24 01:44:57
|
Tim et al,

> But I'll be more comfortable ensuring that every variable is the same
> size as when Judy was being designed & tested--unless Doug or Alan
> remember the nitty-gritties well enough to express confidence that
> leaving the signed ints as 32-bit shouldn't be a problem.

Sorry, no recollection at all! Other than that Windows wasn't well tested, and things might have changed.

Alan |
From: Tim M. <tma...@ne...> - 2012-10-24 14:41:31
|
On 10/23/2012 8:44 PM, Alan Silverstein wrote:
> Sorry, no recollection at all! Other than Windows wasn't well tested
> and things might have changed.

No worries, Doug already answered: it's safe to leave those "long" variables as 32-bit even in a 64-bit build. (By his coding practice, if the precise size had mattered he would have used a typedef instead of a native C type.) So the Word_t typedef is the only one I need to worry about functioning differently in Windows.

Tim |
From: Doug B. <dou...@ya...> - 2012-10-22 17:35:04
|
Tim:

As a rule of thumb, I never use native C types (char, int, long, long long) except when it simply does not matter, such as the case you cited, or when there is no choice. However, if I remember correctly, "long" was used instead of "int" in the case of JudyCount because the compiler generated better code (in loops) in the 64-bit case (with the HP-PA compiler). In the 32-bit case an int and a long are the same thing. The Judy project was funded by the HP-PA folks. Therefore, a lot of code decisions were made to "get around" compiler/hardware deficiencies. I have since learned my lesson:

If you leave performance on the table, someone will pick it up, and you will be left with a "follow them" legacy.

When Microsoft decided to require a uint64_t to be "unsigned long long", I figured they would fix that mistake eventually. At that time uint32_t and uint64_t had not been defined in a consistent place. I hope they are now in <stdint.h>.

While I am reminiscing, I will mention some things about 1.0.5 Judy. PA-RISC had some serious performance problems with subroutine calls and variable shifts. The designers knew it and always said they would fix it in "the next version", but it always fell below the "cut line" in the next version. Judy used "byte" accesses and macros to get around those problems, but at the cost of making Judy endian-dependent. Some time later I modified Judy to get rid of the endianness dependency, but at some further performance cost.

The next version of Judy does not use "byte" accesses, and uses macros only to make short, often-used pieces of code more understandable. It also uses variable shifts extensively, which are very fast on modern CPUs.

Sometimes I make the Debug version of a macro a real subroutine. (The compiler will inline it to generate the same code as the macro.) This is an easy way to make sure the macro is written with my guidelines of correctness (no hidden parameters, and readable).
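The debug-as-subroutine practice Doug describes might look like this in C99. DIGIT_AT and the MACROS_AS_FUNCTIONS switch are hypothetical names chosen for illustration, not taken from Judy; the operation itself (pulling one byte-sized digit out of an index with a variable shift) is just a representative short macro.

```c
#include <stdint.h>

/* Hypothetical example: extract byte-sized digit n (0 = least significant,
   valid for n = 0..7) of a 64-bit index. Build with
   -DMACROS_AS_FUNCTIONS (a made-up switch) to get a real function that the
   compiler type-checks and a debugger can step into; otherwise it is a
   macro. Both forms use only their explicit parameters. */
#ifdef MACROS_AS_FUNCTIONS
static inline uint64_t DIGIT_AT(uint64_t index, unsigned n)
{
    return (index >> (8u * n)) & 0xFFu;
}
#else
#define DIGIT_AT(index, n) ((((uint64_t)(index)) >> (8u * (n))) & 0xFFu)
#endif
```

Because the function version can only see its declared parameters, a macro that quietly referred to a variable in the caller's scope would fail to compile in the function build, which is the "no hidden parameters" check Doug mentions.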
The 64-bit version uses "bit-counting" (the popcnt instruction) extensively. It disappoints me that Intel made a 64-bit processor (Core 2 Duo) without that instruction. It is probably going to be difficult to produce a 64-bit Judy that runs at full speed on that processor. I simply do not want to hear "64-bit Judy core-dumps on my Core 2 Duo" or "32-bit Judy runs slow on my Pentium III".

I would be willing to make a version 1.0.6 of Judy if someone would submit the fixes (for Microsoft) and test them. I do not have a Microsoft Windows development path; last time I checked, I could not afford it.

I am a one-person developer in retirement. Judy is still my passion, and I am actively working on a much faster version.

Thanks for your interest,

doug

PS. If I remember correctly, Windows does not pack structures by default. Make sure that the struct jp_t is 8 or 16 bytes in size with 32- and 64-bit compiles respectively, i.e. assert(sizeof(jp_t) == 16); assert(sizeof(Word_t) == 8); in the 64-bit compile.

Doug Baskins <dou...@ya...>
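Why popcnt matters can be sketched with a common node-compression technique: a 64-bit bitmap marks which of 64 possible children exist, the children are stored densely, and the slot of child d is the number of set bits below d. The C below illustrates that general technique with a portable bit-counting fallback; it is not Judy's actual branch layout, and popcount64 and bitmap_slot are hypothetical names. GCC/Clang's __builtin_popcountll and MSVC's __popcnt64 (x64) can stand in for popcount64 and map to the POPCNT instruction when the build targets a CPU that has it.

```c
#include <stdint.h>

/* Portable population count (classic SWAR bit-counting). */
static unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));      /* 2-bit sums */
    x = (x & UINT64_C(0x3333333333333333)) +
        ((x >> 2) & UINT64_C(0x3333333333333333));          /* 4-bit sums */
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);      /* byte sums */
    return (unsigned)((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Bitmap-compressed node: bit d of 'bitmap' is set when child d (0..63)
   exists; children are stored contiguously in digit order. Returns the
   slot index of child d, or -1 if it is absent. */
static int bitmap_slot(uint64_t bitmap, unsigned d)
{
    uint64_t bit = UINT64_C(1) << d;
    if ((bitmap & bit) == 0)
        return -1;
    return (int)popcount64(bitmap & (bit - 1));  /* children below d */
}
```

Every lookup through such a node costs one population count, so on a CPU without POPCNT (such as the Core 2 Doug mentions) the roughly dozen-operation fallback runs on the hot path, which is the slowdown he is concerned about.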
|
From: Doug B. <dou...@ya...> - 2012-10-22 19:36:11
|
Pradeep:

Sorry, that is correct. I think the purpose of that union was to force the compiler to "pack" the items.

Thanks for your interest,

doug

Doug Baskins <dou...@ya...>

> From: "Bisht, Pradeep" <pra...@ya...>
> To: Doug Baskins <dou...@ya...>; Tim Margheim <tma...@ne...>; "jud...@li..." <jud...@li...>
> Sent: Monday, October 22, 2012 12:16 PM
> Subject: Re: JudySL with 64-bit pointers as values
>
> Hello Doug, you meant "union jp_t", correct? I don't see any "struct jp_t".
>
>     typedef union J_UDY_POINTER     // JP.
>     {
>         jpo_t j_po;                 // other than immediate indexes.
>         jpi_t j_pi;                 // immediate indexes.
>     } jp_t, *Pjp_t;
|
From: Tim M. <tma...@ne...> - 2012-10-22 20:05:38
|
On 10/22/2012 12:34 PM, Doug Baskins wrote:
> Tim:
>
> As a rule or thumb, I never use native C types (char, int, long, long
> long)
> except when it simply does not matter, such as the case you sited or when
> there is no choice.

Ah, excellent. I appreciate that rule of thumb. Just to confirm: you're saying you don't expect it to matter if those "long" variables end up being 32 bits even in a 64-bit compile of Judy? The definition of Word_t is the only one I should need to fix?

> When Microsoft decided to require a uint64_t to be "unsigned long long",
> I figured they would fix that mistake eventually. At that time an
> uint32_t and
> uint64_t had not been defined in a consistant place. I hope it is
> now in <stdint.h>.

I found a discussion on StackOverflow. Visual Studio 2003-2008 lacked a <stdint.h>, but it is now included in VS2010.
http://stackoverflow.com/questions/126279/c99-stdint-h-header-and-ms-visual-studio#2628014

Are you saying it will be better for clarity if the Win64 Word_t definition says "uint64_t" instead of "long long"? In other words:

    #ifndef _WORD_T
    #define _WORD_T
    #ifndef _WIN64
    typedef unsigned long Word_t, * PWord_t; // expect 32-bit or 64-bit words.
    #else
    #include <stdint.h>
    typedef uint64_t Word_t, * PWord_t; // In 64-bit Windows, we need to explicitly specify a 64-bit word, since "long" doesn't scale in _WIN64.
    #endif
    #endif

That looks like it should work, and I like the self-documentation of an explicit "uint64_t".

> While I am reminiscing, I will mention some things about 1.0.5 Judy. [...]

Thanks for those details.

> The next version of Judy does not use "byte" accesses and [...]

The faster version you're working on for future release? Can't wait.

> I would be willing to make a version 1.0.6 of Judy, if someone would
> submit
> (for Microsoft) the fixes and test it. I do not have a Microsoft Windows
> development path.Last time I checked, I could not afford it.

I'm more than happy to do that.
With the help of some of John Skaller's suggestions, I'm working out a configuration that leaves the source untouched (except for the Win64 tweak to the Word_t definition in Judy.h) and allows for easy use with Visual Studio. Ideally, any Windows-specific or VS-specific files will be in a separate sub-folder in the repository.

I think I can make a "vsbuild.bat" as an alternative to "build.bat", using Visual Studio's command-line compiler, along these lines:
http://stackoverflow.com/questions/84404/using-visual-studios-cl-from-a-normal-command-line

I haven't worked with VS's command-line compiler before, so I'm still figuring out the cleanest way to handle the *.lib and *.dll files. I may be able to use your build.bat mostly as-is, or I may need to make a *.sln solution/project file.

> I am a one-person developer inretirement. Judy is still my passion and I
> am actively working on a much fasterversion.

And I greatly appreciate it. I hope this'll work out to make Judy more easily available to Windows developers.

I do have access to a couple of test computers with Visual Studio 2003 and 2008 Express installed, so I'll be able to test it with them, too.

> PS. If I remember correctly, Windows does not pack structures by
> default. Make
> sure that the struct jp_t is 8 or 16 bytes in size with 32 and 64 bit
> compiles respectively.
> I.E. assert(sizeof(jp_t) == 16);assert(sizeof(Word_t) == 8); in the
> 64bit compile.

Will do.

---
Tim Margheim
Neuric Technologies, LLC |
From: Doug B. <dou...@ya...> - 2012-10-23 00:27:29
|
Tim:

> Ah, excellent. I appreciate that rule of thumb. Just to confirm, you're saying you don't expect it to matter if those "long" variables end up being
> 32-bits even in a 64-bit compile of Judy? The definition of Word_t is the only one I should need to fix?

Yes, I confirm. Even 16 bits (short) is probably enough, but it would be slower with some processors/compiled code. The definition of Word_t must pass this:

    assert(sizeof(Word_t) == sizeof(void *));

or Judy will not work. Also, if this does not pass, Judy will not work:

    assert(sizeof(jp_t) == (2 * sizeof(void *)));

> Are you saying it will be better for clarity if the Win64 Word_t definition says "uint64_t" instead of "long long"? In other words:

A Word_t is a uint64_t in a 64-bit compile and a uint32_t in a 32-bit compile. A "long long" is a typo in my mind. Your #ifndef _WORD_T will probably work in Windows, but I prefer using uint64_t if possible. I suspect that even a #define for 64 bits is still not standard, so kludges continue.

In the new Judy, all compiles fail without a #define JU_64BIT or (but not and) a #define JU_32BIT. That was a big mistake I made on JudySL: the lack of a #define JU_64BIT (or -DJU_64BIT) will compile as 64-bit, but fail on execution.

Looking forward to your Windows changes.

Thanks for your interest,

doug

P.S. It has been suggested that the 4 JudyxTables.c files be delivered, rather than constructed. I give them the names JudyLTables32.c, JudyLTables64.c, Judy1Tables32.c and Judy1Tables64.c in the new Judy. But I still use the JudyTables.c tool to construct them.

Doug Baskins <dou...@ya...>
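Doug's two conditions can also be checked at compile time, which catches a wrong Word_t before anything runs. The sketch below uses the negative-array-size trick because the Visual Studio C compilers discussed here predate C11's _Static_assert. demo_jp_t is a hypothetical two-word stand-in for Judy's jp_t union; its members are not Judy's real jpo_t/jpi_t, and Word_t is defined here as uintptr_t only for the sake of a self-contained example.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Word_t;   /* pointer-sized, as the thread requires */

/* Hypothetical two-word stand-ins; Judy's real jpo_t/jpi_t differ. */
typedef struct { Word_t addr; Word_t info; } demo_jpo_t;
typedef struct { uint8_t bytes[2 * sizeof(Word_t)]; } demo_jpi_t;
typedef union { demo_jpo_t j_po; demo_jpi_t j_pi; } demo_jp_t;

/* Compile-time checks without C11: an array size of -1 is a compile
   error, so the build stops if either condition is false. */
typedef char check_word_is_pointer_sized[(sizeof(Word_t) == sizeof(void *)) ? 1 : -1];
typedef char check_jp_is_two_pointers[(sizeof(demo_jp_t) == 2 * sizeof(void *)) ? 1 : -1];

/* Run-time form, matching the asserts quoted in the thread. */
static void check_judy_sizes(void)
{
    assert(sizeof(Word_t) == sizeof(void *));
    assert(sizeof(demo_jp_t) == 2 * sizeof(void *));
}
```

If a compiler inserted padding into the real jp_t (the packing concern raised earlier in the thread), the second check is the one that would fire; the first catches an "unsigned long" Word_t on Win64.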
I do not have a Microsoft Windows >> >>development path. Last time I checked, I could not afford it. >> I'm more than happy to do that. With the help of some of John Skaller's suggestions, I'm working out a configuration that leaves the source untouched (except for the Win64 tweak to the Word_t definition in Judy.h), and allows for easy use with Visual Studio. Ideally, any the Windows-specific or VS-specific files will be in a separate sub-folder in the repository. > >I think I can make a "vsbuild.bat" as an alternative to "build.bat", using Visual Studio's command-line compiler. Along these lines: >http://stackoverflow.com/questions/84404/using-visual-studios-cl-from-a-normal-command-line > >I haven't worked with VS's command-line compiler before, so I'm still figuring out the cleanest way to handle the *.lib and *.dll files. I may be able to use your build.bat mostly as-is, or I may need to make a *.sln solution/project file. > > >I am a one-person developer inretirement. Judy is still my passion and I >> >>am actively working on a much fasterversion. >> And I greatly appreciate it. I hope this'll work out to make Judy more easily-available to Windows developers. > >I do have access to a couple test computers with Visual Studio 2003 and 2008 Express installed, so I'll be able to test it with them, too. > > >PS. If I remember correctly, Windows does not pack structures by default. Make >>sure that the struct jp_t is 8 or 16 bytes in size with 32 and 64 bit compiles respectively. >> >>I.E. assert(sizeof(jp_t) == 16);assert(sizeof(Word_t) == 8); in the 64bit compile. >> Will do. > > >--- >Tim Margheim >Neuric Technologies, LLC > > >------------------------------------------------------------------------------ >Everyone hates slow websites. So do we. >Make your web apps faster with AppDynamics >Download AppDynamics Lite for free today: >http://p.sf.net/sfu/appdyn_sfd2d_oct >_______________________________________________ >Judy-devel mailing list >Jud...@li... 
>https://lists.sourceforge.net/lists/listinfo/judy-devel > > > |
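The following is a minimal sketch of Doug's two size requirements written as checks. It assumes the Win64-aware Word_t typedef proposed in this thread, with uint64_t on the _WIN64 branch. Judy's real jp_t is internal to the library, so demo_jp_t and the helpers check_judy_sizes and judy_sizes_ok are hypothetical stand-ins. They model only the required two-word size, not Judy's actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Word_t as discussed in the thread: explicit 64-bit on Win64 (LLP64,
   where long stays 32 bits), unsigned long elsewhere (32-bit targets
   and LP64 Unix, where long matches the pointer width). */
#ifndef _WORD_T
#define _WORD_T
#if defined(_WIN64)
typedef uint64_t Word_t, *PWord_t;
#else
typedef unsigned long Word_t, *PWord_t;
#endif
#endif

/* Hypothetical stand-in for Judy's private jp_t: two words, no padding
   expected between them. Only its size is meaningful here. */
typedef struct {
    Word_t first_word;
    Word_t second_word;
} demo_jp_t;

/* Returns 1 if both size requirements hold on this target, else 0. */
int judy_sizes_ok(void)
{
    return sizeof(Word_t) == sizeof(void *)
        && sizeof(demo_jp_t) == 2 * sizeof(void *);
}

/* Aborts (in non-NDEBUG builds) if either requirement fails. */
void check_judy_sizes(void)
{
    assert(sizeof(Word_t) == sizeof(void *));
    assert(sizeof(demo_jp_t) == 2 * sizeof(void *));
}
```

Calling check_judy_sizes() early in a test program catches a mismatched Word_t before any Judy call is made. The same check, applied to the real jp_t inside the library build, is what Doug asks for on Windows.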
From: Tim M. <tma...@ne...> - 2012-10-23 16:58:08
|
On 10/22/2012 7:27 PM, Doug Baskins wrote: > In the new Judy all compiles > fail without a #define JU_64BIT or/(but not and) a #define JU_32BIT. > That was a big mistake I > made on JudySL. The lack of a #define JU_64BIT or -DJU_64BIT will > compile 64bit, but fail on execution. Ah, so for now, when I build, I need to both set the 64-bit option on the compiler, and do a "#define JU_64BIT" or -DJU_64BIT on each file as I compile it? Tim |
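Doug says the new Judy refuses to compile unless exactly one of JU_64BIT or JU_32BIT is defined. The guard below is a hypothetical sketch of that idea, not code taken from Judy. To keep the example self-contained, it derives one macro from UINTPTR_MAX when neither is passed; a strict build would use #error there instead. The names configured_word_bits and check_word_config are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guard, not libJudy's code. A real build would pass
   -DJU_64BIT or -DJU_32BIT; this sketch falls back to the pointer
   width so that it compiles on its own. */
#if !defined(JU_64BIT) && !defined(JU_32BIT)
#  if UINTPTR_MAX > 0xFFFFFFFFu
#    define JU_64BIT
#  else
#    define JU_32BIT
#  endif
#endif

/* "or, but not and": defining both is an error. */
#if defined(JU_64BIT) && defined(JU_32BIT)
#  error "Define exactly one of JU_64BIT or JU_32BIT"
#endif

/* Reject a macro that disagrees with the compiler's target. */
#if defined(JU_64BIT) && UINTPTR_MAX <= 0xFFFFFFFFu
#  error "JU_64BIT defined for a 32-bit target"
#endif
#if defined(JU_32BIT) && UINTPTR_MAX > 0xFFFFFFFFu
#  error "JU_32BIT defined for a 64-bit target"
#endif

/* Word size selected by the macros above. */
int configured_word_bits(void)
{
#ifdef JU_64BIT
    return 64;
#else
    return 32;
#endif
}

/* Runtime cross-check against the actual pointer size. */
void check_word_config(void)
{
    assert(configured_word_bits() == (int)(8 * sizeof(void *)));
}
```

With a guard of this kind, a missing or mismatched macro stops the build at compile time. Without it, you get a library that compiles but fails at run time, which is the 1.0.5 failure mode Doug describes.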
From: john s. <sk...@us...> - 2012-10-23 01:29:00
|
On 23/10/2012, at 11:27 AM, Doug Baskins wrote: > > P.S. It has been suggested that the 4 JudyxTables.c files be delivered, rather than constructed. > I give them the names: JudyLTables32.c, JudyLTables64.c, Judy1Tables32.c and Judy1Tables64.c > in the new Judy. But I still use the JudyTables.c tool to construct them. This is a good idea. I would also suggest that those JudyL/Judy1 codes generated by the same file with command line macros be replaced by an *.include file and two C files which include it with the macros set in those files. It should be possible to build Judy by simply compiling all the *.c files in the library directory and linking them to a DLL or making a static link library. I would personally recommend going to C++ and eliminating as many macros as possible. C++ allows better factorisation of common code, including with routines polymorphic over the word size. Some conditional compilation may still be needed. As a matter of interest, has Judy performance on ARM processors been considered? [I know it builds on iOS] Also, have you considered a public repository for your work so you can get help and feedback from the small user community? -- john skaller sk...@us... http://felix-lang.org |
From: Tim M. <tma...@ne...> - 2012-10-23 18:19:12
|
On 10/22/2012 8:28 PM, john skaller wrote: > On 23/10/2012, at 11:27 AM, Doug Baskins wrote: >> P.S. It has been suggested that the 4 JudyxTables.c files be delivered, rather than constructed. >> I give them the names: JudyLTables32.c, JudyLTables64.c, Judy1Tables32.c and Judy1Tables64.c >> in the new Judy. But I still use the JudyTables.c tool to construct them. Great! That does seem simpler, and would allow someone to include them in a project without the extra step of constructing them themselves. (If you're using a command-line compiler, it doesn't make any practical difference, but delivering the files does seem more straightforward.) > This is a good idea. I would also suggest that those JudyL/Judy1 codes > generated by the same file with command line macros be replaced > by *.include file and two C files which include them with the macros > set in those files. It should be possible to build Judy by simply compiling > all the *.c files in the library directory and linking them to a DLL or > making a static link library. I agree that that would be valuable. John and I had talked about that in private email. I was going to bring it up when I finished my other tweaks. (Which I'm testing now, by the way. It was simpler than I thought.) So for example, JudyFirst.c (which is currently copied-and-renamed during building to "JudyLFirst.c" and "Judy1First.c", then compiled with "-DJUDYL" and "-DJUDY1" respectively) would be delivered as the two files:

//Judy1First.c
#define JUDY1
#include "JudyFirst.include"

//JudyLFirst.c
#define JUDYL
#include "JudyFirst.include"

I imagine the same thing would be done with the other -D macros being used with a few of the files. Again, if you're using a command-line compiler with a delivered batch file then there's no difference, but it would make the files-as-delivered a bit easier to use with an alternate build system. I like very much the idea of making the build process into "compile all the delivered *.c files and link them".
If you like that change and want to integrate it, I'm willing to do the renaming on my system and submit it along with my other tweaks. (I can also edit sh_build, but not test it.) What do you think about packaging up those changes along with the table-generation change, and making them Judy 1.0.6? --- Tim Margheim Neuric Technologies, LLC |
From: Tim M. <tma...@ne...> - 2012-10-23 23:26:44
|
On 10/22/2012 7:27 PM, Doug Baskins wrote:
> The definition of Word_t must pass this: assert(sizeof(Word_t) == sizeof(void *)); or Judy will not work.
> Also, if this does not pass, Judy will not work: assert(sizeof(jp_t) == (2 * sizeof(void *)));

Those asserts are passing, but I'm hitting a couple of runtime errors in the test code I've written, and I'm wondering if anyone has any ideas.

My test code: I'm populating a JudySL tree with a few strings, each mapped to a uint value. Then I'm looking up each string to confirm it maps to the right value. Then I'm iterating through the tree. (I'm stepping through the code in the Visual Studio IDE and confirming the values of the variables in a "Watch" window, rather than doing printouts.)

It works in 32-bit mode, up until the call to JSLF(). At that point, it crashes. The other problem happens in 64-bit mode when populating the tree: the first call to JSLI() works fine, but then I get an access violation the second time.

Here's the code:
--------------------------
Pvoid_t JudySLRoot = NULL; // main JudySL array
Pvoid_t PValue = NULL; // pointer to JudySL array value
char *Strings[11] = {"zoo", "ark", "Zed", "Ate", "basdfahsdfkjashdflkjashdflkjahsljkhfeiuahslfkje", "model", "Man", "fire", "f.re", "quorum" };

for (uint64_t I=0; I<10; I++) //map each string to its index in the Strings array
{
    char *String_Key = Strings[I];
    JSLI(PValue, JudySLRoot, (uint8_t*)String_Key);
    *((Word_t*)PValue) = I;
}

uint64_t Retrieved;
for (uint64_t I=0; I<10; I++) //Confirm that the stored values match each string's index
{
    char *String_Key = Strings[I];
    JSLG(PValue, JudySLRoot, (uint8_t*)String_Key);
    Retrieved = *((Word_t*)PValue);
}

uint8_t *Retrieved_String=NULL;
PValue = NULL;
JSLF(PValue, JudySLRoot, Retrieved_String);
while (PValue!=NULL) //Iterate through the table.
{
    char *String = (char*)Retrieved_String;
    Retrieved = *((Word_t*)PValue);
    JSLN(PValue, JudySLRoot, Retrieved_String);
}
--------------------------
Any ideas?
I did confirm that the 64-bit version of the *.lib is using the 64-bit tables, by the way. (The generated *.c tables include j__L_Leaf1 through j__L_Leaf7.) --- Tim Margheim Neuric Technologies, LLC |
From: john s. <sk...@us...> - 2012-10-24 00:17:39
|
On 24/10/2012, at 10:26 AM, Tim Margheim wrote:
>
> Here's the code:
> --------------------------
> Pvoid_t JudySLRoot = NULL; // main JudySL array
> Pvoid_t PValue = NULL; // pointer to JudySL array value
> char *Strings[11] = {"zoo", "ark", "Zed", "Ate", "basdfahsdfkjashdflkjashdflkjahsljkhfeiuahslfkje", "model", "Man", "fire", "f.re", "quorum" };
>
> for (uint64_t I=0; I<10; I++) //map each string to its index in the Strings array
> {
>     char *String_Key = Strings[I];
>     JSLI(PValue, JudySLRoot, (uint8_t*)String_Key);

I think it best NOT to use the macros. It's too hard to figure out what's going on. JSLI here expects PValue to be an lvalue.

>     *((Word_t*)PValue) = I;
> }
>
> uint64_t Retrieved;
> for (uint64_t I=0; I<10; I++) //Confirm that the stored values match each string's index
> {
>     char *String_Key = Strings[I];
>     JSLG(PValue, JudySLRoot, (uint8_t*)String_Key);
>     Retrieved = *((Word_t*)PValue);
> }
>
> uint8_t *Retrieved_String=NULL;
> PValue = NULL;
> JSLF(PValue, JudySLRoot, Retrieved_String);

Here is your bug. JSLF does NOT retrieve the first element in the array. It is poorly named. First does NOT mean what you think. It means satisfying >=, whereas Next means satisfying >.

So your initial value of Retrieved_String is incorrect; it must be done like this:

uint8_t *Retrieved_String = (uint8_t *) calloc(20000, 1);

It must be a pointer to storage initialised with at least one leading byte of value zero.

Judy is highly consistent and logical in its interface. It has two problems at the C API level. The first is that the use of void *, Word_t* etc. is super unsafe, especially in C. This is a stupidity in C in particular (not in Judy). The second problem is that First doesn't mean first in the array; it means >= the key. So you have 5 search operators: >= > == < <=, ALL of which require a valid input key. The names First, Next, Get, Prev, Last are a bit unconventional; (Ge, Gt, Eq, Lt, Le) would have been better. But they're just names; the actual C interface is perfect.
-- john skaller sk...@us... http://felix-lang.org |
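John says that First means ">=" and takes a valid key as input. The toy model below illustrates that point in plain C. It is not libJudy: toy_first, toy_next, toy_search and toy_count_all are hypothetical names that work on a sorted array. They follow the contract described in this thread, where the caller's buffer is both the starting key and the place the found key is written back. Starting from an empty string (a buffer whose first byte is zero) visits every key. A NULL buffer is invalid input, as it was in Tim's JSLF call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the JudySL First/Next contract; NOT libJudy.
   keys[] must be sorted ascending by strcmp. buf holds the starting
   key on input and receives the key found on output, so it must be a
   valid (possibly empty) string with room for the longest key. */
static int toy_search(const char *const keys[], int n,
                      char *buf, size_t bufsize, int inclusive)
{
    assert(buf != NULL);               /* a NULL key is invalid input */
    for (int i = 0; i < n; i++) {
        int c = strcmp(keys[i], buf);
        if (c > 0 || (inclusive && c == 0)) {
            size_t len = strlen(keys[i]);
            if (len + 1 > bufsize)
                return -1;             /* toy: buffer too small */
            memcpy(buf, keys[i], len + 1);
            return i;
        }
    }
    return -1;                         /* no key satisfies the test */
}

/* Like First: smallest key >= buf. Returns its index, or -1. */
int toy_first(const char *const keys[], int n, char *buf, size_t bufsize)
{
    return toy_search(keys, n, buf, bufsize, 1);
}

/* Like Next: smallest key > buf. Returns its index, or -1. */
int toy_next(const char *const keys[], int n, char *buf, size_t bufsize)
{
    return toy_search(keys, n, buf, bufsize, 0);
}

/* Full in-order scan: start from the empty string, then step with Next. */
int toy_count_all(const char *const keys[], int n)
{
    char buf[64] = "";                 /* leading byte is zero */
    int count = 0;
    for (int i = toy_first(keys, n, buf, sizeof buf); i >= 0;
         i = toy_next(keys, n, buf, sizeof buf))
        count++;
    return count;
}
```

Applied to the real library, the same shape would be a writable buffer whose first byte is zero and that is large enough for the longest key. You pass that buffer to JSLF, then loop on JSLN until PValue comes back NULL.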
From: Alan S. <aj...@fr...> - 2012-10-24 01:37:41
|
> The second problem is that First doesn't mean first in the > array it means >= the key. So you have 5 search operators: > > >= > == < <= > > ALL of which require a valid input key. The names First, > Next, Get, Prev, Last, are a bit unconventional > (Ge, Gt, Eq, Lt, Le) would have been better. But they're just > names, the actual C interface is perfect. My bad, I thought First/Next/Prev/Last was a good model, over 10 years ago. Yeah, in hindsight GE/GT/LT/LE might have been better, although I would uppercase both letters of the abbreviation. :-) Yes, all searches require a non-null starting point, but it's just a numeric value, not a pointer, so how did it "crash"? As you said, at a deeper level I think the concept of searching either inclusive or exclusive of the starting point was a solid concept... I'd seen the same "design pattern" many times before in other contexts. But libJudy is so abstract that all kinds of object naming was a problem, including the name of the package itself! Cheers, Alan Silverstein |
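The five operators John and Alan list can also be sketched for numeric keys, as in Judy1/JudyL, where the index is a Word_t passed by pointer. This toy is not libJudy; search_op, the OP_* names and toy_keyed_search are hypothetical. It scans a sorted array of distinct keys and, on success, writes the match back through *key. A forward search from 0 gives "first" in the everyday sense, and a backward search from all-ones gives "last".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operator names mapped to Judy's: First, Next, Get, Prev, Last. */
typedef enum { OP_GE, OP_GT, OP_EQ, OP_LT, OP_LE } search_op;

/* Toy model, NOT libJudy. keys[] is sorted ascending with no duplicates.
   On a match, *key is overwritten with the key found and 1 is returned;
   otherwise *key is left unchanged and 0 is returned. */
int toy_keyed_search(const uint64_t *keys, size_t n, uint64_t *key, search_op op)
{
    uint64_t k = *key;
    if (op == OP_GE || op == OP_GT || op == OP_EQ) {
        for (size_t i = 0; i < n; i++) {          /* scan upward */
            if ((op == OP_GE && keys[i] >= k) ||
                (op == OP_GT && keys[i] > k) ||
                (op == OP_EQ && keys[i] == k)) {
                *key = keys[i];
                return 1;
            }
        }
    } else {
        for (size_t i = n; i-- > 0; ) {           /* scan downward */
            if ((op == OP_LE && keys[i] <= k) ||
                (op == OP_LT && keys[i] < k)) {
                *key = keys[i];
                return 1;
            }
        }
    }
    return 0;
}
```

Every operator needs a meaningful starting value in *key. That is the "valid input key" requirement from the thread. The inclusive and exclusive variants differ only in whether the starting key itself can match.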
From: john s. <sk...@us...> - 2012-10-24 04:03:37
|
On 24/10/2012, at 12:37 PM, Alan Silverstein wrote: > > Yes, all searches require a non-null starting point, but it's just a > numeric value, not a pointer, so how did it "crash"? Because Tim is using JudySL, where the key is a pointer to a char array, not a numeric value. > > As you said, at a deeper level I think the concept of searching either > inclusive or exclusive of the starting point was a solid concept... I'd > seen the same "design pattern" many times before in other contexts. Me too. Probably starting with IBM VSAM. I think that was one of the first abstract keyed access file systems. Before that keyed searching was actually -- would you believe it -- done by the disk drive in *hardware*. In those days, disks had tracks .. but no sectors. Hmm .. 5Meg removable drives weighed a ton, had 19 platters, and if you were caught smoking within a mile of one you got fired, because a single particle of dust or smoke could cause a head crash. Hey, that would be a good name for a Techno band, "Head Crash". > But > libJudy is so abstract that all kinds of object naming was a problem, Well, the only problem with Judy is the unconventional leading uppercase letter, kind of spoils nice lists of data structures to have just one starting with a capital :) -- john skaller sk...@us... http://felix-lang.org |
From: Benjamin H. <bho...@ea...> - 2012-10-24 04:11:54
|
> > Well, the only problem with Judy is the unconventional leading > uppercase letter, kind of spoils nice lists of data structures to have > just one starting with a capital :) > True, but since the origin of the name is a proper noun, it seems appropriate. > Ben |