From: Alan S. <aj...@fr...> - 2014-02-17 18:59:30
John et al, hoping Doug Baskins will reply to you as well, but in the meantime, as a co-developer of libJudy, here's my $0.02 (and not worth much more, I admit).

> Judy is one of the most useful libraries out there and I use it all
> the time.

VERY nice to hear that! We busted our butts for a couple of years, ultimately a team of four people, trying to make it world-class. But we discovered how hard it is to "sell a library," especially without academic credentials or writeups. HP ended up canceling the project, laying off the team, and open-sourcing the code in 2002.

> Something that always bothered me about it is the somewhat unusual
> calling conventions; the use of macros and silent casting of 'void *'s
> means it is very easy to accidentally use the API incorrectly, and
> its conventions differ from pretty much every other library out there,
> making combining code problematic.

Yeah, yeah, we know, we know, sorry. I'm personally responsible for not envisioning the use of autoconf, etc., for greater portability, which is ironic since libJudy actually has minimal (very few) OS/library dependencies.

> I went ahead and used C99's features (inline functions, immediate
> struct values, the defined uintptr_t and bool types) to make a typesafe
> and more conformant API.

OK, that's fun to watch. My only concern is our experience that it was way too easy to naively do something (like thread-safety) that ruined the library's performance. You can't have much run-time overhead going in and out, or you can cut performance in half, or even much worse. Syntactic sugar that doesn't overwhelm the algorithm -- great. Innocent layers of type conversions or other foo -- ugh.

> I figured I'd post it if anyone else found it handy.

Thanks for sharing it back in the spirit of open source. Let's see what others say, people who are more actively using libJudy. Me, I did use it successfully for a few projects later (contracting for HP and Avago), but retired a year ago.

Cheers, Alan Silverstein
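[To make the "silent casting of 'void *'s" complaint concrete, here is a hedged sketch (variable names invented): because the API traffics in Pvoid_t, a classic mistake compiles without a murmur, since in C a void* converts implicitly to any object pointer type, including void**.

    #include <Judy.h>

    Pvoid_t PJLArray = NULL;   /* an empty JudyL array is a null pointer */
    Word_t *PValue;

    /* WRONG: JudyLIns needs the ADDRESS of the array pointer so it can
       update it, but passing the pointer itself still compiles cleanly. */
    PValue = (Word_t *)JudyLIns(PJLArray, 7, PJE0);

    /* RIGHT: */
    PValue = (Word_t *)JudyLIns(&PJLArray, 7, PJE0);
]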
From: Alan S. <aj...@fr...> - 2014-02-19 04:51:50
John etc,

> You shouldn't underestimate how much more exposure and influence the
> project has now that it is open-sourced. Besides the programs out
> there that link against libJudy, there are probably dozens of radix
> tree implementations out there that were inspired by reading about
> Judy's implementation.

WOW! That's good to hear. It appeared to me, at least, that just a few people, a handful in the whole world, even noticed the package. (But I'm not Doug, managing the SourceForge site and tracking downloads, so I don't know.)

> The heap compression techniques I use in my jhc compiler were heavily
> influenced by your work on Judy, and in particular your attention to
> cache line loads.

Cool! I suspect this was based more on reading my "Shop Manual" (wish I'd found a better name than that, and written it in HTML, not Framemaker), rather than the source code? :-)

> ...I could sweep just by discarding the judy tree and not having to
> traverse the whole heap looking for free bits helped allocation time a
> lot.

Yeah, that kind of usage, what I call "synergistic serendipity" -- or if you prefer, "serendipitous synergy" -- is what Doug, and later I, got excited about with libJudy. We used to imagine, "what could you do with a dynamic sparse array if it was really cheap and fast?" We found all kinds of interesting applications, but in most cases using libJudy required some unusual thinking at best, warped at worst.

We spent some time during the project looking for "killer apps" to demonstrate the value of the library... But it was surprisingly hard. Basically you had to have libJudy in your toolbox during project design and implementation. It was very difficult to retrofit it afterwards into significant tools. Quite often 80% of the same benefit could be had just by tweaking some hashing constants that had gotten dusty.

> autoconf is great...

But as later emails to the group stated, even that's debatable. :-) Anyway, as De Marco observed, "people optimize whatever is measured, and hide their garbage elsewhere." Despite knowing this, and trying to cover ALL the bases, it's obvious in hindsight that what we created had various shortcomings -- garbage where we didn't "measure" closely enough. Most of our project's effort went into version 4 (Judy IV), the final version, which took a lot longer than expected: core coding and documentation efforts. We were focused on HP-UX (after all, HP was paying the bills), which didn't even have explicit C inline functions at that time. So open-source portability suffered, and in hindsight, documentation and code readability too.

> I always assumed the odd API was to conform to some internal HP coding
> standard created by someone who has a history with Algol (hence the
> in/out arguments). I've seen odder ones forced on programmers out
> there.

No, actually it's kind of funny in a sad way. I was always a very persnickety UI and manual-entry designer. I wrote a lot of Unix commands and documentation over the years, both for the HP-UX product and for "toys," so I was very good at this. For libJudy I came up with some very solid (I thought) manual entries. But Doug really wanted to simplify the API; hence the macro API. Later he had a manual writer essentially throw away half of my documentation work (without telling me first) to make the macro API the main entry point, and the formal signatures and related description (whatever remained) an afterthought.

In the long light of time, I think Doug was wrong, and I deserve an "I told you so" about that change. The macro API has confused people more than Doug ever intended, and made it even harder to code properly too. It seems to me that only "real experts" (who can handle the raw truth, even prefer it) end up seriously using libJudy.

But conversely, another common gripe about libJudy is how the source code is painfully abstract to read. Yeah, a dynamic sparse array is already a difficult (abstract) notion with poor real-world metaphors. Being aware of various UI principles like "one object for each name, one name for each object," I drove Doug nuts insisting that we think very carefully about what to call everything, aiming for clarity, precision, "psychological distinction," etc. (Sorry, Doug, resistor-colored "flowers" as the leaves in the tree were cute but confusing!)

But I yielded on the name of the library itself (Judy), which I now regret because it's "weird" and turns some people off. Maybe we should have just gone with an acronym after all. But what? An early version -- the first time I heard of Doug's little project, I used it myself and got excited -- was called SAMM = sparse array memory manager, but libJudy grew well beyond that first concept.

Anyway, since we needed very tight control over the C code, but didn't have inline functions, and being comfortable myself with C macros, even fairly hairy ones where you pass as a parameter the name of another macro, I tried to build minimal-redundancy code that would be as clear and fast as possible. In hindsight, though, the excessively abstract macros themselves threw most people, no matter how much I tried to find good object names and document everything carefully. Maybe Doug was right, and it was better to just repeat nearly identical code sections 7-8 times rather than "compressing" the source with levels of macros.

> ...my background is in embedded programming/operating systems and
> performance programming; I never got over the habit of examining the
> assembly output of every code change...

Good. :-) We did that a lot.

Cheers, Alan Silverstein
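[For readers who haven't seen the style Alan describes -- macros that take the name of another macro as a parameter -- a toy sketch; the names are invented for illustration, not taken from the Judy sources:

    /* the list of node types is written once... */
    #define FOR_EACH_NODE_TYPE(APPLY) \
        APPLY(Linear) APPLY(Bitmap) APPLY(Uncompressed)

    /* ...and expanded through different "callback" macros to stamp out
       nearly identical code sections instead of repeating them by hand */
    #define DECLARE_SEARCH(Name) \
        static int Search##Name(Pvoid_t Pjp, Word_t Index);
    FOR_EACH_NODE_TYPE(DECLARE_SEARCH)
]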
From: john s. <sk...@us...> - 2014-02-19 06:12:28
On 19/02/2014, at 3:51 PM, Alan Silverstein wrote:

> John etc,

John(s) :)

> The macro API has confused people more than Doug ever intended, and
> made it even harder to code properly too. It seems to me that only
> "real experts" (who can handle the raw truth, even prefer it) end up
> seriously using libJudy.

Once you get the hang of it .. it's not *that* hard to use the C API. That's because the design is exactly right. It uses pointer values where needed and pointers to pointers where needed:

1) If you're not modifying the array, use a void* = array.
2) If you are, use a void** = &array.
3) If you're fetching a key or value, provide a pointer to the place it can be put.
4) If you're modifying a value, you have to provide a pointer to a slot that can hold a pointer to the slot IN the array, so you can store a new value into the array. If your replace or insert fails, you get a NULL stored there.

It's very precise. Once you understand it you do not need the manual; you can calculate from first principles what the API must be (more or less :)

As for killer apps .. that's the wrong idea. There are numerous places in most code where Judy is the ONLY data structure that can be used:

A) Where the key is an integer or pointer
B) Where you need O(1) fast random AND sequential access

There isn't anything else. One vital property: it is perfectly safe to iterate through a judy array and modify it as you go. Try doing THAT with an STL container!

The core restriction on Judy is that the key has to be a machine word or a null-terminated string. The value can be anything (just use a pointer).

Seriously, the thing I'd want to see next is a PURELY FUNCTIONAL version of Judy. It would revolutionise the world. Functional programming would actually be practical for the first time. I'm not sure it can be done .. but functional code handles trees reasonably well. We'd probably drop to O(log N) performance but it could be a very fast O(log N).

--
john skaller
sk...@us...
http://felix-lang.org
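[A minimal sketch of the four cases above, using the JudyL function API directly (PJE0 suppresses the optional error struct):

    #include <stdio.h>
    #include <Judy.h>

    int main(void)
    {
        Pvoid_t PJLArray = NULL;   /* an empty array is a null pointer */
        Word_t *PValue;

        /* inserting modifies the array, so pass &PJLArray (case 2);
           the result points at the value slot IN the array (case 4) */
        PValue = (Word_t *)JudyLIns(&PJLArray, 42, PJE0);
        if (PValue != PJERR)
            *PValue = 1234;

        /* lookup does not modify the array, so pass it by value (case 1);
           NULL means absent, otherwise it points at the value (case 3) */
        PValue = (Word_t *)JudyLGet(PJLArray, 42, PJE0);
        if (PValue != NULL)
            printf("42 -> %lu\n", (unsigned long)*PValue);

        JudyLDel(&PJLArray, 42, PJE0);   /* modifies: &PJLArray again */
        return 0;
    }
]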
From: john s. <sk...@us...> - 2014-02-19 06:34:34
On 19/02/2014, at 5:11 PM, john skaller wrote:

> As for killer apps .. that's the wrong idea. There are numerous
> places in most code where Judy is the ONLY data structure
> that can be used:
>
> A) Where the key is an integer or pointer
> B) Where you need O(1) fast random AND sequential access

Here's my use of it. In my garbage collector, I keep this map:

    pointer -> RTTI (includes size of one object)
    pointer -> count (if count != 1)

So given ANY pointer I want to know: is it a Felix pointer (managed by the GC) or a C pointer Felix doesn't know about?

So here's what I do. First, try to find a pointer less than or equal to the given pointer. If there isn't one, it's a foreign pointer. If the pointer is equal to the candidate, it's a native pointer and points to the object base. Otherwise, get the element size from the RTTI. Then get the count of elements from the second Judy array. Multiply, and add to the base pointer returned from the first array. Now check if our candidate pointer is less than the result. If it is, it's a pointer INTO the object; otherwise it's a foreign pointer. [Felix doesn't allow pointers "one past the end", otherwise you could use less than or equal instead of just less than.]

The most vital point here is that Felix uses malloc() for allocation. The only way to do the above operation without Judy with similar performance requires implementing your own suballocator.

The collector itself clearly scans all the pointers from the root, using the RTTI to tell it where pointers are. It does the above test on every pointer so it can trace a pointed-at object. Then all the unreachable objects are destroyed and put into a temporary Judy array. Then the storage is reaped. This has to be done separately because unreachable objects can reach each other in their destructors. Note destructors can also allocate new objects! (Just because an object is unreachable doesn't mean it can't itself reach reachable objects ..:)

This is a fairly naive algorithm but the key thing is it is totally un-invasive. The RTTI pointer is not stored in the object. The algorithm is not fast. It wouldn't be tenable at all without Judy.

--
john skaller
sk...@us...
http://felix-lang.org
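[A rough sketch of the classification step described above, assuming two JudyL arrays keyed by object base address; the record layout and helper names are invented. JudyLLast finds the largest key less than or equal to the probe:

    #include <stddef.h>
    #include <Judy.h>

    typedef struct rtti { size_t object_size; } rtti_t;  /* hypothetical */

    Pvoid_t rtti_map  = NULL;   /* base address -> rtti_t*       */
    Pvoid_t count_map = NULL;   /* base address -> element count */

    /* Return the object base if p points at or into a GC-managed
       object, or 0 if it is a foreign pointer. */
    Word_t classify(Word_t p)
    {
        Word_t base = p;
        Word_t *pv = (Word_t *)JudyLLast(rtti_map, &base, PJE0);
        if (pv == NULL) return 0;     /* nothing at or below p: foreign */
        if (base == p) return base;   /* exactly an object base */

        rtti_t *r = (rtti_t *)*pv;
        Word_t count = 1;             /* counts != 1 stored separately */
        Word_t *pc = (Word_t *)JudyLGet(count_map, base, PJE0);
        if (pc != NULL) count = *pc;

        /* strictly inside [base, base + size*count) ? */
        return (p < base + r->object_size * count) ? base : 0;
    }
]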
From: Alan S. <aj...@fr...> - 2014-02-20 22:24:14
John and all,

> It's very precise. Once you understand it you do not need the manual;
> you can calculate from first principles what the API must be (more or
> less :)

Right, and the practical problem is that type-weakness in C allows you to make coding errors that aren't detected at compile time. We knew about that but didn't have a good solution. (I know, I know, just use Felix...)

> As for killer apps .. that's the wrong idea. There are numerous
> places in most code where Judy is the ONLY data structure that can be
> used:

OK, but selling a new coding library (as HP wanted to do) is even harder than selling a new mousetrap. Again, my take-away metaphor from that time was that programmers needed to have it in their toolboxes during the implementation phase, and the question was how to get it there.

How does any successful language, tool, or library get "into the toolbox?" It's a bit of a mystery to me, because there are so many crappy ones out there that still managed to at least temporarily win the evolutionary arms race. I think a key to success is to do what's today called "go viral," meaning people hear about it from other people and think, "I gotta get me some of that" -- politics, standards, or crappy aspects of the product be damned.

> ...it is perfectly safe to iterate through a judy array and modify it
> as you go. Try doing THAT with an STL container!

Yup.

> The core restriction on Judy is that the key has to be a machine word
> or a null-terminated string. The value can be anything (just use a
> pointer).

Although as I've mentioned before, a lot of academic cycles are spun on issues that assume from the get-go that both keys and values must be variable-length; either choose a number up front, or even worse (but naturally more common), all lengths allowed.

Null-terminated strings (JudySL) are built on top of JudyL, showing that mixing arbitrary-length keys is fine, but if you want length-associated instead of length-terminated keys, you pretty much have to sort first by length (which is at least very cheap and fast using JudyL), even if you miss out on sorted-by-accident as a "free" side effect. The JudyN code which is out there somewhere illustrates exactly how to do this. With arbitrary-length keys sorted by length, you can put any bytes at all into the key, similar to Perl strings.

Cheers, Alan Silverstein
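[For reference, a minimal sketch of the JudySL layer Alan mentions -- string keys iterated in sorted order; the key buffer size is an assumption:

    #include <stdio.h>
    #include <Judy.h>

    #define MAXKEY 256

    int main(void)
    {
        Pvoid_t PJSLArray = NULL;
        Word_t *PValue;
        uint8_t key[MAXKEY];

        /* insert a couple of string keys with word-sized payloads */
        JSLI(PValue, PJSLArray, (const uint8_t *)"banana"); *PValue = 2;
        JSLI(PValue, PJSLArray, (const uint8_t *)"apple");  *PValue = 1;

        /* walk in lexicographic order: JSLF finds the first key at or
           after key[], JSLN steps to the next */
        key[0] = '\0';
        JSLF(PValue, PJSLArray, key);
        while (PValue != NULL) {
            printf("%s -> %lu\n", (char *)key, (unsigned long)*PValue);
            JSLN(PValue, PJSLArray, key);
        }
        return 0;
    }
]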
From: john s. <sk...@us...> - 2014-02-20 22:39:26
On 21/02/2014, at 9:24 AM, Alan Silverstein wrote:

> keys is fine, but if you want length-associated instead of
> length-terminated keys, you pretty much have to sort first by length
> (which is at least very cheap and fast using JudyL), even if you miss
> out on sorted-by-accident as a "free" side effect.

Not sure exactly what you mean, but you can always take an arbitrary N bytes of anything, including embedded 0 bytes, and translate it to a UTF-8 NULL-terminated string. Then you can use JudySL just fine, and the proper sort order is preserved, because that's a property of UTF-8 encoding.

--
john skaller
sk...@us...
http://felix-lang.org
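[One way to realize this trick: map each byte b to code point b+1 and emit its UTF-8 encoding. No NUL can appear (code point 0 is never used), and since byte-wise comparison of UTF-8 matches code-point order, the original byte-wise sort order is preserved. A sketch under those assumptions:

    #include <stddef.h>
    #include <stdint.h>

    /* dst must hold at least 2*len + 1 bytes (code points 1..256 take
       at most two UTF-8 bytes).  Returns the encoded length. */
    size_t encode_key(const uint8_t *src, size_t len, uint8_t *dst)
    {
        uint8_t *out = dst;
        for (size_t i = 0; i < len; i++) {
            unsigned cp = (unsigned)src[i] + 1;       /* 1..256, never 0 */
            if (cp < 0x80) {
                *out++ = (uint8_t)cp;                 /* one UTF-8 byte  */
            } else {
                *out++ = (uint8_t)(0xC0 | (cp >> 6)); /* two UTF-8 bytes */
                *out++ = (uint8_t)(0x80 | (cp & 0x3F));
            }
        }
        *out = '\0';    /* the terminator JudySL expects */
        return (size_t)(out - dst);
    }
]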
From: Alan S. <aj...@fr...> - 2014-02-21 00:08:07
> Not sure exactly what you mean, but you can always take an arbitrary N
> bytes of anything, including embedded 0 bytes, and translate it to a
> UTF-8 NULL-terminated string.

Good point. I'm well-acquainted with base-64 encoding, for example. I just never thought of "safe reversible encoding to a null-terminated string" as a practical solution for arbitrary key-to-value mappings... Didn't start seeing base-64 until some years later... Clever.

So you CAN use JudySL today, if you like, with a translator front-end, to store any keys at all, can't you? I wonder how it performs (both space and time) in real life?

For example, people who understand hashing in some detail often work very hard on improving their hashing algorithms to optimize the hash table -- what I call a "good spectral property" that keeps the distribution random (although still usually key-distribution sensitive) and the synonym chains short. However, unless they measure, they don't realize that they ruined performance by spending way too much CPU time on the hashing.

> Then you can use JudySL just fine, and the proper sort order is
> preserved, because that's a property of UTF-8 encoding.

Interesting too, makes sense. Bearing in mind, of course, that "sorting" itself is a rich and variable concept -- witness all the choices in sort(1). Simple lexicographical sorting is the baseline, and all libJudy gives you for free (since it's a radix tree).

Cheers, Alan
From: John M. <jo...@re...> - 2014-02-21 01:15:53
On Thu, Feb 20, 2014 at 4:08 PM, Alan Silverstein <aj...@fr...> wrote:

> Good point. I'm well-acquainted with base-64 encoding, for example.
> I just never thought of "safe reversible encoding to a null-terminated
> string" as a practical solution for arbitrary key-to-value mappings...
> Didn't start seeing base-64 until some years later... Clever.

You don't need anything as complicated as UTF-8 for this. You can use COBS (consistent overhead byte stuffing) to remove NULs. It is dead simple and guaranteed to never grow the key by more than a single byte for every 254 bytes of key. This means you can statically allocate your key buffers just a couple of bytes or so larger than you would otherwise and still have room, independent of the data. This isn't even that hacky, as this is what COBS is designed for. I am not sure why the technique isn't more commonly used; it is pretty ingenious IMHO.

http://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing

John
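[A compact COBS encoder along the lines of the algorithm on that page -- each "group" is up to 254 non-zero bytes prefixed by a one-byte code, so the output never contains a zero byte:

    #include <stddef.h>
    #include <stdint.h>

    /* Encode src[0..len) into dst; returns the encoded length.
       Worst-case output is len + len/254 + 1 bytes. */
    size_t cobs_encode(const uint8_t *src, size_t len, uint8_t *dst)
    {
        uint8_t *start  = dst;
        uint8_t *code_p = dst++;  /* where this group's code byte goes   */
        uint8_t  code   = 1;      /* bytes in the group so far, plus one */

        for (size_t i = 0; i < len; i++) {
            if (src[i] != 0) {
                *dst++ = src[i];
                code++;
            }
            if (src[i] == 0 || code == 0xFF) {  /* close the group */
                *code_p = code;
                code    = 1;
                code_p  = dst++;
            }
        }
        *code_p = code;           /* close the final group */
        return (size_t)(dst - start);
    }
]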
From: john s. <sk...@us...> - 2014-02-21 07:17:41
On 21/02/2014, at 12:15 PM, John Meacham wrote:

> designed for. I am not sure why the technique isn't more commonly used;
> it is pretty ingenious IMHO.

Well, I've been around for 4 decades and I'd never heard of it.

> http://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing

And it even has the algorithm there in C. Thanks!

--
john skaller
sk...@us...
http://felix-lang.org
From: John M. <jo...@re...> - 2014-02-20 10:53:33
On Tue, Feb 18, 2014 at 8:51 PM, Alan Silverstein <aj...@fr...> wrote:

> I suspect this was based more on reading my "Shop Manual" (wish I'd
> found a better name than that, and written it in HTML, not Framemaker),
> rather than the source code? :-)

Yeah, for the most part. But I do dig into the source code some too.

>> ...I could sweep just by discarding the judy tree and not having to
>> traverse the whole heap looking for free bits helped allocation time a
>> lot.
>
> Yeah, that kind of usage, what I call "synergistic serendipity" -- or if
> you prefer, "serendipitous synergy" -- is what Doug, and later I, got
> excited about with libJudy. We used to imagine, "what could you do with
> a dynamic sparse array if it was really cheap and fast?" We found all
> kinds of interesting applications, but in most cases using libJudy
> required some unusual thinking at best, warped at worst.
>
> We spent some time during the project looking for "killer apps" to
> demonstrate the value of the library... But it was surprisingly hard.
> Basically you had to have libJudy in your toolbox during project design
> and implementation. It was very difficult to retrofit it afterwards into
> significant tools. Quite often 80% of the same benefit could be had
> just by tweaking some hashing constants that had gotten dusty.

I would have no idea how to go about marketing a library. :) When I was at Google I got people to use the generic stuff I made internally, but that was by word of mouth and no money was changing hands (just some 20% time sharing: I'll fix your bug if you fix mine). A couple ended up officially supported and staffed. But the endgame for a very successful internal library is being open-sourced, like protobufs. So your Judy development path makes a lot of sense to me.

> Anyway, as De Marco observed, "people optimize whatever is measured, and
> hide their garbage elsewhere." Despite knowing this, and trying to
> cover ALL the bases, it's obvious in hindsight that what we created had
> various shortcomings -- garbage where we didn't "measure" closely enough.
> Most of our project's effort went into version 4 (Judy IV), the final
> version, which took a lot longer than expected: core coding and
> documentation efforts. We were focused on HP-UX (after all, HP was paying
> the bills), which didn't even have explicit C inline functions at that
> time. So open-source portability suffered, and in hindsight,
> documentation and code readability too.

I was wondering how you went about benchmarking and testing Judy with something that simulates real data. I wrote a small radix tree library that was a subset of Judy in the functionality I needed to embed in something: it just had fixed-size linear nodes, leaf bitsets, and uncompressed nodes, and no root compression. I was testing it against Judy with random data and found it to have surprisingly good, comparable performance... Of course, the key thing there is the word 'random'. As soon as any structure or clustering or anything other than uniform randomness was added, Judy did wildly better. Which makes sense to me, but how did you come up with 'non-random' datasets to test with, since clearly uniform random data is not very useful in general for testing beyond a point?

>> I always assumed the odd API was to conform to some internal HP coding
>> standard created by someone who has a history with Algol (hence the
>> in/out arguments). I've seen odder ones forced on programmers out
>> there.
>
> No, actually it's kind of funny in a sad way. I was always a very
> persnickety UI and manual-entry designer. I wrote a lot of Unix
> commands and documentation over the years, both for the HP-UX product
> and for "toys," so I was very good at this. For libJudy I came up with
> some very solid (I thought) manual entries. But Doug really wanted to
> simplify the API; hence the macro API. Later he had a manual writer
> essentially throw away half of my documentation work (without telling me
> first) to make the macro API the main entry point, and the formal
> signatures and related description (whatever remained) an afterthought.
>
> In the long light of time, I think Doug was wrong, and I deserve an "I
> told you so" about that change. The macro API has confused people more
> than Doug ever intended, and made it even harder to code properly too.
> It seems to me that only "real experts" (who can handle the raw truth,
> even prefer it) end up seriously using libJudy.

Yeah, it actually took some digging to figure out that I could call the JudyXFoo functions directly, and that I could just pass PJE0 as the last parameter and everything would work just fine. I see some references in the documentation to the macros being "faster" because they can be inlined, but as far as I can tell I don't see that ever happening. Is this something that used to happen?

> But conversely, another common gripe about libJudy is how the source
> code is painfully abstract to read. Yeah, a dynamic sparse array is
> already a difficult (abstract) notion with poor real-world metaphors.
> Being aware of various UI principles like "one object for each name,
> one name for each object," I drove Doug nuts insisting that we think
> very carefully about what to call everything, aiming for clarity,
> precision, "psychological distinction," etc. (Sorry, Doug,
> resistor-colored "flowers" as the leaves in the tree were cute but
> confusing!)

A whole lot of CS knowledge is terminology. It is very important to get something consistent to work with. I think object-oriented programming wouldn't have been nearly as popular had someone not come up with a common terminology for 'design patterns'. Sure, everyone was programming with them, but once the terminology happened they became much easier, even though you were writing the same code. I definitely reused the Judy terminology.

Interesting anecdote about bad terminology: back in the day I was given the task of integrating our software with some offshore-developed code. It didn't sound difficult, but once I opened their code I found out that, for some reason, they used the word 'bucket' for every data structure (language barrier?). Pointers were buckets. Arrays were buckets. Linked lists were buckets. A linked list holding pointers would be described in the comments as a 'bucket of buckets'. Kind of comical in retrospect, not so much at the time. In any case, terminology is key :) Strangely enough, the code wasn't that bad if you looked at the structure and ignored the names of everything; I have no idea how the programmer kept everything straight in their head.

> Anyway, since we needed very tight control over the C code, but didn't
> have inline functions, and being comfortable myself with C macros, even
> fairly hairy ones where you pass as a parameter the name of another
> macro, I tried to build minimal-redundancy code that would be as clear
> and fast as possible. In hindsight, though, the excessively abstract
> macros themselves threw most people, no matter how much I tried to find
> good object names and document everything carefully. Maybe Doug was
> right, and it was better to just repeat nearly identical code sections
> 7-8 times rather than "compressing" the source with levels of macros.

I was thinking about trying to do something with generated code, using something more powerful than macros. Like a program you feed the target's cache size, the key size you want, and the payload if anything, and it will calculate the optimal state machine and spit it out....

John
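[Concretely, the direct-call equivalence mentioned above looks like this; JLI is the documented convenience macro, JudyLIns the underlying function, and PJE0 tells it to skip the error structure:

    Word_t *PValue;
    Pvoid_t PJLArray = NULL;

    JLI(PValue, PJLArray, 42);                        /* macro API   */

    PValue = (Word_t *)JudyLIns(&PJLArray, 42, PJE0); /* direct call */
]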
From: John M. <jo...@re...> - 2014-02-18 02:25:46
On Mon, Feb 17, 2014 at 10:59 AM, Alan Silverstein <aj...@fr...> wrote:

> John et al, hoping Doug Baskins will reply to you as well, but in the
> meantime, as a co-developer of libJudy, here's my $0.02 (and not worth
> much more, I admit).
>
>> Judy is one of the most useful libraries out there and I use it all
>> the time.
>
> VERY nice to hear that! We busted our butts for a couple of years,
> ultimately a team of four people, trying to make it world-class. But we
> discovered how hard it is to "sell a library," especially without
> academic credentials or writeups. HP ended up canceling the project,
> laying off the team, and open-sourcing the code in 2002.

You shouldn't underestimate how much more exposure and influence the project has now that it is open-sourced. Besides the programs out there that link against libJudy, there are probably dozens of radix tree implementations out there that were inspired by reading about Judy's implementation. The heap compression techniques I use in my jhc compiler were heavily influenced by your work on Judy, and in particular your attention to cache line loads.

I actually originally used Judy in the garbage collector, having the mark/sweep passes keep the used bitmap in a Judy1 tree. This actually ended up being very efficient; being able to sweep just by discarding the judy tree, and not having to traverse the whole heap looking for free bits, helped allocation time a lot. I have since moved to a different system when I switched to a dedicated allocator; however, I still use it for heap debugging. I use a JudyL array to associate meta-info with each memory location. This is much better than adding tag words to the blocks themselves, as it doesn't change the layout at all, which often can obscure bugs.

>> Something that always bothered me about it is the somewhat unusual
>> calling conventions; the use of macros and silent casting of 'void *'s
>> means it is very easy to accidentally use the API incorrectly, and
>> its conventions differ from pretty much every other library out there,
>> making combining code problematic.
>
> Yeah, yeah, we know, we know, sorry. I'm personally responsible for not
> envisioning the use of autoconf, etc., for greater portability, which is
> ironic since libJudy actually has minimal (very few) OS/library
> dependencies.

autoconf is great; a pet peeve of mine is when people try to reimplement it without understanding all it gives you. Cross-platform builds are one of those things where you don't appreciate how tricky it is to do portably until you come across a non-autoconfed package you need to install.

I always assumed the odd API was to conform to some internal HP coding standard created by someone who has a history with Algol (hence the in/out arguments). I've seen odder ones forced on programmers out there.

>> I went ahead and used C99's features (inline functions, immediate
>> struct values, the defined uintptr_t and bool types) to make a typesafe
>> and more conformant API.
>
> OK, that's fun to watch. My only concern is our experience that it was
> way too easy to naively do something (like thread-safety) that ruined
> the library's performance. You can't have much run-time overhead
> going in and out, or you can cut performance in half, or even much worse.
> Syntactic sugar that doesn't overwhelm the algorithm -- great. Innocent
> layers of type conversions or other foo -- ugh.

Yeah, don't worry. My background is in embedded programming/operating systems and performance programming; I never got over the habit of examining the assembly output of every code change. With modern compilers there will be no overhead. Of course, profile, profile, profile is always the key. I increased the block allocation speed in jhc, making every compiled program go faster by 3-5% (!!), by changing one

    if (x) a; else b;

to

    if (!x) b; else a;

Actually, the changes I made can potentially increase performance in a few ways -- not really relevant here due to the inlining, but useful to know.

One is the use of the standard 'bool' type, which in addition to getting rid of a dreaded cast-to-bool class of bugs can optimize code: namely, an ABI can return a bool directly in a condition code rather than a register, making a branch on it trivial. Another is that since the compiler knows the value is zero or one, branchless optimizations like this can happen:

    bool b;
    z = b ? a : c;
    /* can become */
    int foo[] = { c, a };
    z = foo[b];

or the more direct, but a bit more contrived:

    z = b ? 44 : 43;
    /* can become */
    z = 43 + b;

Another is that by using actual structures the compiler can do alias analysis. For instance:

    void foo(judyl_t *j, int *x) {
        if (jEmpty(*j))
            blah;
        *x = 0;
        if (jEmpty(*j))
            bar;
    }

Since judyl_t and int are different types, the compiler can assume that x does not point to the same memory location as j, meaning it can combine both jEmpty calls into one, caching the result and removing a load. It would not be able to do this with Pvoid_t values without explicitly checking if x == j.

>> I figured I'd post it if anyone else found it handy.
>
> Thanks for sharing it back in the spirit of open source. Let's see what
> others say, people who are more actively using libJudy. Me, I did use
> it successfully for a few projects later (contracting for HP and Avago),
> but retired a year ago.

Yup. Most everything I do is open source, even when I work for a company -- not necessarily directly; I was in finance for a while, so obviously wasn't open-sourcing our proprietary trading models, but our infrastructure was Linux and OSS based, so improvements I made I'd contribute back.

John
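[A rough sketch of the Judy1 mark-bitmap idea described above, with invented helper names; the cheap part is that J1FA releases the entire bitmap in one call, so "sweeping" never walks the heap:

    #include <Judy.h>

    static Pvoid_t marks = NULL;   /* Judy1 set: one bit per live address */

    void mark(Word_t addr)
    {
        int rc;
        J1S(rc, marks, addr);      /* rc == 1 if newly set, 0 if already */
    }

    int is_marked(Word_t addr)
    {
        int rc;
        J1T(rc, marks, addr);
        return rc;
    }

    void sweep(void)
    {
        Word_t bytes;
        J1FA(bytes, marks);        /* frees the whole bitmap at once and
                                      leaves marks == NULL for next cycle */
    }
]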
From: john s. <sk...@us...> - 2014-02-18 03:40:25
On 18/02/2014, at 1:25 PM, John Meacham wrote:

> On Mon, Feb 17, 2014 at 10:59 AM, Alan Silverstein <aj...@fr...> wrote:
>
> debugging. I use a JudyL array to associate meta-info with each memory
> location. This is much better than adding tag words to the blocks
> themselves, as it doesn't change the layout at all, which often can
> obscure bugs.

Yes, I do that too, it's very nice.

> autoconf is great,

I beg to differ: IMHO it's a heap of rubbish. The whole concept is extremely bad. Its "suck it and see" concept hides issues and can't work in a cross-compilation context, or even with multiple compilers (as I have).

Autoconf is also very annoying when it fails, because to build or upgrade a package you end up having to build or upgrade autoconf too.

The way forward is standardisation.

> those things where you don't appreciate how tricky it is to do portably
> until you come across a non-autoconfed package you need to install.

You don't appreciate how utterly bad autoconf is until you have to modify a build to make it work and you get presented with a totally impenetrable mess. Even the help is a heap of crap, telling you a whole lot of non-interesting rubbish and NOT telling you the things you actually need to know for your specific package.

What you have to understand is that most code doesn't require ANY configuration. Something like Judy, for example, only needs ONE configuration parameter (32 or 64 bits).

I have a compiler with fairly extensive libraries that provides platform-independent stuff for most services, including async socket I/O, and some configuration is certainly necessary. Surprisingly little, though.

Autoconf has one (and only one) legitimate use: building low-level system tools (binutils) and of course your compiler (gcc, clang, etc). After that it's the *compiler's* job to provide a standard API.

> I always assumed the odd API was to conform to some internal HP coding
> standard created by someone who has a history with Algol (hence the
> in/out arguments). I've seen odder ones forced on programmers out there.

The C API is not odd, it's perfect. Every argument is more or less exactly as it should be (give or take a cast, which is the inevitable result of using crap languages like C).

The MACRO API is crap. It hides where you need to use an lvalue and where an rvalue will do. That's a bad idea in C (and an even worse one in C++).

> Another is that by using actual structures the compiler can do alias
> analysis. For instance:
>
>     void foo(judyl_t *j, int *x) {
>         if (jEmpty(*j))
>             blah;
>         *x = 0;
>         if (jEmpty(*j))
>             bar;
>     }
>
> Since judyl_t and int are different types, the compiler can assume that x
> does not point to the same memory location as j, meaning it can combine
> both jEmpty calls into one, caching the result and removing a load. It
> would not be able to do this with Pvoid_t values without explicitly
> checking if x == j.

Unfortunately, alias tests in compilers are really dangerous. Because the type system is so weak, people cast left, right, and centre. In my compiler I do it systematically, and I have to use -fno-strict-aliasing. I wish I didn't have to do this, but unfortunately C just doesn't have a good enough type system or semantics to avoid it sometimes ;( C++ is a bit better, but it's still a problem.

--
john skaller
sk...@us...
http://felix-lang.org
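[A minimal illustration of the hazard: under strict aliasing the compiler may assume an int* and a float* never point at the same storage, so type-punning through a cast like this is undefined behaviour unless you compile with -fno-strict-aliasing (or pun via memcpy/union instead):

    #include <stdio.h>

    float pun(int *ip)
    {
        *ip = 0x40490FDB;          /* bit pattern of a float near pi */
        float *fp = (float *)ip;   /* the cast compiles, but...      */
        return *fp;                /* ...reading an int object through a
                                      float lvalue is undefined behaviour */
    }

    int main(void)
    {
        int storage;
        printf("%f\n", pun(&storage));
        return 0;
    }
]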
From: John M. <jo...@re...> - 2014-02-18 04:33:36
On Mon, Feb 17, 2014 at 7:39 PM, john skaller <sk...@us...> wrote:

>> autoconf is great,
>
> I beg to differ: IMHO it's a heap of rubbish.
> The whole concept is extremely bad. Its "suck it and see" concept
> hides issues and can't work in a cross-compilation context, or even
> with multiple compilers (as I have).
>
> Autoconf is also very annoying when it fails, because to build
> or upgrade a package you end up having to build or upgrade
> autoconf too.
>
> The way forward is standardisation.

Indeed, standardization is better, but something important to realize is that autoconf is designed explicitly for the cases where standardization _falls down_. If everything were fully POSIX compliant and published how it works via pkg-config or some such, then there would be no need for a configuration manager; autoconf exists because that is not the case. So saying standardization is the answer is fully correct; it's just not the situation we have or have to deal with. It does mean my configuration scripts get smaller and simpler, and can even be elided sometimes, now that I can rely on a better ecosystem. But when I have to configure and build something written 10 years ago, stuffed on a CD-ROM and never touched since, or something written now on a 15-year-old system, then there isn't much other choice. And these cases come up; autoconf fills in the corner cases the original author never thought about and certainly doesn't want to.

Don't get me wrong, there are _many_ things that can be improved about autoconf; the macro-based language and the lack of insulation between the input language (macro sh) and output (sh) limit its portability in many ways.

>> those things where you don't appreciate how tricky it is to do portably
>> until you come across a non-autoconfed package you need to install.
>
> You don't appreciate how utterly bad autoconf is until you have
> to modify a build to make it work and you get presented with
> a totally impenetrable mess. Even the help is a heap of crap,
> telling you a whole lot of non-interesting rubbish and NOT telling
> you the things you actually need to know for your specific package.

Everything is easy once you know how to do it, including autoconf/automake. And you are right that autoconf is a crappy macro-based language, but what is important is the functionality; it is very good at doing what it does. Actually, I wouldn't say good -- lots to be improved there -- but there are no replacements that actually do what it does.

> What you have to understand is that most code doesn't
> require ANY configuration. Something like Judy, for example,
> only needs ONE configuration parameter (32 or 64 bits).

Yes, but does it need the ability to compile for a 32-bit ARM architecture on a 64-bit Intel host? A standardized 'install' and 'uninstall' mechanism that takes things like different permissions into account (install.sh exists for a reason; it is needed on some architectures) _and_ works on operating systems the author of the library has never even heard of, let alone can test on? Configuration is only a small part of what autoconf/automake does. Heck, right now I am using a very gutted subset of Judy1 arrays on a 16-bit system -- probably not something the original authors ever envisioned.

> I have a compiler with fairly extensive libraries that
> provides platform-independent stuff for most services,
> including async socket I/O, and some configuration is
> certainly necessary. Surprisingly little, though.
>
> Autoconf has one (and only one) legitimate use:
> building low-level system tools (binutils) and of course
> your compiler (gcc, clang, etc).
>
> After that it's the *compiler's* job to provide a standard API.

I agree, but again, autoconf is explicitly and exactly for those cases where standardization cannot be relied on. It is about making your library work, whatever it takes, on whatever system it is thrown onto. There is no _elegant_ solution to this problem, as it is inherently meant to work in an _inelegant_ system. And standardization of the language does not imply the system has a POSIX-compliant 'grep', or that binaries should still go in /usr/local/bin, or any of that meta-information. autoconf is as much about the system around the language (like the options passed to your compiler, for instance) as the actual language. I have rarely needed an autoconf #define inside of a code file, but it is needed for figuring out the right options to the C compiler, where to install the program, etc. Most of the changes to the code can be accomplished by filling in any standardized APIs it doesn't support, like generating a local stdint.h file, or a non-broken snprintf if it doesn't exist on the system.

It is pretty vacuous and trivially true to say "the solution to autoconf is to live in a world where autoconf isn't needed," because that is not the world we all live in. Working towards it is great -- it is a major effort of mine -- but it doesn't mean I can pretend I live in it. Don't get me wrong, I'd love a replacement for autoconf that _actually replicates autoconf's functionality_. Every replacement I have seen pretty much just replicates the subset of functionality the author has encountered.

>>> I always assumed the odd API was to conform to some internal HP coding
>>> standard created by someone who has a history with Algol (hence the
>>> in/out arguments). I've seen odder ones forced on programmers out there.
>
> The C API is not odd, it's perfect. Every argument is more or less
> exactly as it should be (give or take a cast, which is the inevitable
> result of using crap languages like C).

Yes, which is why I get rid of the need for casting and use standardized types with my interface, and make it almost impossible to use the arguments in the incorrect position. As in, I fix the issue you raise, and others that have bugged me too. I can't just shrug and say C is crap when I find something broken; I fix it if I can to make my life easier in the future.

> The MACRO API is crap. It hides where you need to use
> an lvalue and where an rvalue will do. That's a bad idea in C
> (and an even worse one in C++).

Yup. I stopped using it after it fell down on simple idioms like for loops.

>> Another is that by using actual structures the compiler can do alias
>> analysis. For instance:
>>
>>     void foo(judyl_t *j, int *x) {
>>         if (jEmpty(*j))
>>             blah;
>>         *x = 0;
>>         if (jEmpty(*j))
>>             bar;
>>     }
>>
>> Since judyl_t and int are different types, the compiler can assume that x
>> does not point to the same memory location as j, meaning it can combine
>> both jEmpty calls into one, caching the result and removing a load. It
>> would not be able to do this with Pvoid_t values without explicitly
>> checking if x == j.
>
> Unfortunately, alias tests in compilers are really dangerous.
> Because the type system is so weak, people cast left, right,
> and centre. In my compiler I do it systematically, and I have
> to use -fno-strict-aliasing. I wish I didn't have to do this, but
> unfortunately C just doesn't have a good enough type system
> or semantics to avoid it sometimes ;(

Yeah, I had issues with that in jhc's back end for a while too. I was able to eventually fix it by using variants of the struct method like I use here, but there were pitfalls along the way.

John