From: Bisht, P. <pra...@ya...> - 2011-02-04 22:02:01
I just noticed that the Judy SourceForge page says "Operating Systems: All 32-bit MS Windows (95/98/NT/2000/XP)". It looks like 64-bit is not supported. Is that correct? Thanks.

________________________________
From: "Bisht, Pradeep" <pra...@ya...>
To: jud...@li...
Sent: Fri, February 4, 2011 10:09:37 AM
Subject: Judy on 64-bit windows

Hello, has anyone been successful in compiling and using Judy on 64-bit Windows? First, I think there is a bug in Judy.h. The line

    typedef unsigned long Word_t, * PWord_t;  // expect 32-bit or 64-bit words.

should have been

    #ifdef JU_WIN
    #ifdef JU_64BIT
    typedef uint64_t Word_t, * PWord_t;       // expect 32-bit or 64-bit words.
    #else
    typedef uint32_t Word_t, * PWord_t;       // expect 32-bit or 64-bit words.
    #endif
    #else  // JU_WIN
    typedef unsigned long Word_t, * PWord_t;  // expect 32-bit or 64-bit words.
    #endif

Am I correct? Before making this change, Judy1/LTablesGen.exe was failing while generating the tables, printing the error "BUG, in %sPopToWords, sizes not big enough for object\n". Now that I am able to compile, it crashes. My sample code is:

    void BuildJA ()
    {
        Pvoid_t Parray = (Pvoid_t)NULL;   // empty JudyL array
        Word_t lba, *Pvalue;              // value for one index
        unsigned int i;

        for (i = 0; i < 1000; i++)
        {
            lba = i;
            JLI (Pvalue, Parray, lba);    // it crashes here for i = 1
            *Pvalue = 1;
        }
    }

    /* the main program */
    int __cdecl main (int argc, char *argv[])
    {
        BuildJA ();
        return (0);
    }

The crash happens at:

    static __inline int j__udySearchLeafW(Pjlw_t Pjlw, Word_t LeafPop1, Word_t Index)
    {
        SEARCHLEAFNATIVE(Word_t, Pjlw, LeafPop1, Index);
    }

called from JudyLIns(). Am I doing something wrong here? Kindly note that on 32-bit Windows I have done extensive testing (several hours, several million entries) without any problem. Thanks.
From: Pilot, S. <SPilot@StateStreet.com> - 2011-02-04 23:14:58
Is it?
From: Bisht, P. <pra...@ya...> - 2011-02-05 01:18:46
OK, so I have fixed the crash. It was due to (I think) the use of constants like <n>UL; they should have been the word size, which on 64-bit Windows means ULL. Now I can run my example, but the number of indexes is far lower than expected: I inserted 1000 unique indexes, and that is what I see in the 32-bit program, but on 64-bit Windows I see only 531 indexes. Can anybody please confirm whether 64-bit Windows is supported? Thanks.
From: Bisht, P. <pra...@ya...> - 2011-02-05 08:21:30
I finally got my program to work. There were some more 1L constants in JudyPrivate.h which needed to be changed to 1LL (the proper word size on 64-bit Windows). Is there any regression suite I can run to make sure I have not broken anything? Also, I would like to give this code, which now works on 64-bit Windows, back to the community. Is there a way I can do that? Thanks.
From: Doug B. <dou...@ya...> - 2011-02-05 14:46:23
Pradeep:

Thank you for your persistence in tracking down this problem. I do not have the resources (money to support Microsoft compilers) to give you support in a timely manner; I depend on the Open Source Community -- such as yourself. I retired in 2002. However, I am still very active on improved versions of Judy in my spare time.

I did not sleep very well last night because I was thinking about how to help you and prevent a recurrence of this problem in the future. I am the author of most of Judy. As you (painfully) found out, C does not precisely define the number of bits in its variables and constants. I used my own typedefs in Judy to specify exactly how many bits were in the variables, but the constants (I believe) are still problematic. When Judy was written, Microsoft seemed to pick a "suspect" solution for 64-bit programs. Judy used uint8_t, uint16_t, uint32_t, and uint64_t to specify the required variables with specific bit lengths. Later, the Unix and Linux community specified header files that did the same thing (with the same names). For variables where the bit sizes did not matter, I let the compilers decide and used "int", hoping the compiler would use the fastest size for the processor. Today, processors are good about the size of variables not affecting speed, so I suppose we programmers should specify everything, to be safer and avoid bugs and compile errors. If you know of a portable way to specify the bit length of constants, I would very much appreciate learning how. I was very disappointed that many of the C compilers required the use of 1ULL and such without a way of specifying the number of bits in the constant (or of not needing to) -- and they were DIFFERENT. I suspect that Microsoft changed their method, because Judy used to work.

I tried very hard to get Judy to work on all OSes, whether 32- or 64-bit. I still do not know how to make the build tools compile and install both the 32- and 64-bit sizes on 64-bit machines. I personally use 64-bit Linux to test 32- and 64-bit versions of Judy. I wish I could use 64-bit Windows too. I believe 32-bit machines are a thing of the past, because the price of memory has dropped to a point that makes 32-bit OS support a nuisance.

Please send me the changes you made to get Judy to compile on 64-bit Windows. I would also very much appreciate a "tutorial" on how to compile Judy with Microsoft compilers that I could forward to people who are trying the same. I get a surprising number of emails requesting that information, and I have nothing worthwhile to suggest. I will take a look at your changes and test them on 32/64-bit Linux and a 64-bit Mac machine with my regression tests. Sorry, I can't afford Windows capability.

Thank you for your interest,

Doug

PS. It took me a lot of time to get Judy to compile and work on machines of both little and big endianness without #ifdefs -- something Microsoft does not do for Windows.

Doug Baskins <dou...@ya...>
From: john s. <sk...@us...> - 2011-02-05 17:33:58
On 06/02/2011, at 1:46 AM, Doug Baskins wrote:

> I believe 32 bit machines are a thing of the past because the price
> of memory has dropped to a point that makes 32 bit OS support a nuisance.

On desktops, perhaps, but what actually happens is that 32-bit machines outnumber 64-bit ones even more: as the price drops, they get used in embedded devices such as cars, etc. I have major software running on *8*-bit machines, because the chips cost under $5.

As to portability: there are several ways to establish the environment. The simplest is to use C99: uintptr_t, end of story, so damn bad if someone has an archaic compiler that doesn't support it. The next simplest method is to cheat and define Word_t depending on a couple of compiler macros; this isn't portable, but it is easy to add a new platform. The most precise method is trial-and-error testing, BUT this technique does not support cross-compilation.

Everything else can be calculated in a portable way. For example, 64 one-bits is just ~(Word_t)0u on a 64-bit machine; Windows or Linux doesn't matter. The shift count 64 is given by CHAR_BIT * sizeof(Word_t), where CHAR_BIT is always 8 on every modern platform.

Basically, Judy shouldn't have much of a problem being portable; it is, after all, just a data structure. In fact, recall my request: I'd really like JudyXX32 and JudyXX64. I'd actually like to have 32-bit Judy arrays on a 64-bit platform, because "int" is 32 bits.

--
john skaller
sk...@us...
From: john s. <sk...@us...> - 2011-02-05 18:06:15
On 06/02/2011, at 1:46 AM, Doug Baskins wrote:

> Thank you for your persistence on finding this problem. I do not have the resources
> (money to support Microsoft compilers) to get you support in a timely manner. I
> depend on the Open Source Community -- such as yourself.
> [...]
> I did not sleep very well last night because I was thinking on how to help you and prevent
> a re-occurrence of this problem in the future.

You already stated the answer: rely on the community :) There is NO WAY you can, or should, handle the technical development and testing of the research tool AND all the testing and build issues on multiple platforms. We were discussing this before; the question is how to get people appropriate access to the repository.

My system (Felix) uses Git, and we have a clone of Judy on GitHub.com. We do not use any makefiles, autoconf, or any other junk: our build system is written in Python, and we have the files there to just C-compile everything; no configuration or whatever is required (other than the macros in Judy.h which make the choices). This is lucky, because Judy is a plain old data structure, as opposed to, say, networking support (try making all that stuff work portably!).

The key technique for portability is quite simple: you first have to choose your word size Word_t; typically just

    typedef uintptr_t Word_t;

will do. The golden rule for bit fiddling is: always use *unsigned* values, because the results are completely deterministic. It is sometimes messy to calculate bit masks portably, but it can always be done. For example, ~(Word_t)0 is all 1 bits, and CHAR_BIT * sizeof(Word_t) is the count of bits. (I forget whether sizeof() counts as a constant expression in C; it should in C++, I think.) You can always just define

    #define WORD_T_BITS 64

or 32, depending on the platform; you won't need more than one or two of these values.

Basically you have & | ^ ~ as well as << and >> to do the calculations, and that's usually enough. The trick is casts. For example:

    printf("Word=%llu\n", (unsigned long long)x);

That's never wrong, even if x is only an int :) There may be better ways to do this. On old platforms some things WILL break. Do you really want to support x86?

--
john skaller
sk...@us...
From: john s. <sk...@us...> - 2011-02-05 17:21:11
On 05/02/2011, at 7:21 PM, Bisht, Pradeep wrote:

> finally i get to make my program work. there were some more 1L in JudyPrivate.h which needed to be changed to 1LL (the proper word size on 64-bit windows).

You mean these?

    #define JU_BITPOSMASKB(BITNUM) (1L << ((BITNUM) % cJU_BITSPERSUBEXPB))
    #define JU_BITPOSMASKL(BITNUM) (1L << ((BITNUM) % cJU_BITSPERSUBEXPL))

> Is there any regression suite that I can run make sure I have not broken anything? Also I would like to give back to community this code which now works on windows 64-bit - is there way I can do it. Thanks.

Well, you don't have a portable solution here: you cannot use 1LL. The right way to do this is (Word_t)1u; with the cast, it's portable.

--
john skaller
sk...@us...
From: Bisht, P. <pra...@ya...> - 2011-02-06 06:05:47
Hello John, sorry, I forgot to mention that I was talking about 64-bit Windows only. I have these changes under #if defined(JU_WIN) && defined(JU_64BIT), so for the rest of the platforms the code is unchanged -- so it is still portable. But your (Word_t)1U approach is definitely much neater :-). Thank you.
From: Bisht, P. <pra...@ya...> - 2011-02-06 06:02:26
Once I'm done with my current testing I will ship you the changes I made. Thank you for responding.
From: john s. <sk...@us...> - 2011-02-06 10:20:24
On 06/02/2011, at 5:02 PM, Bisht, Pradeep wrote:

> once I'm done with current testing I will ship you the changes I made. Thank you for responding.

Ooops. Can you also fix the build system? At present we can't upgrade from the SF repository. Our clone looks like this:

http://felix-lang.org:1116/$src/judy

Basically we've modified Judy to get rid of the table generation as part of the build, and to remove the complex file-sharing stuff. For example:

http://felix-lang.org:1116/$src/judy/Judy1/Judy1Tables32.c

If I recall correctly, this file is generated by the Judy build system; in ours, we generate it once and save it. The #includes are changed to take this into account. There's also the proper control of "dllexport" required for Windows. Basically, our clone can be compiled "out of the box".

Generating code in a build system is fraught with difficulties, such as "using make" or "using bash" to do it, or finding out which compiler to use to build the tool that generates the code. Best not to get into this! [Felix does it, but we have spent several years developing portable technology to do it, written in Python.]

This is also what GNU tends to do: although code generators are included in the repository, so is the code they generate. In particular, "automake" can be used to generate makefiles, but this is done by the original developer, not the client.

You can get our version from GitHub as erickt/judy. [I'll leave it to Erick to provide the details.] I'd be happy to try to upgrade the SF repository if you want to stick with SVN at SF.

--
john skaller
sk...@us...
From: john s. <sk...@us...> - 2011-02-06 10:38:25
On 06/02/2011, at 9:20 PM, john skaller wrote:

> Ooops. Can you also fix the build system? At present we can't upgrade from the SF repository.
> Our clone looks like this:
>
> http://felix-lang.org:1116/$src/judy

Er .. unfortunately SVN is down at SF at the moment, so I can't check what the differences are.

--
john skaller
sk...@us...
From: Geert De P. <ge...@de...> - 2011-02-06 11:22:29
One easy change we made to improve the Windows build system was to avoid the copies, and it is totally portable to UNIX and other platforms as well.

For example, instead of doing the following in the build:

    cp JudyCommon/JudyFirst.c Judy1/Judy1First.c
    cp JudyCommon/JudyFirst.c JudyL/JudyLFirst.c

we created a file Judy1/Judy1First.c which contained:

    #define JUDY1
    #include "../JudyCommon/JudyFirst.c"

The file JudyL/JudyLFirst.c would contain:

    #define JUDYL
    #include "../JudyCommon/JudyFirst.c"

The first #define (JUDY1 or JUDYL) is actually optional, but it makes it easier for people to understand the flow of things. This definitely avoids duplicate code (which is what Felix seems to have) and the strange copies of source code done by the build system.

--
Geert
From: john s. <sk...@us...> - 2011-02-06 13:34:42
|
On 06/02/2011, at 9:55 PM, Geert De Peuter wrote: > One easy change we did to make the Windows build system better was to avoid the copies. > And it is totally portable to UNIX or other platforms as well. > > For example instead of doing the following in the build > cp JudyCommon/JudyFirst.c Judy1/Judy1First.c > AND > cp JudyCommon/JudyFirst.c JudyL/JudyLFirst.c > > We created a file > Judy1/Judy1First.c > Which contained > > #define JUDY1 > #include "../JudyCommon/JudyFirst.c" > > The file JudtL/JudyLFirst.c > would contain > #define JUDYL > #include "../JudyCommon/JudyFirst.c" > > The first #define (JUDY1 and JUDYL) is actually optional, but makes it easier for people to understand the flow of things. > > This definitely avoids duplicate code (which is what Felix seems to have) and strange copies of source code done by the build system Yes, we have similar thing but the code is duplicated, which is bad. But there's a reason .. :) I do not believe ".." is portable. Sibling references are bad. It used to be in the old days Microsoft treated "." as "current directory" which means you could not use that either. Not sure what they do now. The proper solution is to put JudyCommon on the command line as a search directory: -IJudyCommon etc. This avoids the problem with using ".." in filenames. Actually in Felix C/C++ libs I "ban" subdirectories. You're not allowed to #include "a/b". All #includes must be simple names in double quotes (except system headers). This means all files must be searched for under control of the command line. I actually spent a lot of time with several big libraries modifying all the filenames. [Someone decided it was cool to have a file called string.h and refused to change it] Anyhow, that's why the code is duplicated in the first instance. In the second instance, the code is actually different, and the style of using threaded code like this is bad: this is a C level hack. 
Better to have two distinct files without the macro, even if that means maintaining each lib separately, because that's probably easier than getting the macros right. Also the macros are global and pollute the namespace: you say #define JUDY1 immediately before a #include, but what if you have two #includes .. one macro would modify both files, perhaps accidentally. The best solution is probably to use C++ templates :)

YMMV: but what we have here is several people modifying the original code, hence my comment: the build system is disgusting. I repeat though: Judy itself is very cool and the C interface, though hard to use, is exactly right. It's hard to use because C doesn't have a strong enough type system. The Word_t typedef is probably the best compromise. With a few developers we could reorganise the repository and the docs, and let Doug get on with the R&D.

I also need a way to iterate over JudyHS; it's useless as it is because there's no way to get all the values, which will usually be pointers, so you can delete them (in C) or in my case, not delete them (using GC, I need to mark all reachable pointers).

I also wonder: why does Judy only allow a single word value (for JudyL+SL+HS)? Is that fundamental to the design? Or is it just too hard to do anything else in C? -- john skaller sk...@us... |
From: Erick T. <eri...@gm...> - 2011-02-06 19:57:37
|
On Sun, Feb 6, 2011 at 2:20 AM, john skaller <sk...@us...> wrote:
>
> Generating code in a build system is fraught with difficulties.
> Such as "using make" or "using bash" to do it. Such as finding out
> which compiler to use to actually build the tool that generates the
> code. Best not to get into this! [Felix does it, but we have spent several
> years developing portable technology to do it, written in Python]
>
> This is also what GNU tends to do. Although code generators are included
> in the repository, so is the code it generates. In particular, "automake" can
> be used to generate makefiles, but this is done by the original developer,
> not the client.
>
> You can get our version from GitHub as erickt/judy [I'll leave it to Erick to
> provide the details]

Hey John,

I have already partially modified felix's build system to compile judy straight from a tarball, so we won't have to deal with copying files around. I put it off until after we cut our next release since at the time it sounded like we were doing that soon. I'll see if I can finish it up over the next couple days.

Judy folks, I'm not sure if you'd be interested, but I'd be happy to take my changes and make an independent build system using our fbuild (http://github.com/erickt/fbuild) if anyone's interested. |
From: john s. <sk...@us...> - 2011-02-07 01:18:09
|
On 07/02/2011, at 6:47 AM, Geert De Peuter wrote:

> ".." has been portable on all OS's we compile on. That's any UNIX variant, OS/2 (those were the days), OS400 and Windows (since its earliest versions).

Are you sure? Because in Unix "." means "relative to this file", whereas on early MSC compilers it meant what it said: "relative to current directory". It follows by "logic" (if that applies to anything!) that ".." means "parent dir of current file" and "parent of current directory" respectively. With "make" these are often the same because you cd into the build directory.

Felix build *disallows* changing the current directory or depending on it (except once at startup to determine the repository location, and that can be overridden). We mandate explicit paths for all input and output files. E.g. you are not supposed to write:

    gcc -c x.c

you have to put:

    gcc -c x.c -o x.o

> I agree having the JudyCommon in the library search path indeed doesn't require this ".." reference. This is also a minimal change.
>
> The good thing about the approach we took is that we didn't change the original distribution. We just wrote our own makefiles on top and added a few files with "#include" directives.
> Builds are simply not allowed to do a copy, that's a strict policy we follow. We don't do a copy in the build, the copy is already in the repository.
> Reason is simple - the source directory may be shared (remotely mounted) on multiple machines that will do platform specific compilations. You don't want to have bizarre race conditions in a build system when multiple systems use the same source and while one is reading the file - another system decides to run a copy job.

Actually, our system "supposedly" disallows building in the repository image, which is assumed to be read only. So we actually copy most of the source to a specific directory: build/release, build/debug, or whatever. That eliminates the above problem.
It's all "supposed" to work without copying though, but not all tools can handle this. Our system is a cross-cross-compiler so the kinds of things we need to do are quite nasty :)

    Felix -> C++ -> binary -> execute

So actually we "should be" managing multiple configuration data, e.g. you might build Windows 64-bit code on a Linux 32-bit machine. [I'm not sure all that still works, but it was designed to support Cygwin and also Windows MSVC 32->64, 32->32 and 64->64 bit compilers, which is a LOT of hassle! I don't run Windows any more so I can't test it]

> Last I wanted to mention that the #define JUDY1 is optional ... we added it because it makes it easy for the reader - and there is no problem doing that.
> It would just end up being one more directive to add to the makefile if you didn't like that approach.

Actually we refuse to use "make". It's a junk tool and also not portable. We decided building was a sophisticated process that demanded a proper language; we chose Python. Erick wrote "fbuild", which is based on the idea that building is a sequence of imperative operations; it uses caching to avoid repeating something already done (i.e. it's an optimisation). But of course we can just ignore makefiles :)

Anyhow, it seems we agree on basic build principles, and the main thing is that it builds. The one real problem here is our macro header:

    #ifndef JUDY_EXTERN
    #if defined(_WIN32) && !defined(FLX_STATIC_LINK)
    #ifdef BUILD_JUDY
    #define JUDY_EXTERN __declspec(dllexport)
    #else
    #define JUDY_EXTERN __declspec(dllimport)
    #endif
    #else
    #define JUDY_EXTERN
    #endif
    #endif

Our system here is: all externals in the library have to be marked "JUDY_EXTERN". For example:

    FUNCTION int JUDY_EXTERN Judy1ByCount

The macros above, which we normally put in a #include "judy_config.hpp" file, but not in the case of Judy, are required to decide which __declspec to emit: dllexport or dllimport.
This requires a command line switch (there's no workaround for this); we use two:

    BUILD_JUDY      --> we're building the library rather than using it
    FLX_STATIC_LINK --> we're making a static lib, rather than a DLL

Together with the _WIN32 switch from the compiler this decides whether to use dllimport, dllexport or nothing. There are some variations on this of course, but there is no way to avoid modifying the code with JUDY_EXTERN or equivalent.

To avoid hassles with shells, it is best that macros specified on the command line are "defined or not defined" rather than having a value, which means the JUDY_EXTERN macro has to be defined in the source code. We build all our C/C++ code with a Python script that "always gets this right" for all libraries, but it only works because the protocol is fixed. Clearly Judy does not want to use the "FLX_STATIC_LINK" macro. I'd be happy with anything which works of course :) -- john skaller sk...@us... |
From: john s. <sk...@us...> - 2011-02-07 01:27:12
|
On 07/02/2011, at 6:57 AM, Erick Tryzelaar wrote:

> On Sun, Feb 6, 2011 at 2:20 AM, john skaller
> <sk...@us...> wrote:
>>
>> Generating code in a build system is fraught with difficulties.
>> Such as "using make" or "using bash" to do it. Such as finding out
>> which compiler to use to actually build the tool that generates the
>> code. Best not to get into this! [Felix does it, but we have spent several
>> years developing portable technology to do it, written in Python]
>>
>> This is also what GNU tends to do. Although code generators are included
>> in the repository, so is the code it generates. In particular, "automake" can
>> be used to generate makefiles, but this is done by the original developer,
>> not the client.
>>
>> You can get our version from GitHub as erickt/judy [I'll leave it to Erick to
>> provide the details]
>
> Hey John,
>
> I have already partially modified felix's build system to compile judy
> straight from a tarball, so we won't have to deal with copying files
> around.

I'm not sure that's possible or desirable. Judy uses generated C code, namely lookup tables. There's no reason for this: there are exactly two variants, 32 and 64 bit, so there's no reason that Doug shouldn't generate them himself and put the result in the repository.

What I'd like to see is that some other non-core-Judy-developers (i.e. not Doug or Alan) take care of the build, docs and other peripheral stuff so Doug can focus on R&D. Especially as we need Windows people for testing.

Worse, sadly, I have OSX and it seems there will be no choice but to switch to CLANG/LLVM since GNU stupidly screwed Apple up by introducing GPL 3, so it seems Apple will no longer support gcc, and unfortunately their OSX doesn't work with standard Unix loaders (Xcode gcc is patched by Apple). -- john skaller sk...@us... |
From: Doug B. <dou...@ya...> - 2011-02-07 18:49:33
|
To All:

Thank you very much to all of you for your comments and suggestions. I find the idea of a good build system very appealing, especially for 2 reasons: 1) It is simply not my forte, 2) I think the person that has been doing that job has moved out of the HP Lab and I suspect he would like to pass the baton -- HP does have issues with Version 3 LGPL. I do not understand much of your talk about building methods. I live on a different planet.

I think many of your comments were right on! I will incorporate almost all of them in the next Judy.

But now for the exciting part. I decided to take a break from Judy1/L for a while and work on JudyHS, since that has been neglected for so long. In the design to add JudyHSNext/Prev/First/Last, very interesting possibilities jumped off the page. I need your help with suggested features that look very feasible.

1) An Index of arbitrary size in Bytes, Words, specifiable?
2) An arbitrary length of Value in Bytes, Words?
3) A new Create method: JudyXXCreate(), would be necessary before the first JudyXXIns().
4) Instead of returning a pointer to Value, the pointer would be to a struct -- containing the Value(s) and possibly the length of the Value area. This means that every Index could possibly have a different size Value area. This leaves the possibility of a 0 Value area, with improved speed over non-zero, to a max of ?? Bits/Bytes/Words?
4) The length of Value area would have to be specified in Create (static) or Insert (dynamic)?
5) The sort method (Dictionary (lexicographical) order or Binary) would have to be specified during JudyXXCreate()
6) Endianness compensation parameter at XXCreate() if the Index is passed in array elements of 1, 2, 4, 8 byte elements. I.e. suppose the Index is 10 bytes, but passed as "uint32_t Indx[20];", and passed like "JudyXXIns(PArray, Indx, 10)" or "JudyXXIns(PArray, Indx, 4, 10);". The byte stream looks different on Little and Big Endian machines. Judy needs to know how many bytes to swap.
The Values would stay native Endianness. Also, is the Index Left or Right justified?
7) The Semantics of XXIns() would need to know what to do if the size of the Value area is different from one that already exists. Possibilities are stay the same, or grow?
8) Error returns from passive (non-array-altering) calls would return the same as a 0 population array (no error codes -- no passed error parameter)
9) Leading zero deletion when in Binary sort mode -- perhaps specifiable at XXCreate()
10) I think the speed would be similar to JudyL + one Cache-fill for Value areas less than a cache-line in size (64 bytes).
11) In the case of a 0 length Value area, leave the return pointer to what?
12) I am sure I have forgotten something.

doug

Doug Baskins <dou...@ya...> |
From: john s. <sk...@us...> - 2011-02-07 23:15:57
|
On 08/02/2011, at 5:49 AM, Doug Baskins wrote:
> To All:
>
> Thank you very much to all of you for your comments and suggestions. I find the
> idea of a good build system very appealing, especially for 2 reasons: 1) It is
> simply not my forte, 2) I think the person that has been doing that job has
> moved out of the HP Lab and I suspect he would like to pass the baton --
> HP does have issues with Version 3 LGPL. I do not understand much of your
> talk about building methods. I live on a different planet.

My concept is that build systems are just programs like any other. This means they need good structure. And they need to be portable. This is hard since everyone's computer is different, and even harder because some people develop code on platform X but run it on platform Y.

The key thing here is for you to decide how to let others help. Small patches to code can easily be handled by .. small patches. Reorganising documentation and systematic code editing would be easier with repository access. SF+SVN is not a good tool for collaboration; it depends too much on trust. Git is a better tool. It's the tool developed by Linus to manage development of the Linux kernel. Mercurial is a similar tool; it's used by Google.

Doug: you and Alan have to decide what to do here. If you use Git you don't have to take any risks giving people write access to the repository, but the downside is you will have to review and accept all the changes we make actively. Ideally, it would be better if someone is appointed repository manager and they set up the system for you and tell YOU what you have to do :) Erick may be happy to do that, not sure; certainly there's no problem using the already built Felix version as base.

> 1) An Index of arbitrary size in Bytes, Words, specifiable?

Doesn't JudyHS already have that? .. Ah no, I see you mean like JudySL: each key can be a different length.

> 2) An arbitrary length of Value in Bytes, Words?
There is a point where the cost of using a pointer to the heap is insignificant. Also it depends if the value slot (address) is persistent or not. If the value slots can move around, many objects, say, C++ objects, have to be put on the heap, unless you want to upgrade Judy to using C++ templates so the constructors/destructors can handle moving the values around. In C++ speak it is safe to "memcpy" something if it is a POD = Plain Old Data type, which is C++ speak for "old fashioned C data structure" :)

On bytes/words: whatever is easier for you to implement. There are alignment issues here: a 4-byte object may have to have an address which is a multiple of 4.

> 4) Instead of returning a pointer to Value, the pointer would be to a struct -- containing
> the Value(s) and possibly the length of the Value area. This means that every Index
> could possibly have a different size Value area. This leaves the possibilty of a 0 Value
> area, with improved speed than with non-zero to a max of ?? Bits/Bytes/Words?
> 4) The length of Value area would have to be specified in Create (static) or Insert (dynamic)?

Typically, string keys can be variable length. Values are usually either static length, or they're a Union (with a discriminant). Ideally a C union value would only store the actual component, rather than allocate the maximum size; however this is VERY HARD to do right, because of alignment issues. It can't be done at all in either C99 or C++98 (without external configuration data).

> 5) The sort method (Dictionary(lexicographical order) or Binary) would have to be specified
> during JudyXXCreate()

This is getting harder. If you have a comparator function make sure it has an extra void* parameter for client data:

    int compare (void *client_data, void *x, void *y)

to allow "generic" routines to be written.

> 6) Endianness compensation parameter at XXCreate() if the Index is passed in array elements of
> 1,2,4,8 byte elements.
> I.E.
suppose the Index is 10 bytes, but passed as "uint32_t Indx[20];" , and passed like
> "JudyXXIns(PArray, Indx, 10)", or "JudyXXIns(PArray, Indx, 4, 10);" The byte stream looks
> different with Little and Big Endian machine. Judy needs to know how many bytes to swap.
> The Values would stay native Endianness. Also if the Index is Left or Right justified?

One way around this is NOT to do it at all. You can specify the order, and leave it up to the client to meet your requirements. There is a well known canonical ordering: Internet byte order (which is big endian). There are also various encodings. One popular one is 7 bits per byte, with the high bit set on all bytes except the last one.

This is just like me saying: for JudyHS/JudySL I'm annoyed at not allowing 0 in the byte stream. But the fact is I can convert any string with 0 in it to UTF-8 to get rid of the 0. IMHO: you should pick the encoding that is easiest to implement and make the client comply. If you start trying to guess all the possibilities here and handling them all in Judy, you will get continual bugs and complaints that you're not handling X, Y and Z :) E.g.: the key is a struct .. whoops, structs can have PADDING. Now it is quite hard to specify the "byte order" because now you have to leave some bytes out.

> 7) The Semantics of XXIns() would need to know what to do if the size of Value area is different from
> one that already exists. Possibilities are stay same or grow?

Ouch. Interesting problem.

> 8) Error returns from passive (non-array altering) calls would return the same as a 0 population array
> (no error codes -- no passed error parameter)
> 9) Leading zero deletion when in Binary sort mode -- perhaps specifiable at XXCreate()
> 10) I think the speed would be similar to JudyL + one Cache-fill for Value areas less than
> a cache-line in size (64 bytes).

64 bytes is a reasonable maximum for a value. After that, go back to an 8-byte pointer to the heap.
For keys it is probably different. > 11) In the case of 0 length Value area, leave the return pointer to what? NULL. > 12) I am sure I have I forgotten something. Of course, you were so involved in writing this email you forgot to put the coffee on and the garbage out .. the cat is dying of hunger and the dog ran off with your wife :) -- john skaller sk...@us... |
From: Alan S. <aj...@fr...> - 2011-02-07 23:31:21
|
John et al, > My concept is that build systems are just programs like any other. But not really just like any other because of what they do. You might not get that impression from the Judy makefile I mostly wrote, but I am something of an expert on software build and delivery systems. I won't expound at length but want to mention that the idea of a data flow diagram (DFD) is a powerful model for what a build system is really all about, what it implements. Repositories (files) flow through processes (programs/etc) into other files. (DFDs are to control flow diagrams as higher-level declarative languages are to lower-level functional languages. You can google and read a lot more about DFD theory if it's a new idea to you.) When the build sources/targets are small enough relative to current technology, we just rebuild everything from scratch all the time and the build system can be relatively simple. In real life though we always need conditional (re)build systems that understand dependencies and how to do efficient partial rebuilds. Part of the art is how to correctly and efficiently "templatize" myriad repetitive patterns (rules) that have some variations. The most complicated build system I've ever seen was for chip design flows, where multiple different sets of sources could be used to create multiple different sets of outputs depending on what sources were available, and there were series of pattern-matching distinguishers for when to do what within each "rule group". Anyway one other drive-by concept worth mentioning is that real life often involves both multi-target rules and multi-rule targets (using make(1) terminology)... The correct handling of those concepts is philosophically difficult, and often gotten wrong. > Doug: you and Alan have to decide what to do here. To be clear it's 100% Doug's project now, has been for years. 
I often respond to emails trying to be helpful if he doesn't, and we live in the same general area, but haven't worked on libJudy together since 2002. He's retired, I'm mostly not yet, and he's the sole owner of the library, I'm just a user now. Which means I'm not reading every word of the current discussion as it gets involved! :-) Cheers, Alan Silverstein |
From: john s. <sk...@us...> - 2011-02-08 00:39:05
|
On 08/02/2011, at 10:31 AM, Alan Silverstein wrote: > John et al, > >> My concept is that build systems are just programs like any other. > > But not really just like any other because of what they do. You might > not get that impression from the Judy makefile I mostly wrote, but I am > something of an expert on software build and delivery systems. I won't > expound at length but want to mention that the idea of a data flow > diagram (DFD) is a powerful model for what a build system is really all > about, what it implements. Repositories (files) flow through processes > (programs/etc) into other files. (DFDs are to control flow diagrams as > higher-level declarative languages are to lower-level functional > languages. You can google and read a lot more about DFD theory if it's > a new idea to you.) Data flow is not a new idea, it's a subset of the REAL idea: category theory. I have a degree in math specialising in abstract algebra and my particular interest is category theory :) However, the domain specific languages used for build systems are crap. I mean they're utterly crap. They're all wrong. All build systems I know (other than fbuild) totally get the idea backwards. Building is NOT goal driven. Make gets this completely wrong and everyone has copied the mistake, and the mistake is fundamental. The idea of targets is WRONG. Build systems must be driven bottom up. They're intrinsically imperative NOT functional. When you change a source file, that should trigger rebuilding the system based on what depends on the source file you changed. This is completely the reverse of target driven building. The correct way to construct a build system is by specifying how to build the system and then optimising it. You CANNOT specify goal driven building effectively, because it is not possible to get the dependencies right. This is a plain fact of reality. Goal driven building also fails to work with multiple outputs. 
Many programs output several files, e.g. make some binary code AND generate documentation. Also some systems require recursion. The best example is LaTeX. This requires a concept of fixpoints, that is, you run latex repeatedly until the output doesn't change (this is because things like plugging in cross references change layout, which change the cross-references). The canonical example of how goal driven building fails is my own product "interscript". This is a literate programming tool that takes a file containing other files and emits them (and maybe typesets stuff as well). It is completely backwards to specify the "targets" here. You have no idea what they are. Interscript just generates files and it can use arbitrary Python code to do that. The code generation is sophisticated. But you don't care. What matters is that when the interscript INPUT file changes you have to run interscript again to generate the outputs, whatever they are. > > When the build sources/targets are small enough relative to current > technology, we just rebuild everything from scratch all the time and the > build system can be relatively simple. In real life though we always > need conditional (re)build systems that understand dependencies and how > to do efficient partial rebuilds. Part of the art is how to correctly > and efficiently "templatize" myriad repetitive patterns (rules) that > have some variations. Yes. And the way to do that sophisticated stuff requires a REAL programming language like Python. Trying to do this with Mickey Mouse crap like Make cannot possibly work. That is why Make comes with a set of other rubbish tools like automake, autoconf, and you have crud all over the place: *.am files and inputs and other rubbish .. almost all of which is C specific. 
> > The most complicated build system I've ever seen was for chip design > flows, where multiple different sets of sources could be used to create > multiple different sets of outputs depending on what sources were > available, and there were series of pattern-matching distinguishers for > when to do what within each "rule group". You had better look at fbuild then :) Fbuild is a caching build system. It caches the results of various operations (compiling stuff, etc) and knows when the caches are not up to date. So rebuilding is the same as building, except the caching allow skipping some parts of the build because the dependencies tell fbuild the results will be the same. Fbuild captures dependencies automatically, you not only don't have to specify them .. you CANNOT specify them. What you do is say something like: cc("mycode.c","myout.o") and fbuild caches the function call, it knows that "mycode.c" is an input and "myout.o" is an output (because the 'cc' function has been written that way, it's not magic!). So you see with fbuild, you basically just tell it how to build the system, the optimisation is automatic. > Anyway one other drive-by concept worth mentioning is that real life > often involves both multi-target rules and multi-rule targets (using > make(1) terminology)... The correct handling of those concepts is > philosophically difficult, and often gotten wrong. Yes. Which is why you should not use crud like make. You have to use a powerful expressive language or you can't possibly hope to get it all right. It is still hard, even with Python. fbuild still has bugs. And the Felix build scripts do as well. But the build system really is portable. Code builds the same way using MSVC on Windows as it does using gcc on Linux or OSX. > >> Doug: you and Alan have to decide what to do here. > > To be clear it's 100% Doug's project now, has been for years. 
> I often
> respond to emails trying to be helpful if he doesn't, and we live in the
> same general area, but haven't worked on libJudy together since 2002.
> He's retired, I'm mostly not yet, and he's the sole owner of the
> library, I'm just a user now. Which means I'm not reading every word of
> the current discussion as it gets involved! :-)

You may not be the owner, but you have a level of interest and expertise. The "rule of 3" says that to get a system like Judy working requires a lot of resources, including intellectual ones: we need your brain :) -- john skaller sk...@us... |
From: Alan S. <aj...@fr...> - 2011-02-08 02:20:19
|
Drifting off topic but what the heck... > Data flow is not a new idea, it's a subset of the REAL idea: category > theory. OK, I'll have to read up on that... > Build systems must be driven bottom up. They're intrinsically > imperative NOT functional. Would you elaborate on the difference? Do you mean the difference between declarative (what) and functional (how)? > When you change a source file, that should trigger rebuilding the > system based on what depends on the source file you changed. This is > completely the reverse of target driven building. Yes -- and no -- at least as I think of it. Viewing a build system as an acyclical graph, it's a static (at any one point in time) set of relationships between sources (files that have no arrows into them within the build system, even if derived say from a version control system) and constructed targets (some of which are deliverable, others of which are intermediate, but that doesn't matter here). Given some form of specification of these relationships -- sources, targets, rules, dependencies/conditions -- then any time a source changes, all dependees must be at least revisited if NOT updated/reconstructed, whether you consider this to be targets-backwards or sources-forward. By the way, that elaborate chip design system I mentioned had a neat feature, where you could say "check to see if the target actually changed as a result of the reapplication of the rule" and if not, don't touch it, don't even change its modify time, meaning all downstream targets (dependees of it) don't need rebuilding. This "pruning" saved considerable time in many real circumstances where a target was in some way an abstraction of a source, immune to many detailed changes affecting the source but not the target. > You CANNOT specify goal driven building effectively, because it is not > possible to get the dependencies right. This is a plain fact of > reality. Can you please elaborate on that? 
Again if I imagine the DFD describing a collection of source and constructed files, and their rules and dependencies, it doesn't seem to matter much which way you look at the arrows, it's the results that count. > Goal driven building also fails to work with multiple outputs. Many > programs output several files, eg make some binary code AND generate > documentation. Right, this is what I summarize as multi-target rules. A common problem is deciding whether all of the targets need updating when a common source changes. The pruning I mentioned earlier helped control ripples in this way. An even worse problem, usually not well understood, is a multi-rule target. This is when several rules contribute to a single repository (such as a message catalog), blurring the state of that target for its dependees. I further divide these into robust and fragile multi-rule targets. A robust one can be partially updated correctly at any time (like revising some database entries), but a fragile multi-rule target must be wholly rebuilt (running multiple input rules) when any dependency demands it. In the worst case there's an ordering requirement upon the rules (the file must be built in the right order) which is difficult to correctly represent in a "static" DFD. Wise designers avoid creating constructed files that are fragile multi-rule targets, if at all possible. In real life one way this manifested, for example, was shipping a bad patch, where a message catalog was broken due to an incomplete rebuild, but the entire file was redelivered. I think multi-rule targets arise naturally but mistakenly from old-school thinking where files and file systems were expensive, so we lumped similar things into common files (a kind of not-really database), sometimes with an associated "registry" (index) of some type. I'm more in favor of what I call "self-registry", like how /etc/rc.d works (if I recall right). 
You drop files/scripts into a "known location" and their mere presence
(when found) acts as the registry, plus you can easily update every file
separately from others.

> Also some systems require recursion. The best example is LaTeX. This
> requires a concept of fixpoints, that is, you run latex repeatedly
> until the output doesn't change (this is because things like plugging
> in cross references change layout, which change the cross-references).

Yuck. Cyclical build graphs are anathema and should be completely avoided
because no one ever builds them correctly. I dispute your assertion that
"some systems require recursion." Good design should avoid it. When "magic
happens here" is a design rule, miraculous bugs follow.

> The canonical example of how goal driven building fails is my own
> product "interscript". This is a literate programming tool that takes
> a file containing other files and emits them (and maybe typesets stuff
> as well).
>
> It is completely backwards to specify the "targets" here. You have no
> idea what they are. Interscript just generates files and it can use
> arbitrary Python code to do that. The code generation is
> sophisticated.

I would assert that you have a design flaw in your package. Correct
building demands "full disclosure" to the build control system, in
whatever language. All files must be listed; hidden temporary or
intermediate files not explicitly stated are accidents waiting to happen.
Your example of (presumably) unpredictable deliverable targets is even
worse. It might be expedient for the programmer to just "write the list as
a smart rule," but I think it's bad design. It makes it impossible to
"manifest" the customer deliverable package in a predictable and auditable
way. (I have a lot of experience dealing with CPE = current product
engineering...)

I understand WHY programmers like to operate this way. It's clunky to
have to "redundantly" state information to various parts of the
engineering system.
First I tell the version control system I just created a new source
file... Now I must tell the build system about that file and how to build
it... Later I must explain how to handle (at least clean up) any
intermediate files like *.o object files... And then tell some kind of
package/delivery/update code (often separate parts flying in formation)
once or more about the target deliverable files. And lord help me if I
forget, leaving some kind of disconnect in the DFD, and lack adequate
automated tests/tools to catch my error. Yuck! An elegant comprehensive
environment would make that easier, more integrated.

So being a clever programmer, hell I'll just write a script/program that
embodies some arcane app-specific knowledge about how to create targets
from sources, based on "discovery"...

Believe me I've seen all kinds of half-assed (well-intended but still
hackish) packages put together around these kinds of issues, with no
overall understanding of what it means to deliver maintainable,
updateable, removable packages to customers.

I don't think the answer is to punt and say, "my targets are
auto-generated." A better answer is, "I have an easy way to specify
exactly what I'm expecting within and as output from the build system, and
to check that I got what I expected." This does not mean you must list
every *.o to be created... I'm OK with generic rules for generic
circumstances... But that rule must only be applied in generic situations.

> Yes. And the way to do that sophisticated stuff requires a REAL
> programming language like Python. Trying to do this with Mickey Mouse
> crap like Make cannot possibly work.

Uh, you dismiss it too quickly. Obviously make is popular because in many
relatively simple contexts it works just fine -- warts and all. (Although
I agree philosophically that it's a far cry from a comprehensive version
control, build, test, package, deliver, and update/remove solution.)

> Fbuild is a caching build system.
> It caches the results of various
> operations (compiling stuff, etc) and knows when the caches are not up
> to date. So rebuilding is the same as building, except the caching
> allows skipping some parts of the build because the dependencies tell
> fbuild the results will be the same.

Cool, that's the right concept.

> Fbuild captures dependencies automatically, you not only don't have to
> specify them .. you CANNOT specify them.

Caution, you appear to be headed down the same path as (now what was the
name again of Rational Software's kernel-incestuous over-the-top version
control and build package?) You couldn't swat a fly in that system without
first getting a doctoral thesis!

> So you see with fbuild, you basically just tell it how to build the
> system, the optimisation is automatic.

    Exceptions prove the rule, and wreck the budget.  -- Miller

How do you let people specify unusual dependencies that aren't as simple
as compile this-to-that?

Cheers, Alan |
From: john s. <sk...@us...> - 2011-02-08 04:26:44
|
On 08/02/2011, at 1:20 PM, Alan Silverstein wrote:

> Drifting off topic but what the heck...

Lol :) It's not off topic because we're proposing to rework the Judy build
system :)

BTW: I'm not proposing to use something wizz bang for this, Make should do
just fine: Judy is basically just a bunch of C files that need to be
compiled.

>> Data flow is not a new idea, it's a subset of the REAL idea: category
>> theory.
>
> OK, I'll have to read up on that...

Category Theory (CT) is *the* theory of abstraction. It basically starts
off by considering sets and functions; in programming we call these types
and functions, in category theory we call them objects and arrows. The key
idea is to abstract away the elements of the sets and talk about
"structure" entirely in terms of the properties of the functions.

For example we can explain "the function f : X -> Y is 1-1", which is a
set element style definition, in a categorical setting like this:

    for all g1, g2 : U -> X,  g1 . f = g2 . f  implies  g1 = g2

This is easy to understand. Suppose f wasn't 1-1. Then you might have
g1(x) = a and g2(x) = b, but f(a) = y and f(b) = y, so g1 and g2 can be
different functions, but you can't tell because f "removes the different
outputs by mapping them to the same value". This can't happen if f is 1-1:
if g1 and g2 are different there is some value x for which g1(x) != g2(x),
and f must map these to two distinct values, so g1 . f != g2 . f.

Now the point here is to examine the formula again: we have defined "f is
1-1" without mentioning elements. In other words, the definition is
*abstract*: written entirely in terms of functions, ignoring the sets and
their values. In fact, we can throw out the sets entirely, and replace X
with the identity function X -> X.

Anyhow, the relation to "data flow" is clear: the arrows are "channels
down which data flows, possibly being modified along the way" :) And the
key properties can be understood in the abstract without caring about the
actual data types being processed.
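[A tiny brute-force check of the claim above, purely illustrative: on small finite sets, "f is 1-1" coincides exactly with the element-free, left-cancellation definition. The sets and helper names are made up.]

```python
# Check: f is injective  <=>  for all g1, g2 : U -> X,
# f∘g1 = f∘g2 implies g1 = g2 (f is a monomorphism).
from itertools import product

X, Y, U = [0, 1], [0, 1, 2], [0, 1]   # toy finite sets

def all_functions(dom, cod):
    # Every function dom -> cod, represented as a dict.
    return [dict(zip(dom, images)) for images in product(cod, repeat=len(dom))]

def is_mono(f):
    # Categorical definition: f is left-cancellable under composition.
    for g1 in all_functions(U, X):
        for g2 in all_functions(U, X):
            if all(f[g1[u]] == f[g2[u]] for u in U) and g1 != g2:
                return False
    return True

def is_injective(f):
    # Element-level definition: distinct inputs map to distinct outputs.
    return len(set(f.values())) == len(f)
```

Enumerating all nine functions X -> Y confirms the two definitions agree.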
>> Build systems must be driven bottom up. They're intrinsically
>> imperative NOT functional.
>
> Would you elaborate on the difference? Do you mean the difference
> between declarative (what) and functional (how)?

No, I mean build systems aren't declarative, quite the opposite. They're
action based. Imperative, procedural, whatever. Functional/declarative
models are all wrong.

If you look at make, the *rules* are imperative: do this, do that. The
declarative part .. the goal dependency stuff .. doesn't work. Just for
starters, many programs don't have a single output. This screws the basic
concept up immediately. Some programs have side effects (no outputs as
such) and some have multiple outputs. So you immediately have to add hacks
to make, phonies and proxy files, just to make it work at all .. and
that's doing really basic stuff.

>> When you change a source file, that should trigger rebuilding the
>> system based on what depends on the source file you changed. This is
>> completely the reverse of target driven building.
>
> Yes -- and no -- at least as I think of it. Viewing a build system as
> an acyclical graph, it's a static (at any one point in time)

We agree this is a gross oversimplification .. we will temporarily accept
this to avoid confusion, but clearly it isn't so. Consider say "doxygen"
.. surely, there is a fixed set of inputs, but who knows what the *.html
files it generates are???

> set of
> relationships between sources (files that have no arrows into them
> within the build system, even if derived say from a version control
> system) and constructed targets (some of which are deliverable, others
> of which are intermediate, but that doesn't matter here). Given some
> form of specification of these relationships -- sources, targets, rules,
> dependencies/conditions -- then any time a source changes, all dependees
> must be at least revisited if NOT updated/reconstructed, whether you
> consider this to be targets-backwards or sources-forward.
Yes .. given this information. The problem is how to get it. Specifying
dependencies is intrinsically unreliable and entirely unnecessary,
provided you can capture outputs. This is hard to understand but true.

Consider a set of programs (compilers, document generators, linkers, etc
etc). And some source files. Let's assume we know which programs to apply
to which files. What order should we apply the programs in? When do we
need to apply them?

The answer is not what you'd expect. You can apply the programs IN ANY
ORDER!! Surprised? It's true! It doesn't make any difference. Here is the
build algorithm:

  Step 1: Apply the programs to the files, in any order.
  Step 2: Do it again. If the results are the same, you're done.
          Otherwise repeat step 2.

This is the fixpoint algorithm: just keep trying until the build is
stable. To make this work you need to be able to monitor the *outputs* of
programs: you need to see every file that is touched or created, so you
can check when the build is completed (reached a fix point).

Well now, you can say "But that is horribly inefficient!!!" Yes yes, you'd
be right. So optimise it. Specify dependency information AS A HINT.

Now perhaps you see the point. This system does NOT require dependency
information to work. Only to work efficiently. In particular it doesn't
require all the dependency information, and it still works if the
dependency information is wrong.

So here, the fundamentals are: (a) a set of build steps, (b) output
monitoring. The "dependencies" are relevant only for optimisation. That's
not unimportant!! But the point is, the system, at the core, is driven
bottom up, and consists of a set of *actions*: there's nothing declarative
in the core. The dependency relations are useful for performance, they're
not of any conceptual significance.

Certainly dependencies *exist*. Certainly there is workflow. But it isn't
necessary to specify it, nor to even get it right! Interscript
**literally** works this way!
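[A toy version of the fixpoint algorithm above, using a dict as a stand-in filesystem. The "programs" here are invented for illustration; the point is that running them in a deliberately wrong order still converges.]

```python
# Sketch: run all build steps in an arbitrary order, snapshot the outputs,
# and repeat until a whole pass changes nothing (a fix point).
def fixpoint_build(fs, steps, limit=10):
    """fs: name -> contents. steps: callables that read/write fs.
    Returns the number of passes until the filesystem is stable."""
    for passes in range(1, limit + 1):
        before = dict(fs)          # snapshot of every output
        for step in steps:
            step(fs)
        if fs == before:           # nothing changed: build is stable
            return passes
    raise RuntimeError("build did not converge")

# Two toy "programs". We will run the linker BEFORE the compiler to show
# that the order doesn't matter, only convergence does.
def compile_(fs):
    if "x.c" in fs:
        fs["x.o"] = "obj(" + fs["x.c"] + ")"

def link(fs):
    if "x.o" in fs:
        fs["app"] = "exe(" + fs["x.o"] + ")"

fs = {"x.c": "main"}
```

Running `[link, compile_]` takes three passes to stabilize from scratch, and a rebuild of the already-built tree stabilizes in one pass.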
It repeatedly does actions until nothing changes (or a limit is reached;
usually 2 passes is the limit :)

> By the way, that elaborate chip design system I mentioned had a neat
> feature, where you could say "check to see if the target actually
> changed as a result of the reapplication of the rule" and if not, don't
> touch it, don't even change its modify time, meaning all downstream
> targets (dependees of it) don't need rebuilding.

Interscript does this automatically for all outputs, for that reason.
Interscript itself doesn't care, since it compares the contents of files.
But to do that, it saves every output to a temporary first, so it is easy
to just abandon the temporary if there is no change. [Actually it is more
efficient to read/compare until there's a difference, then switch to write
mode .. but I didn't implement that.]

>> You CANNOT specify goal driven building effectively, because it is not
>> possible to get the dependencies right. This is a plain fact of
>> reality.
>
> Can you please elaborate on that?

Sure: tell me the names of all the files generated by a run of doxygen.
[Doxygen is a C++ code documentation system.] You can't. It makes them up
using some hidden formula; they're just a set of web pages, and all it
cares about is that file 1 correctly references file 2. All you know is
the name of the "index page". But you cannot do a build depending on the
output of doxygen by just examining the index page, because it is likely
to be unchanged even when the pages describing your functions have major
changes in them.

> Again if I imagine the DFD describing
> a collection of source and constructed files, and their rules and
> dependencies, it doesn't seem to matter much which way you look at the
> arrows, it's the results that count.

Yes, but the problem is you cannot specify the graph: it's too hard. And
it isn't necessary.

> An even worse problem, usually not well understood, is a multi-rule
> target.
> This is when several rules contribute to a single repository
> (such as a message catalog), blurring the state of that target for its
> dependees.

Basically, in set theory and category theory we have things called
products. Aka "Cartesian product" or "tuple" or even "struct" in C. What
you're saying is that handling NORMAL arrows from products to products is
hard:

    a,b,c -> d,e,f

and that's my point. This is trivial basic stuff. Part of the problem is
that people **incorrectly** think that a graph is like this:

    file -- action --> file

This is completely wrong! It's the other way around! The actions are not
arrows, they're the points! The resulting data structure is called a
multi-graph:

    x.c --> [ C compiler ] --> x.o

Note carefully: the arrows are files. The C compiler is a point. (Black
box, chip, whatever you want.) I have a whole book on this, but it's hard
to get (by RFC Walters).

> I further divide these into robust and fragile multi-rule
> targets. A robust one can be partially updated correctly at any time
> (like revising some database entries), but a fragile multi-rule target
> must be wholly rebuilt (running multiple input rules) when any
> dependency demands it. In the worst case there's an ordering
> requirement upon the rules (the file must be built in the right order)
> which is difficult to correctly represent in a "static" DFD. Wise
> designers avoid creating files that are fragile multi-rule
> targets, if at all possible.

Yes, that sounds interesting. Basically some products can be built one
component at a time and some are built "all at once".

> I think multi-rule targets arise naturally but mistakenly from
> old-school thinking where files and file systems were expensive,

Yes. The whole Unix FHS (file system hierarchy) is archaic. The idea of
putting all the *.h files in one directory and the *.o files (or *.a
files) in another is absurd. But it was done in the old days for
performance. No modern systems use this.
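[The multi-graph view above can be sketched as data: actions are nodes, files are the labelled edges between them. This little representation is invented for illustration, including the "source" pseudo-node for primary inputs.]

```python
# Each edge (producer, consumer, filename) says a file flows out of one
# black-box action and into another. Actions are points; files are arrows.
edges = [
    ("source", "cc",   "x.c"),   # "source" is a pseudo-node for inputs
    ("source", "cc",   "x.h"),
    ("cc",     "link", "x.o"),
    ("link",   "ship", "app"),
]

def files_between(edges, producer, consumer):
    """All files flowing from one action to another (a multi-graph may
    have several parallel arrows, e.g. x.c and x.h both feed cc)."""
    return [f for p, c, f in edges if p == producer and c == consumer]

def produced_by(edges, action):
    """All files emitted by an action, however many (multiple outputs
    are natural here, unlike in a target-per-rule model)."""
    return [f for p, _, f in edges if p == action]
```

Multiple outputs from one action are just multiple outgoing edges, which is exactly what the target-centric `file -- action --> file` picture struggles with.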
I think Sun pioneered the right way: one directory for each product (in
"/opt"). Apple has these too; it calls them frameworks. On unix systems we
usually pervert /usr/local/lib. After all, not all software is even C!

> so we
> lumped similar things into common files (a kind of not-really database),
> sometimes with an associated "registry" (index) of some type. I'm more
> in favor of what I call "self-registry", like how /etc/rc.d works (if I
> recall right). You drop files/scripts into a "known location" and their
> mere presence (when found) acts as the registry, plus you can easily
> update every file separately from others.

Indeed. Which is why build systems have to be driven bottom up. So you can
drop new sources into the right place and just expect them to be built.
You can't do that with targets, because they're generated and you don't
get to "drop" them anywhere :)

> I dispute your
> assertion that "some systems require recursion."

You can't dispute it; LaTeX requires it, and there's no way around it.

> I would assert that you have a design flaw in your package. Correct
> building demands "full disclosure" to the build control system, in
> whatever language.

You misunderstand: interscript IS the build system. It doesn't need any
disclosure. That's the point. It can generate code or documentation or
indeed do ANY process at all without disclosure. It uses "discovery"
instead.

> All files must be listed; hidden temporary or
> intermediate files not explicitly stated are accidents waiting to
> happen.

It isn't possible. See the doxygen example.

> Your example of (presumably) unpredictable deliverable targets
> is even worse. It might be expedient for the programmer to just "write
> the list as a smart rule," but I think it's bad design. It makes it
> impossible to "manifest" the customer deliverable package in a
> predictable and auditable way. (I have a lot of experience dealing with
> CPE = current product engineering...)

Yes, it does.
You have to re-think your quality control systems to handle this.

> I understand WHY programmers like to operate this way. It's clunky to
> have to "redundantly" state information to various parts of the
> engineering system.

Yes, it is clunky .. and only practical for simple systems. With more
complex systems, it's a liability. That's the point here: if your build
system *depends* on the replicated dependency information, then it can
fail silently. If it doesn't, it can't. So it is better if it doesn't :)

> So being a clever programmer, hell I'll just write a script/program that
> embodies some arcane app-specific knowledge about how to create targets
> from sources, based on "discovery"...
>
> Believe me I've seen all kinds of half-assed (well-intended but still
> hackish) packages put together around these kinds of issues, with no
> overall understanding of what it means to deliver maintainable,
> updateable, removable packages to customers.

I share your concerns. But don't knock discovery as such: like everything,
not all systems are reliable! Almost everyone writing Ocaml programs uses
ocamldep to generate dependencies (Ocaml requires files be compiled in
dependency order). Even in a 20-30 file program it is almost impossible to
maintain the order by hand. If you have to do that, it becomes an obstacle
to refactoring.

> I don't think the answer is to punt and say, "my targets are
> auto-generated." A better answer is, "I have an easy way to specify
> exactly what I'm expecting within and as output from the build system,
> and to check that I got what I expected."

I do that: what I expect is that the regression tests all pass :)

>> Yes. And the way to do that sophisticated stuff requires a REAL
>> programming language like Python. Trying to do this with Mickey Mouse
>> crap like Make cannot possibly work.
>
> Uh, you dismiss it too quickly.
No, I discarded it after 20 years of struggling to understand how it works
and failing all the time to see any connection between what it does and
what general tools do: it works marginally well for C and that's about
all.

>> Fbuild is a caching build system. It caches the results of various
>> operations (compiling stuff, etc) and knows when the caches are not up
>> to date. So rebuilding is the same as building, except the caching
>> allows skipping some parts of the build because the dependencies tell
>> fbuild the results will be the same.
>
> Cool, that's the right concept.

Yes, but it's nothing like make. It has ONLY build rules, there are no
dependencies. (There is dependency generation in some of the subrules, for
example to build an Ocaml program.) Rather you just give functions like:

    link(cc("x.c"), cc("y.c"), result="aa.out")

which is just like the "make" rules .. no dependencies specified. But it
doesn't do the compiles and links every time, it caches the results of
each function call. Yes there are dependencies, but they're "discovered",
because we know when you run cc("x.c") that that function call depends on
file "x.c". The cc function puts a digest of the file into a database.
Next time it is called, if the digest is the same, it does nothing.
(Actually, it returns the digest of "x.o" from the database.)

>> Fbuild captures dependencies automatically, you not only don't have to
>> specify them .. you CANNOT specify them.
>
> Caution, you appear to be headed down the same path as (now what was the
> name again of Rational Software's kernel-incestuous over-the-top version
> control and build package?) You couldn't swat a fly in that system
> without first getting a doctoral thesis!

I can't swat a fly with "make" so I'm no worse off :)

> How do you let people specify unusual dependencies that aren't as simple
> as compile this-to-that?
In Felix there is a directory called "buildsystem" which contains all the
Felix specific rules:

~/felix>ls buildsystem/*.py
buildsystem/__init__.py        buildsystem/flx_stdlib.py
buildsystem/bindings.py        buildsystem/iscr.py
buildsystem/demux.py           buildsystem/judy.py  <<<---------------------
buildsystem/dist.py            buildsystem/mk_daemon.py
buildsystem/dypgen.py          buildsystem/ocs.py
buildsystem/faio.py            buildsystem/post_config.py
buildsystem/flx.py             buildsystem/re2.py
buildsystem/flx_async.py       buildsystem/sex.py
buildsystem/flx_compiler.py    buildsystem/show_build_config.py
buildsystem/flx_drivers.py     buildsystem/speed.py
buildsystem/flx_exceptions.py  buildsystem/sqlite3.py
buildsystem/flx_gc.py          buildsystem/timeout.py
buildsystem/flx_glob.py        buildsystem/tools.py
buildsystem/flx_pthread.py     buildsystem/tre.py
buildsystem/flx_rtl.py         buildsystem/version.py

Each of these files contains some special rules for building something:
part of Felix, or a third party library. I marked one of some interest
.. :) Here it is:

########################
import fbuild
from fbuild.functools import call
from fbuild.path import Path
from fbuild.record import Record

import buildsystem

# ------------------------------------------------------------------------------

def build_runtime(phase):
    path = Path('src/judy')

    buildsystem.copy_hpps_to_rtl(phase.ctx, path / 'Judy.h')

    dst = 'lib/rtl/flx_judy'
    srcs = [
        path / 'JudyCommon/JudyMalloc.c',

        path / 'Judy1/JUDY1_Judy1ByCount.c',
        path / 'Judy1/JUDY1_Judy1Cascade.c',
        path / 'Judy1/JUDY1_Judy1Count.c',
        path / 'Judy1/JUDY1_Judy1CreateBranch.c',
        path / 'Judy1/JUDY1_Judy1Decascade.c',
        path / 'Judy1/JUDY1_Judy1First.c',
        path / 'Judy1/JUDY1_Judy1FreeArray.c',
        path / 'Judy1/JUDY1_Judy1InsertBranch.c',
        path / 'Judy1/JUDY1_Judy1MallocIF.c',
        path / 'Judy1/JUDY1_Judy1MemActive.c',
        path / 'Judy1/JUDY1_Judy1MemUsed.c',
        path / 'Judy1/JUDY1_Judy1SetArray.c',
        path / 'Judy1/JUDY1_Judy1Set.c',
        path / 'Judy1/JUDY1_Judy1Tables.c',
        path / 'Judy1/JUDY1_Judy1Unset.c',
        path / 'Judy1/JUDY1_Judy1Next.c',
        path / 'Judy1/JUDY1_Judy1NextEmpty.c',
        path / 'Judy1/JUDY1_Judy1Prev.c',
        path / 'Judy1/JUDY1_Judy1PrevEmpty.c',
        path / 'Judy1/JUDY1_Judy1Test.c',
        path / 'Judy1/JUDY1_j__udy1Test.c',

        path / 'JudyL/JUDYL_JudyLByCount.c',
        path / 'JudyL/JUDYL_JudyLCascade.c',
        path / 'JudyL/JUDYL_JudyLCount.c',
        path / 'JudyL/JUDYL_JudyLCreateBranch.c',
        path / 'JudyL/JUDYL_JudyLDecascade.c',
        path / 'JudyL/JUDYL_JudyLDel.c',
        path / 'JudyL/JUDYL_JudyLFirst.c',
        path / 'JudyL/JUDYL_JudyLFreeArray.c',
        path / 'JudyL/JUDYL_JudyLInsArray.c',
        path / 'JudyL/JUDYL_JudyLIns.c',
        path / 'JudyL/JUDYL_JudyLInsertBranch.c',
        path / 'JudyL/JUDYL_JudyLMemActive.c',
        path / 'JudyL/JUDYL_JudyLMemUsed.c',
        path / 'JudyL/JUDYL_JudyLMallocIF.c',
        path / 'JudyL/JUDYL_JudyLTables.c',
        path / 'JudyL/JUDYL_JudyLNext.c',
        path / 'JudyL/JUDYL_JudyLNextEmpty.c',
        path / 'JudyL/JUDYL_JudyLPrev.c',
        path / 'JudyL/JUDYL_JudyLPrevEmpty.c',
        path / 'JudyL/JUDYL_JudyLGet.c',
        path / 'JudyL/JUDYL_j__udyLGet.c',

        path / 'JudySL/JudySL.c',
        path / 'JudyHS/JudyHS.c',
    ]
    includes = [
        path,
        path / 'JudyCommon',
        path / 'Judy1',
        path / 'JudyL',
        path / 'JudySL',
        path / 'JudyHS',
    ]
    types = call('fbuild.builders.c.std.config_types',
        phase.ctx, phase.c.shared)

    macros = ['BUILD_JUDY']
    if types['void*']['size'] == 8:
        macros.append('JU_64BIT')
    else:
        macros.append('JU_32BIT')

    return Record(
        static=buildsystem.build_c_static_lib(phase, dst, srcs,
            includes=includes,
            macros=macros),
        shared=buildsystem.build_c_shared_lib(phase, dst, srcs,
            includes=includes,
            macros=macros))

def build_flx(phase):
    return buildsystem.copy_flxs_to_lib(phase.ctx,
        Path('src/judy/*.flx').glob())
###################

This is actually quite "make like" in that the source files are all
specified. Notice though, there's no mention of *.o files or *.a or *.so
or whatever. It's hard to see, but the build parts and the whole thing
return a "cache" of the process such that the system knows when to rebuild
it.
I have to specify the inputs, and parameters to the build process, but
usually not any outputs.

The top level of the build system is not fbuild: it is a file,
fbuildroot.py, in Felix. fbuild is just a LIBRARY of tools which help in
specifying a build system. Erick is constantly adding new functionality to
that to support more compilers etc.

The most important part of that library is the bit which registers and
caches functions, using Python marshalling features, Sqlite3 as the
database, and RPC (remote procedure calls) in there somewhere as well.
It's pretty complex. It works pretty well though!

-- john skaller sk...@us... |
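[The digest-based function caching described in this email can be sketched as follows. This is a simplified, hypothetical mock, not fbuild's actual API: a build step's result is keyed by a digest of its input, so an unchanged input returns the cached result with no work done.]

```python
# Sketch: wrap a build function so its result is cached under a digest of
# its input data. A cache hit means the "compile" is skipped entirely.
import hashlib

_cache = {}   # (step name, input digest) -> cached result
calls = []    # log of real work performed, for demonstration only

def cached_step(name, fn):
    def run(data):
        key = (name, hashlib.sha256(data).hexdigest())
        if key not in _cache:
            calls.append(name)       # cache miss: do the real work
            _cache[key] = fn(data)
        return _cache[key]           # hit: return prior result untouched
    return run

# A pretend C compiler: its "object file" is just a tagged copy.
cc = cached_step("cc", lambda src: b"obj:" + src)
```

Calling `cc` twice on the same bytes performs the work once; changing the input changes the digest and forces a rebuild of just that step.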
From: john s. <sk...@us...> - 2011-02-08 04:56:57
|
On 08/02/2011, at 3:26 PM, john skaller wrote: []

BTW: look at how "dependencies" are specified here:

>    types = call('fbuild.builders.c.std.config_types',
>        phase.ctx, phase.c.shared)
>
>    macros = ['BUILD_JUDY']
>    if types['void*']['size'] == 8:
>        macros.append('JU_64BIT')
>    else:
>        macros.append('JU_32BIT')

This is functional, with a variable for the temporary "types". To choose
JU_64BIT or JU_32BIT we need to know the size of "void*". To find that
size, we call the function "config_types". We can't get this wrong: you
need the "types" variable to look up the size, and the way to get it is to
call "config_types". The dependency is expressed functionally by
execution, not by a declaration that you have to make target
"config_types" before you can make target "Judy". fbuild will only
actually call "config_types" if it is out of date or not yet calculated;
otherwise it uses the cached result of a prior computation.

In other words the dependency is "discovered" **dynamically** by executing
the Python script. Yes, there's a problem: Python is dynamic so there's no
static checking. But the point is again that the rules contain the
dependencies: you don't have to state them explicitly, separately from the
build rule. Basically we just take "make with the targets removed and the
build rules written as function calls"; then the ordinary dependencies of
functions -- calculate the arguments before calling the function --
express the dependencies.

The reason why this is "better" than "make" should be obvious: you just
write code to build your system, using ordinary function calls. Python is
NOT the best language for doing this, but it has all the stuff needed for
database storage, serialisation, remote procedure calls, networking, and a
lot of other libraries built in. It's not just Turing complete, it's a
HEAVILY tried and tested general purpose programming language with good
documentation, so it has to be better than a Mickey Mouse programming
language like Make.
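[A self-contained sketch of the same "dependency by execution" idea, with made-up function names: instead of declaring that the Judy rule depends on a config target, we just call a function that discovers the pointer size and select the macro from its result.]

```python
# Sketch: discover the platform's void* width at build time and pick the
# Judy macro accordingly. struct format 'P' is a native C void*.
import struct

def config_pointer_size():
    """Return sizeof(void*) in bytes on the running platform."""
    return struct.calcsize("P")

def judy_macros():
    # The dependency on the config step is the function call itself:
    # we cannot forget it, because we need its result to choose.
    macros = ["BUILD_JUDY"]
    macros.append("JU_64BIT" if config_pointer_size() == 8 else "JU_32BIT")
    return macros
```

On a 64-bit interpreter this yields `["BUILD_JUDY", "JU_64BIT"]`; on a 32-bit one, `["BUILD_JUDY", "JU_32BIT"]`. This also mirrors the JU_64BIT/JU_32BIT issue raised earlier in the thread about building Judy on 64-bit Windows.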
[Importantly any language based on substitution is suspect: macros don't work well. Programmatic operation is the only way, you want to say "replace this with that" explicitly, not write a rule with holes in it that are automatically filled in.] Make can't handle configuration: people use autoconf for that. And other tools for all sorts of other things. Python can do all of this, either natively, or by callouts. So the whole build process can be expressed in ONE language. -- john skaller sk...@us... |