From: Bruno H. <br...@cl...> - 2005-01-25 10:59:49
|
Hi Sam, This function is unusable because underspecified: /* bit sequence --> integer */ global maygc object udigits_to_I (void* digits, uintC len) { 1) The documentation should specify details about the input (is 'digits' pointing to a single object of type void or to multiple ones? is len > 0 or can it be 0 ? Is the input in big-endian or little-endian order?) 2) The doc should mention "can trigger GC"! 3) In C, a 'void *' designates a pointer to memory of undefined contents. It's not useful to access such memory. If there is data at *digits, you need to specify the element size. 4) Does the function write to *digits? If not, then why is the 'digits' pointer not 'const'? 5) The function is lacking a comment of its own in lispbibl.d. Imagine how unusable lispbibl.d would be if it contained only function declarations without comments! Bruno |
From: Sam S. <sd...@gn...> - 2005-01-25 14:47:53
|
Hi Bruno, > * Bruno Haible <oe...@py...t> [2005-01-25 11:53:38 +0100]: > > This function is unusable because underspecified: > > /* bit sequence --> integer */ > global maygc object udigits_to_I (void* digits, uintC len) { > > 1) The documentation should specify details about the input (is 'digits' > pointing to a single object of type void or to multiple ones? digits points to a sequence of bytes, so it should probably be uint8*. > is len > 0 or can it be 0 ? dunno. it depends on the behavior of UDS_to_I() - which is not specified in the comments either. BTW, what does "there must be room for 1 digit below of MSDptr" mean? > Is the input in big-endian or little-endian order?) I don't know what "big-endian" and "little-endian" are. In my kindergarten they taught only one way to record numbers with digits: 123 = 1*100 + 2*10 + 3*1 So, byte sequence "123" represents the integer 1*256^2 + 2*256 + 3*1. > 2) The doc should mention "can trigger GC"! OK (although the "maygc" declaration should be enough, IMO) > 3) In C, a 'void *' designates a pointer to memory of undefined contents. > It's not useful to access such memory. If there is data at *digits, > you need to specify the element size. I guess, "digits" should be "uint8*". > 4) Does the function write to *digits? I don't think so - but UDS_to_I() does not declare its argument const... > If not, then why is the 'digits' pointer not 'const'? because I don't know what "const" means and I am sick of compiler warnings and error triggered by such declarations. > 5) The function is lacking a comment of its own in lispbibl.d. Imagine > how unusable lispbibl.d would be if it contained only function > declarations without comments! right... Maybe you could fix the comments of UDS_to_I() first? -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> There are 10 kinds of people: those who count in binary and those who do not. |
From: Bruno H. <br...@cl...> - 2005-01-25 16:48:05
|
Sam wrote: > digits points to a sequence of bytes, so it should probably be uint8*. Yes. uint8* or uintB*, it's the same. (I use uint8 only when the number of bits matters, whereas uintB is more the elementary unit of addressable memory.) > > is len > 0 or can it be 0 ? > > dunno. it depends on the behavior of UDS_to_I() UDS_to_I() is specified to take an UDS as input; this term is defined in arilev1.d. > BTW, what does "there must be room for 1 digit below of MSDptr" mean? It means that MSDptr[-1] may be written to. That's also the reason why the argument is not a 'const' pointer. > > Is the input in big-endian or little-endian order?) > > I don't know what "big-endian" and "little-endian" are. > In my kindergarten they taught only one way to record numbers with > digits: 123 = 1*100 + 2*10 + 3*1 > So, byte sequence "123" represents the integer 1*256^2 + 2*256 + 3*1. This is big-endian. Little-endian would be the opposite: when the byte sequence 1 2 3, ordered from low to high addresses, represents the integer 1*1 + 2*256 + 3*256^2. > > If not, then why is the 'digits' pointer not 'const'? > > because I don't know what "const" means If a function takes a 'const uintB *' argument, it promises to not write to memory through that pointer. So that means a :IN parameter, whereas 'uintB *' means a :IN-OUT or :OUT parameter (in our FFI's notation). > Maybe you could fix the comments of UDS_to_I() first? I'm updating the definitions in arilev1.d. For the implementation of udigits_to_I: Note that you cannot always interpret an array of type uint8[] as an array of type uintD[]: The number of elements may not be a multiple of sizeof(uintD)/sizeof(uint8) (what do you want to do with the last few bytes in that case?), and the alignment of the memory may not be a multiple of alignof(uintD). Bruno |
From: Sam S. <sd...@gn...> - 2005-01-25 18:01:30
|
> * Bruno Haible <oe...@py...t> [2005-01-25 17:41:53 +0100]: > > UDS_to_I() is specified to take an UDS as input; this term is defined > in arilev1.d. in German. > I'm updating the definitions in arilev1.d. thanks. > For the implementation of udigits_to_I: > Note that you cannot always interpret an array of type uint8[] as an > array of type uintD[]: The number of elements may not be a multiple > of sizeof(uintD)/sizeof(uint8) (what do you want to do with the last few > bytes in that case?), zero them out? (hmm, this implies little-endian). > and the alignment of the memory may not be a multiple of > alignof(uintD). crash? :-( -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Those who can laugh at themselves will never cease to be amused. |
From: Bruno H. <br...@cl...> - 2005-01-25 19:34:12
|
Sam wrote: > > For the implementation of udigits_to_I: > > Note that you cannot always interpret an array of type uint8[] as an > > array of type uintD[]: The number of elements may not be a multiple > > of sizeof(uintD)/sizeof(uint8) (what do you want to do with the last few > > bytes in that case?), > > zero them out? (hmm, this implies little-endian). > > > and the alignment of the memory may not be a multiple of > > alignof(uintD). > > crash? :-( Yes, likely. - But before talking about the implementation of an unclear function, can you clarify what the function _should_ do? What does the berkeley-db module expect from it? Bruno |
From: Sam S. <sd...@gn...> - 2005-01-26 17:10:16
|
> * Bruno Haible <oe...@py...t> [2005-01-25 20:28:06 +0100]: > > Sam wrote: >> > For the implementation of udigits_to_I: >> > Note that you cannot always interpret an array of type uint8[] as an >> > array of type uintD[]: The number of elements may not be a multiple >> > of sizeof(uintD)/sizeof(uint8) (what do you want to do with the last few >> > bytes in that case?), >> >> zero them out? (hmm, this implies little-endian). >> >> > and the alignment of the memory may not be a multiple of >> > alignof(uintD). >> >> crash? :-( > > Yes, likely. - But before talking about the implementation of an > unclear function, can you clarify what the function _should_ do? What > does the berkeley-db module expect from it? this is a platform-independent integer <--> bit sequence conversion. this should be the _inverse_ of fill_dbt() in bdb.c: } else if (bignump(obj)) { int need = sizeof(uintD)*Bignum_length(obj); key->ulen = key->size = MAX(re_len,need); key->data = my_malloc(key->size); begin_system_call(); memset(key->data,0,key->size); memcpy((char*)key->data + key->size - need,TheBignum(obj)->data,need); end_system_call(); return DBT_INTEGER; } else ... -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> The world is coming to an end. Please log off. |
From: Bruno H. <br...@cl...> - 2005-01-26 20:01:56
|
Sam wrote: > > clarify what the function _should_ do? What > > does the berkeley-db module expect from it? > > this is a platform-independent integer <--> bit sequence conversion. > this should be the _inverse_ of fill_dbt() in bdb.c: > > } else if (bignump(obj)) { > int need = sizeof(uintD)*Bignum_length(obj); > key->ulen = key->size = MAX(re_len,need); > key->data = my_malloc(key->size); > begin_system_call(); > memset(key->data,0,key->size); > memcpy((char*)key->data + key->size - need,TheBignum(obj)->data,need); > end_system_call(); > return DBT_INTEGER; > } else ... I'm sorry, but this is not platform independent. 1) The size of uintD is platform dependent. It can currently be 32 or 16 (or even 8) bits, and we might a 64-bit type for some 64-bit machines in the future. 2) The result that you write is endianness dependent, therefore will be different on SPARC (big endian) than on x86 (little endian). For example, if obj = #x123456789ABCDEF0, TheBignum(obj)->data[0] = 0x12345678, TheBignum(obj)->data[1] = 0x9ABCDEF0, i.e. on a little-endian machine you will write out the bytes 0x78, 0x56, 0x34, 0x12, 0xF0, 0xDE, 0xBC, 0x9A, and on a big-endian machine you will write out the bytes 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0. Look at the functions bitbuff_iu_I and bitbuff_ixu_sub in stream.d, or the functions READ-INTEGER and WRITE-INTEGER, to get an idea how to convert an integer from/to a byte sequence. Bruno |
From: Sam S. <sd...@gn...> - 2005-01-27 21:42:57
|
> * Bruno Haible <oe...@py...t> [2005-01-26 20:55:44 +0100]: > > Look at the functions bitbuff_iu_I and bitbuff_ixu_sub in stream.d, or > the functions READ-INTEGER and WRITE-INTEGER, to get an idea how to > convert an integer from/to a byte sequence. OK. why is strm_bitbuffer a LISP heap-allocated object and not a malloc'ed memory area? advantages: when stream is GCed, bitbuffer goes without special effort. disadvantages: lack of modularity (tight coupling with the stream world), have to save the buffer when allocating bignum in bitbuff_is_I(). Note that bitbuff_is_I() takes 2 length arguments: bitsize and bytesize; which is confusing (one should be enough). bitbuff_ixu_sub() is even worse: it also takes a stream (for error reporting). malloc: GC already closes streams, so close() should just free() the bitbuffer. no special finalization code. bitbuff_is_I()&bitbuff_ixu_sub can be rewritten as a general bits<->I API: /* convert the bit buffer of the given length to a (un)signed integer */ maygc object bitbuff_to_I (const uintB* buffer, uintL length, bool signed_p); /* return the minimum size, in bytes, of the buffer where integer will be written */ uintL I_bitsize (object integer, bool signed_p); /* write the (un)signed integer into buffer of the given length. assumes that length is at least as large as I_bitsize(). */ void I_to_bitbuff (object integer, uintB* buffer, uintL length, bool signed_p); -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> Ernqvat guvf ivbyngrf QZPN. |
From: Bruno H. <br...@cl...> - 2005-01-28 13:50:43
|
Sam wrote: > why is strm_bitbuffer a LISP heap-allocated object and not a malloc'ed > memory area? Because it is stored in the stream, as a cache (for speed), and a pointer to malloc'ed memory becomes invalid after savemem + loadmem. One could store the malloc'ed memory pointer inside a Fpointer object, but this would lead to a double indirection instead of a single indirection, plus the need to check for the foreign pointer's validity before use. > disadvantages: lack of modularity (tight coupling with the stream > world) Yes, this could be improved. > have to save the buffer when allocating bignum in bitbuff_is_I(). This is normal when we work with Lisp objects. > bitsize and bytesize; which is confusing (one should be enough) bytesize = ceiling(bitsize,8); Indeed just two cheap arithmetic instructions. > bitbuff_is_I()&bitbuff_ixu_sub can be rewritten as a general bits<->I API: Good idea. That could go into a new file, to be included from lisparit.d right after intread.d: intserial.d - serialization of integers as little-endian byte sequences > /* convert the bit buffer of the given length to a (un)signed integer */ > maygc object bitbuff_to_I (const uintB* buffer, uintL length, bool signed_p); There probably need to be two variants of this function, one taking an "const uintB* buffer", the other one taking an "object buffer" (a simple-array of rank 1 and element-type (unsigned-byte 8)). I would call these functions LEbytes_to_I and LEbytes_vector_to_I instead of bitbuff_to_I. > /* return the minimum size, in bytes, of the buffer where integer will > be written */ > uintL I_bitsize (object integer, bool signed_p); Let's call it I_to_LEbytes_size. It's just a thin wrapper around I_integer_length. > /* write the (un)signed integer into buffer of the given length. > assumes that length is at least as large as I_bitsize(). */ > void I_to_bitbuff (object integer, uintB* buffer, uintL length, bool signed_p); Yes. Let's call it I_to_LEbytes. (I don't like to call it "bitbuffer" because it's only stream.d which goes down to the bit level. intserial.d goes only down to the uintB level.) This intserial.d can be written indepedently of any streams. While then changing stream.d to use it, please be careful with the quality of the ASSERT_wr_int error message: a stream object with a filename in it is very useful for the user when an error occurs. Bruno |
From: Sam S. <sd...@gn...> - 2005-01-28 21:08:17
|
> * Bruno Haible <oe...@py...t> [2005-01-28 14:44:01 +0100]: > > Sam wrote: >> why is strm_bitbuffer a LISP heap-allocated object and not a malloc'ed >> memory area? > > Because it is stored in the stream, as a cache (for speed), and a pointer > to malloc'ed memory becomes invalid after savemem + loadmem. OOPS! flushing bitbuff on savemem?... converting bitbuff to a lisp vector on savemem and back on loadmem?... yuk... >> disadvantages: lack of modularity (tight coupling with the stream >> world) > Yes, this could be improved. please do. >> have to save the buffer when allocating bignum in bitbuff_is_I(). > This is normal when we work with Lisp objects. but in this specific case this is extremely inconvenient: it prevents bitbuff_to_I() from accepting a pointer to memory as the argument (see below ***). >> bitsize and bytesize; which is confusing (one should be enough) > > bytesize = ceiling(bitsize,8); > Indeed just two cheap arithmetic instructions. let's drop it, then. >> bitbuff_is_I()&bitbuff_ixu_sub can be rewritten as a general bits<->I API: > > Good idea. That could go into a new file, to be included from lisparit.d > right after intread.d: > intserial.d - serialization of integers as little-endian byte sequences ok. >> /* convert the bit buffer of the given length to a (un)signed integer */ >> maygc object bitbuff_to_I (const uintB* buffer, uintL length, bool signed_p); > > There probably need to be two variants of this function, one taking an > "const uintB* buffer", the other one taking an "object buffer" (a > simple-array of rank 1 and element-type (unsigned-byte 8)). > I would call these functions LEbytes_to_I and LEbytes_vector_to_I > instead of bitbuff_to_I. this need for two functions is the reason why the `maygc' property of bitbuff_to_I() is so bad (see above ***) > This intserial.d can be written indepedently of any streams. While > then changing stream.d to use it, please be careful with the quality > of the ASSERT_wr_int error message: a stream object with a filename in > it is very useful for the user when an error occurs. I hoped that you would take care of this. I'd rather not touch the bit-pushing code. -- Sam Steingold (http://www.podval.org/~sds) running w2k <http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/> <http://www.mideasttruth.com/> <http://www.honestreporting.com> I haven't lost my mind -- it's backed up on tape somewhere. |
From: Bruno H. <br...@cl...> - 2005-01-29 16:46:36
|
Sam wrote: > >> disadvantages: lack of modularity (tight coupling with the stream > >> world) > > > > Yes, this could be improved. > > please do. Done. The new functions are also exported through genclisph.d. You can use them in the modules. > > There probably need to be two variants of this function, one taking an > > "const uintB* buffer", the other one taking an "object buffer" (a > > simple-array of rank 1 and element-type (unsigned-byte 8)). > > I would call these functions LEbytes_to_I and LEbytes_vector_to_I > > instead of bitbuff_to_I. > > this need for two functions is the reason why the `maygc' property of > bitbuff_to_I() is so bad (see above ***) But the two functions actually share 99% of their code. Bruno |