Thread: [q-lang-users] Two proposed improvements to Q
From: John C. <co...@cc...> - 2007-06-27 17:04:00
|
These are some changes I thought up while reviewing Q in a Nutshell. 1) Q currently can't import names selectively from a module. I propose the Python syntax for this: from <module> import <name>,<name>,...; and a variant that exports as well as importing: from <module> include <name>,<name>,...; This makes "from" a reserved word, but I think that is a small price to pay for the added convenience of being able to bring in specified names only without the risk of unintended namespace pollution. 2) Now that Q has ML-style reference cells (which are not documented anywhere in the Q manual that I can see), adding general mutable (but fixed-length) vectors is an easy extension: one simply adds a few equations to the standard library for treating tuples of cells properly. Following that, byte vectors (mutable byte strings) would also be a useful addition for dealing with large quantities of homogeneous data; these have to be done in C, but would be far more efficient than any alternative representation. -- John Cowan co...@cc... http://www.ccil.org/~cowan Is not a patron, my Lord [Chesterfield], one who looks with unconcern on a man struggling for life in the water, and when he has reached ground encumbers him with help? --Samuel Johnson |
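Since the proposal borrows Python's syntax, the difference between selective and wholesale import can be illustrated in Python itself. This is only a sketch of the namespace-pollution argument, not of how Q would implement it; the `math` module is a stand-in, and the exporting `include` variant has no direct counterpart here.

```python
# Selective import: only the two listed names are bound in this namespace.
from math import sqrt, pi

# Wholesale import, done in a scratch namespace so the effect can be measured.
scratch = {}
exec("from math import *", scratch)

selective = {"sqrt", "pi"}
wildcard = {name for name in scratch if not name.startswith("__")}

print(sorted(selective))               # the two names asked for
print(len(wildcard) > len(selective))  # the wildcard binds many more names
print(sqrt(16.0))
```

The wildcard set contains names such as `sin` and `floor` that the importing code never asked for, which is the pollution the selective form avoids.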
From: Albert G. <Dr....@t-...> - 2007-06-27 23:29:08
|
John Cowan wrote: > These are some changes I thought up while reviewing Q in a Nutshell. > > 1) Q currently can't import names selectively from a module. I propose > the Python syntax for this: > > from <module> import <name>,<name>,...; > > and a variant that exports as well as importing: > > from <module> include <name>,<name>,...; Yes, the syntax seems reasonable, I already have this on my TODO list. One related question is whether prelude.q should really import all the standard library modules. This is convenient, but also a source of "namespace pollution". E.g., one might discuss whether clib.q and stdtypes.q should be imported explicitly in user programs. (Of course this raises backward compatibility issues.) > 2) Now that Q has ML-style reference cells (which are not documented > anywhere in the Q manual that I can see) References have actually been around since Q 3.x, IIRC, but they are implemented in clib and thus documented in Section 12 of the manual: http://q-lang.sourceforge.net/qdoc/qdoc_12.html#SEC137 > adding general mutable > (but fixed-length) vectors is an easy extension: one simply adds a > few equations to the standard library for treating tuples of cells > properly. Which operations do you have in mind? I mean, you can already create a tuple of references, e.g., like this: 'tuple [ref ():I in [1..N]]', and then assign with 'put (V!I) X' and retrieve a value with 'get (V!I)'. > Following that, byte vectors (mutable byte strings) would also be > a useful addition for dealing with large quantities of homogeneous > data; these have to be done in C, but would be far more efficient > than any alternative representation. Yes, actually I've had something like this on my TODO list for a while, in order to provide better support for numeric and signal processing applications. But this also means that there should be appropriate functions for dealing with C vectors and matrices of integer and float values. 
Maybe a SWIG interface to the GNU Scientific Library or the Octave library would be in order? Any volunteers for such a project? Cheers, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: John C. <co...@cc...> - 2007-06-29 05:07:23
|
Albert Graef scripsit: > Yes, the syntax seems reasonable, I already have this on my TODO list. Maybe you could export your TODO list, either to the website or by keeping it in the CVS project directory (which is what I do for my projects). It might lead to some interesting discussions. > One related question is whether prelude.q should really import all the > standard library modules. This is convenient, but also a source of > "namespace pollution". It's kind of slow, too. If I were going to cut anything, I'd cut clib. > Which operations do you have in mind? I mean, you can already create a > tuple of references, e.g., like this: 'tuple [ref ():I in [1..N]]', and > then assign with 'put (V!I) X' and retrieve a value with 'get (V!I)'. Sure. Just sugar them with "vget V I" and "vput V I X", I guess. Then you can add vmap, vfold, vunfold, .... > > Following that, byte vectors (mutable byte strings) would also be > > a useful addition for dealing with large quantities of homogeneous > > data; these have to be done in C, but would be far more efficient > > than any alternative representation. > > Yes, actually I've had something like this on my TODO list for a while, > in order to provide better support for numeric and signal processing > applications. The proposed R6RS Scheme bytevectors provide read-write access to any point in a bytevector as signed and unsigned {8,16,32}-bit values, single and double floats, and strings using a specified character code. -- John Cowan co...@cc... http://ccil.org/~cowan No man is an island, entire of itself; every man is a piece of the continent, a part of the main. If a clod be washed away by the sea, Europe is the less, as well as if a promontory were, as well as if a manor of thy friends or of thine own were: any man's death diminishes me, because I am involved in mankind, and therefore never send to know for whom the bell tolls; it tolls for thee. --John Donne |
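The "tuple of references plus sugar" idea from this exchange can be sketched in Python for readers who do not know Q. The names `vget`, `vput`, `vmap` and `vfold` come from the message above; the `Ref` class, `make_vector`, and the choice that `vmap` updates in place are assumptions made for illustration.

```python
from functools import reduce


class Ref:
    """A mutable reference cell holding a single value."""

    def __init__(self, value=None):
        self.value = value


def make_vector(n):
    # A fixed-length tuple of fresh cells: the tuple never changes,
    # only the contents of its cells do.
    return tuple(Ref() for _ in range(n))


def vget(v, i):
    return v[i].value


def vput(v, i, x):
    v[i].value = x


def vmap(f, v):
    # Assumption: map in place rather than returning a new vector.
    for cell in v:
        cell.value = f(cell.value)


def vfold(f, acc, v):
    return reduce(f, (cell.value for cell in v), acc)


v = make_vector(3)
for i in range(3):
    vput(v, i, i + 1)
vmap(lambda x: x * 10, v)
total = vfold(lambda a, x: a + x, 0, v)
print([vget(v, i) for i in range(3)], total)
```

Because the container is an immutable tuple, the length is fixed by construction, which matches the "mutable but fixed-length" requirement.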
From: Albert G. <Dr....@t-...> - 2007-06-30 21:45:43
|
John Cowan wrote: > Maybe you could export your TODO list, either to the website or by > keeping it in the CVS project directory (which is what I do for > my projects). My internal TODO list isn't really that suitable for publication ;-), and I just found that KOrganizer messes up the format in html export anyway. But when I have the time, I'll manually add at least the main TODO items to the wiki. > It's kind of slow, too. If I were going to cut anything, I'd cut clib. Well, considering namespace pollution, clib is probably the worst single offender. But starting the interpreter with the full library takes some 0.35 secs on my devel box (Athlon 2500+ running Linux, so not really a fast computer by today's standards), and clib takes maybe 0.12 secs from that. So I don't know whether it's really worth the hassle. Also, on second thought I realized that clib also provides some fairly essential stuff besides the system interface, like the C versions of important list and string routines, and the extra integer functions which are used and extended in rational.q. I'm afraid that it will be quite a task to untangle this, and I'd have to chop clib into smaller pieces for that. I don't like that idea very much, since right now clib is a nice central repository for all "C things" in the standard library. The only other obvious candidate I see would be stdtypes.q and the stuff that it includes, which shaves away a whopping 0.16 secs from the startup time, but namespace-wise it's probably not much more than a few dozen public operations. Still, by unbundling *both* the POSIX interface part of clib (maybe moving that to a system.q module, along with the stuff in getopt.q) and stdtypes.q from the prelude, we might be able to shave off almost 80% of the interpreter's default startup time and clean up the namespace considerably. Could be worth the hassle after all. But of course it also breaks backward compatibility big time. :( What does everybody else think about this? 
Are you all running little Q scripts in tight loops from the shell, or do you rather use them interactively? I.e., how do you want your prelude: "slim" or "fat"? :) > The proposed R6RS Scheme bytevectors provide read-write access to > any point in a bytevector as signed and unsigned {8,16,32}-bit values, > single and double floats, and strings using a specified character code. I was thinking more about a data structure which also carries type tags and dimensions, so that you don't accidentally use wrong indices or access the wrong type of information, and conversion between different number types could happen on the fly (e.g., if you read a bytevector with floating point samples from an audio file and then play it back on an audio device which only understands 16 bit integer samples). Of course, these objects could still provide the "raw" bytevector interface you sketched out at the same time. In fact we could probably make all of these new types subtypes of the existing ByteString type and extend ByteString's interface so that it provides all the needed raw operations and the appropriate conversions between different subtypes. Concerning matrices, there's of course the row- or column-major issue. Most C codes probably use row-major matrices nowadays, but column-major would be good for interfacing to legacy Fortran code. ;-) Cheers, Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
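The row-major versus column-major question comes down to how a pair of indices maps to a flat offset, and carrying the dimensions along is what lets the accessor reject a wrong index. A minimal sketch, with hypothetical function names that are not part of any Q API:

```python
def check_bounds(i, j, rows, cols):
    # Knowing the dimensions makes out-of-range indices detectable.
    if not (0 <= i < rows and 0 <= j < cols):
        raise IndexError(f"index ({i},{j}) outside {rows}x{cols} matrix")


def row_major_offset(i, j, rows, cols):
    """Flat offset when rows are stored one after another (typical in C)."""
    check_bounds(i, j, rows, cols)
    return i * cols + j


def col_major_offset(i, j, rows, cols):
    """Flat offset when columns are stored one after another (Fortran)."""
    check_bounds(i, j, rows, cols)
    return j * rows + i


rows, cols = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]

# The same matrix laid out both ways in a flat buffer.
row_major = [matrix[i][j] for i in range(rows) for j in range(cols)]
col_major = [matrix[i][j] for j in range(cols) for i in range(rows)]

print(row_major)
print(col_major)
print(row_major[row_major_offset(1, 0, rows, cols)],
      col_major[col_major_offset(1, 0, rows, cols)])
```

Both lookups return the same element; only the layout of the underlying buffer differs, which is why a typed vector would need to record which convention it uses.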
From: John C. <co...@cc...> - 2007-07-01 07:39:10
|
Albert Graef scripsit: > Well, considering namespace pollution, clib is probably the worst single > offender. But starting the interpreter with the full library takes some > 0.35 secs on my devel box (Athlon 2500+ running Linux, so not really a > fast computer by today's standards), and clib takes maybe 0.12 secs from > that. So I don't know whether it's really worth the hassle. Startup is rather slower on my Cygwin-based laptop: comparing average wall-clock times for "q </dev/null" and "q --no-prelude </dev/null" gives me a library load time of 1.8 seconds, long enough to be quite annoying (the interpreter itself starts up in just 0.12 seconds). > Still, by unbundling *both* the POSIX interface part of clib (maybe > moving that to a system.q module, along with the stuff in getopt.q) and > stdtypes.q from the prelude, we might be able to shave off almost 80% of > the interpreter's default startup time and clean up the namespace > considerably. Could be worth the hassle after all. I'd agree with factoring Clib into Posix and non-Posix parts. For the rest, I'd move everything in prelude.q except the library includes to another file, and then people can include or exclude whatever they like just by changing prelude.q. That would preserve backward compatibility but make it easy to change a local installation. > What does everybody else think about this? Are you all running little Q > scripts in tight loops from the shell, or do you rather use them > interactively? I.e., how do you want your prelude: "slim" or "fat"? :) I tend to start Q with a small script and then evaluate a few expressions, then shut down, using Q as a sort of more-capable calculator. -- John Cowan co...@cc... http://www.ccil.org/~cowan The experiences of the past show that there has always been a discrepancy between plans and performance. --Emperor Hirohito, August 1945 |
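The measurement described in this message (average wall-clock time with and without the prelude, input taken from the null device) can be reproduced with a small harness. This is a sketch only: `average_wall_time` is a hypothetical helper, and the Python interpreter is used as a stand-in command so that it runs without Q installed.

```python
import subprocess
import sys
import time


def average_wall_time(cmd, runs=5):
    """Average wall-clock seconds to run cmd with stdin from the null device."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd,
                       stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL,
                       check=False)
        total += time.perf_counter() - start
    return total / runs


# Stand-in command; replace with the interpreter under test.
baseline = average_wall_time([sys.executable, "-c", "pass"], runs=3)
print(f"average startup: {baseline:.3f} s")

# With q on the PATH, the library load time would be estimated as
#   average_wall_time(["q"]) - average_wall_time(["q", "--no-prelude"])
```

Averaging over several runs smooths out disk-cache effects, which matter a great deal for the first start after a reboot.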
From: Albert G. <Dr....@t-...> - 2007-07-02 04:36:00
|
John Cowan wrote: > Startup is rather slower on my Cygwin-based laptop: comparing average > wall-clock times for "q </dev/null" and "q --no-prelude </dev/null" > gives me a library load time of 1.8 seconds, long enough to be quite > annoying (the interpreter itself starts up in just 0.12 seconds). That's really slow in either case. Startup with --no-prelude just takes 0.008s on my Linux box, so that's more or less consistent with your figures (factor of about 10). What kind of CPU and amount of RAM does your laptop have? Also, did you compile Q with -O3 (CFLAGS=-O3 ./configure && make)? That really seems to make a difference with gcc. Stripping the resulting q and qc executables might help as well, I noticed that the unstripped executables are rather big, at least with mingw. In any case, I guess that unbundling the system interface from clib won't really make a difference in practice, since many average scripts will need it anyway. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Albert G. <Dr....@t-...> - 2007-07-02 19:39:23
|
John Cowan wrote: > Maybe you could export your TODO list, either to the website or by > keeping it in the CVS project directory (which is what I do for > my projects). Ok, the "Developer's Corner" is now open at: http://q-lang.wiki.sourceforge.net/Developers I started a TODO list there (not much there yet, but I'll fill in items and details as I find the time). Feel free to add and comment on the items. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Albert G. <Dr....@t-...> - 2008-01-18 12:11:27
|
John Cowan wrote:
>>> Following that, byte vectors (mutable byte strings) would also be
>>> a useful addition for dealing with large quantities of homogeneous
>>> data; these have to be done in C, but would be far more efficient
>>> than any alternative representation.
>> Yes, actually I've had something like this on my TODO list for a while,
>> in order to provide better support for numeric and signal processing
>> applications.
>
> The proposed R6RS Scheme bytevectors provide read-write access to
> any point in a bytevector as signed and unsigned {8,16,32}-bit values,
> single and double floats, and strings using a specified character code.

Ok, at long last I decided to give this a go. It's in cvs now. Here's the relevant blurb from qdoc.info:

Byte Strings as Mutable C Vectors
---------------------------------

As of Q 7.11, `clib' supports a number of additional operations which allow you to treat byte strings as mutable C vectors of signed/unsigned 8/16/32 bit integers or single/double precision floating point numbers. The following functions provide read/write access to the elements of such C vectors. Note that the given index argument `I' is interpreted relative to the corresponding element type. Thus, e.g., `get_int32 B I' returns the `I'th 32 bit integer rather than the integer at byte offset `I'.

NOTE: Integer arguments must fit into machine integers, otherwise these operations will fail. Integers passed for floating point arguments will be coerced to floating point values automatically.

    public extern get_int8 B I, get_int16 B I, get_int32 B I;
    public extern get_uint8 B I, get_uint16 B I, get_uint32 B I;
    public extern get_float B I, get_double B I;

    public extern put_int8 B I X, put_int16 B I X, put_int32 B I X;
    public extern put_uint8 B I X, put_uint16 B I X, put_uint32 B I X;
    public extern put_float B I X, put_double B I X;

Moreover, the following convenience functions are provided to convert between byte strings and lists of integer/floating point elements.

    public extern int8_list B, int16_list B, int32_list B;
    public extern uint8_list B, uint16_list B, uint32_list B;
    public extern float_list B, double_list B;

    public extern int8_vect Xs, int16_vect Xs, int32_vect Xs;
    public extern uint8_vect Xs, uint16_vect Xs, uint32_vect Xs;
    public extern float_vect Xs, double_vect Xs;

---

And a few examples:

    ==> def B = uint32_vect [100..110]

    ==> B
    <<ByteStr>>

    ==> uint32_list B
    [100,101,102,103,104,105,106,107,108,109,110]

    ==> get_uint32 B 1
    101

    ==> put_uint32 B 1 0xffffffff
    ()

    ==> uint32_list B
    [100,4294967295,102,103,104,105,106,107,108,109,110]

    ==> take 12 $ int8_list B
    [100,0,0,0,-1,-1,-1,-1,102,0,0,0]

    ==> float_vect [1..10]
    <<ByteStr>>

    ==> float_list _
    [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]

Please note that this is just the "bare bones" C interface, but building higher-level Q APIs on top of that should be a piece of cake now.

John, I hope that this is what you had in mind. It still lacks the "... and strings using a specified character code" part, though. Considering that a byte string might actually contain character data in an arbitrary encoding, it's not clear to me how that should be done, could you elaborate please?

Eddie, do you think that this will be good enough for the qcalc stats stuff and GSL interface we talked about a while ago? Maybe for efficiency there should be an additional function in your forthcoming CSV module to directly convert between numeric data in CSV format and C vectors as byte strings?

Cheers,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
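For readers without a Q installation, the element-indexed access described in this message can be mimicked in Python with the standard `struct` module over a `bytearray`. This is an analogy, not the clib implementation: the function names are borrowed from the message, and little-endian byte order is an assumption chosen because it reproduces the `int8_list` output shown there.

```python
import struct


def uint32_vect(xs):
    """Pack a sequence of integers into a mutable buffer of 32-bit unsigned ints."""
    xs = list(xs)
    return bytearray(struct.pack(f"<{len(xs)}I", *xs))


def uint32_list(b):
    """Unpack the whole buffer as 32-bit unsigned ints."""
    return list(struct.unpack_from(f"<{len(b) // 4}I", b))


def int8_list(b):
    """View the same bytes as signed 8-bit ints."""
    return list(struct.unpack_from(f"<{len(b)}b", b))


def get_uint32(b, i):
    # The index counts 32-bit elements, so the byte offset is i * 4.
    return struct.unpack_from("<I", b, i * 4)[0]


def put_uint32(b, i, x):
    struct.pack_into("<I", b, i * 4, x)


def float_vect(xs):
    # Integers are coerced to floating point before packing.
    xs = [float(x) for x in xs]
    return bytearray(struct.pack(f"<{len(xs)}f", *xs))


def float_list(b):
    return list(struct.unpack_from(f"<{len(b) // 4}f", b))


b = uint32_vect(range(100, 111))
print(get_uint32(b, 1))
put_uint32(b, 1, 0xFFFFFFFF)
print(uint32_list(b))
print(int8_list(b)[:12])
print(float_list(float_vect(range(1, 11))))
```

Reading the buffer back as signed bytes shows the four `-1` values where the all-ones word was written, which is the same reinterpretation the `take 12 $ int8_list B` example demonstrates.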
From: John C. <co...@cc...> - 2008-01-21 04:24:38
|
Albert Graef scripsit: > John, I hope that this is what you had in mind. It still lacks the "... > and strings using a specified character code" part, though. Considering > that a byte string might actually contain character data in an arbitrary > encoding, it's not clear to me how that should be done, could you > elaborate please? What you've got looks good. For strings, I had something in mind like get_string Bytes EncodingName StartIndex EndIndex to decode a portion of a byte string into a string, and put_string Bytes EncodingName Index String to encode a string into a portion of a byte string. These allow you to process character data wherever it might exist in a binary sequence. However, the latter produces an unpredictable number of bytes, and might need to be supplemented with a factory byte_string EncodingName String -- John Cowan co...@cc... http://www.ccil.org/~cowan You escaped them by the will-death and the Way of the Black Wheel. I could not. --Great-Souled Sam |
From: Albert G. <Dr....@t-...> - 2008-01-21 07:12:56
|
John Cowan wrote: > What you've got looks good. For strings, I had something in mind like > > get_string Bytes EncodingName StartIndex EndIndex > > to decode a portion of a byte string into a string, and > > put_string Bytes EncodingName Index String Should the indices be byte offsets? Also, put_string would just overwrite the part of the string at the given offset, right? In that case it should be easy to get that kind of functionality with existing routines, I just need to add a function to replace a slice of a byte string in-place. > to encode a string into a portion of a byte string. These allow you to > process character data wherever it might exist in a binary sequence. > However, the latter produces an unpredictable number of bytes, and might > need to be supplemented with a factory > > byte_string EncodingName String If I understand this correctly, bytestr already provides that functionality. That is, you can use 'bytestr (S,CODESET)' to create a byte string in a given encoding from a string, and you can even do 'bytestr (S,CODESET,SIZE)' if you want the byte string to be truncated or zero-padded to fit a given size. Is that what you meant? Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: John C. <co...@cc...> - 2008-01-21 17:20:50
|
Albert Graef scripsit: > John Cowan wrote: > > What you've got looks good. For strings, I had something in mind like > > > > get_string Bytes EncodingName StartIndex EndIndex > > > > to decode a portion of a byte string into a string, and > > > > put_string Bytes EncodingName Index String > > Should the indices be byte offsets? Also, put_string would just > overwrite the part of the string at the given offset, right? Yes to both questions. > In that > case it should be easy to get that kind of functionality with existing > routines, I just need to add a function to replace a slice of a byte > string in-place. Hmm, right. (I should have re-read the existing docs.) How about this? bytecopy FromByteString FromIndex ToByteString ToIndex Length > > byte_string EncodingName String > > If I understand this correctly, bytestr already provides that > functionality. Yes, you're right. As I said, I should have re-read the existing docs. -- John Cowan co...@cc... http://ccil.org/~cowan If I have not seen as far as others, it is because giants were standing on my shoulders. --Hal Abelson |
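The three operations under discussion (`get_string`, `put_string`, `bytecopy`) are easy to model on a Python `bytearray`, which helps pin down the semantics agreed here: indices are byte offsets and `put_string` overwrites in place. The argument order follows the messages; treating the end index as exclusive and raising on overflow are assumptions, since the thread does not settle either point.

```python
def get_string(b, encoding, start, end):
    """Decode the bytes b[start:end] (byte offsets, end exclusive) into a str."""
    return bytes(b[start:end]).decode(encoding)


def put_string(b, encoding, index, s):
    """Encode s and overwrite b in place at byte offset index.

    Returns the number of bytes written, which depends on the encoding.
    """
    data = s.encode(encoding)
    if index < 0 or index + len(data) > len(b):
        raise IndexError("encoded string does not fit in the buffer")
    b[index:index + len(data)] = data
    return len(data)


def bytecopy(src, src_index, dst, dst_index, length):
    """Copy length bytes from src to dst without resizing either buffer."""
    if (src_index < 0 or dst_index < 0
            or src_index + length > len(src)
            or dst_index + length > len(dst)):
        raise IndexError("copy range out of bounds")
    dst[dst_index:dst_index + length] = src[src_index:src_index + length]


buf = bytearray(16)
written = put_string(buf, "utf-8", 4, "Grüße")
print(written)  # more bytes than characters in UTF-8
print(get_string(buf, "utf-8", 4, 4 + written))

other = bytearray(8)
bytecopy(buf, 4, other, 0, written)
print(get_string(other, "utf-8", 0, written))
```

The five-character string occupies seven bytes in UTF-8, which illustrates the "unpredictable number of bytes" concern raised earlier in the thread.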
From: Albert G. <Dr....@t-...> - 2008-01-21 23:37:44
|
John Cowan wrote: > Hmm, right. (I should have re-read the existing docs.) How about this? > > bytecopy FromByteString FromIndex ToByteString ToIndex Length I already started rewriting the new get_xxx/put_xxx functions so that you can also read/write slices instead of just single elements. That has the advantage that you can use indices relative to the different int/float types instead of just bytes. -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikinformatik.uni-mainz.de/ag |
From: Albert G. <Dr....@t-...> - 2008-01-22 12:53:32
|
Albert Graef wrote:
> I just need to add a function to replace a slice of a byte
> string in-place.

Ok, this is in cvs now. Examples:

    ==> put_uint32 B (-2) (uint32_vect [90..94])
    ()

    ==> uint32_list B
    [92,93,94,103,104,105,106,107,108,109,110]

    ==> uint32_list $ get_uint32 B (-2,3)
    [92,93,94,103]

    ==> uint32_list $ get_uint32 B (8,100)
    [108,109,110]

Note that these operations are all safe in that indices are automatically confined to stay within the bounds of the target vector.

Here's how you can manipulate a string represented as a byte string in a given character encoding:

    ==> def S = bytestr ("Hello world!","latin1")

    ==> put_uint8 S 3 $ bytestr ("äöü","latin1"); bstr S
    ()
    "Heläöüworld!"

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikinformatik.uni-mainz.de/ag
|
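The clamping behaviour in these examples (out-of-range parts of a slice are silently dropped rather than raising an error) can be modelled in Python as follows. The semantics are inferred from the outputs shown in the message, in particular that the `(lo,hi)` range is inclusive; `put_slice` and `get_slice` are hypothetical names, and little-endian packing is an assumption.

```python
import struct


def uint32_vect(xs):
    xs = list(xs)
    return bytearray(struct.pack(f"<{len(xs)}I", *xs))


def uint32_list(b):
    return list(struct.unpack_from(f"<{len(b) // 4}I", b))


def put_slice(b, i, src, size):
    """Overwrite elements of b starting at element index i with those of src.

    Elements that would land outside b are dropped, so b never changes length.
    """
    n, m = len(b) // size, len(src) // size
    lo, hi = max(i, 0), min(i + m, n)  # target range, hi exclusive
    if lo < hi:
        b[lo * size:hi * size] = src[(lo - i) * size:(hi - i) * size]


def get_slice(b, lo, hi, size):
    """Return elements lo..hi (inclusive) of b, clamped to the valid range."""
    n = len(b) // size
    lo, hi = max(lo, 0), min(hi, n - 1)
    if lo > hi:
        return bytearray()
    return bytearray(b[lo * size:(hi + 1) * size])


b = uint32_vect(range(100, 111))
put_slice(b, -2, uint32_vect(range(90, 95)), 4)
print(uint32_list(b))
print(uint32_list(get_slice(b, -2, 3, 4)))
print(uint32_list(get_slice(b, 8, 100, 4)))

# The same operation with one-byte elements edits encoded text in place.
s = bytearray("Hello world!".encode("latin-1"))
put_slice(s, 3, "äöü".encode("latin-1"), 1)
print(s.decode("latin-1"))
```

Clamping keeps every call total, at the cost of hiding off-by-one mistakes; a stricter API could raise on out-of-range indices instead.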