Thread: [q-lang-users] Re: Newbie questions and comments
From: Greg B. <Gre...@sl...> - 2005-04-11 17:06:36
|
Albert Graef wrote:
>Matt Gushee wrote:
>> * So why isn't Q better known?
>
> Good question. Well, maybe I don't advertise it enough. This year will
> be the first time that I actually present Q at an international
> conference. You see, for a long time Q has just been one of my "hobby
> projects", and it has only been "out there" on SourceForge for about
> one year. But I guess that the most prominent factor is that
> functional languages are still perceived as something fairly exotic,
> and many programmers can't easily wrap their head around this way of
> programming. It just takes time I guess.

For more exposure, you could always try to get Q included in "The Great Computer Language Shootout"...

http://shootout.alioth.debian.org/

Greg Buchholz |
From: Tim H. <q...@st...> - 2005-04-11 19:24:50
|
Greg Buchholz <Gre...@sl...> writes: > For more exposure, you could always try to get Q included in "The > Great Computer Language Shootout"... > > http://shootout.alioth.debian.org/ Random datapoint: on my machine it took Q 28mins to do the takfp test listed there. (However, at least it did it in constant memory; my obvious implementation in Perl took 7mins but blew to 1.4Gb VSZ...) ~Tim -- <http://spodzone.org.uk/> |
From: Albert G. <Dr....@t-...> - 2005-04-15 09:29:38
|
Tim Haynes wrote:
> Random datapoint: on my machine it took Q 28mins to do the takfp test
> listed there.

Did you use integer or floating point arithmetic? Here are two runs from my machine:

==> tak 30 20 10; stats
11
485 secs, 278308692 reductions, 133 cells

==> tak 30.0 20.0 10.0; stats
11.0
431 secs, 278308692 reductions, 133 cells

I'd guess that with machine integers the performance would be closer to that of Python. Not that this is a real option, since Q's integers are always bigints.

> (However, at least it did it in constant memory; my obvious implementation
> in Perl took 7mins but blew to 1.4Gb VSZ...)

Languages without tail call elimination suck. ;-) Q uses only 133 expression cells (~3KB), so the memory requirements would be just what the interpreter needs for global data anyway...

Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: John C. <co...@cc...> - 2005-04-15 12:27:17
|
Albert Graef scripsit: > Did you use integer or floating point arithmetic? On modern hardware the performance is often just about the same. In certain cases, floating-point is actually faster. -- Mark Twain on Cecil Rhodes: John Cowan I admire him, I freely admit it, http://www.ccil.org/~cowan and when his time comes I shall http://www.reutershealth.com buy a piece of the rope for a keepsake. co...@cc... |
From: Tim H. <q...@st...> - 2005-04-15 12:56:49
|
John Cowan <co...@cc...> writes:
> Albert Graef scripsit:
>
> > Did you use integer or floating point arithmetic?
>
> On modern hardware the performance is often just about the same.
> In certain cases, floating-point is actually faster.

On the same box as I did the hugs-v-q comparison:

| ==> takfp::takstart 10 ; stats
| 11.0
| 453 secs, 278308696 reductions, 78 cells
|
| ==> takfp::takstart 10.0 ; stats
| 11.0
| 452 secs, 278308696 reductions, 78 cells

<http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=delta&action=Search>, sense 3 ;)

~Tim
--
<http://spodzone.org.uk/> |
From: Albert G. <Dr....@t-...> - 2005-04-15 13:51:37
|
Tim Haynes wrote: > On the same box as I did the hugs-v-q comparison: > > | ==> takfp::takstart 10 ; stats > | 11.0 > | 453 secs, 278308696 reductions, 78 cells > | > | ==> takfp::takstart 10.0 ; stats > | 11.0 > | 452 secs, 278308696 reductions, 78 cells Hmm, that looks like your tak program forces floating point arithmetic in any case; otherwise the first result should have been 10, not 10.0. Can you post your program here? -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: Tim H. <q...@st...> - 2005-04-15 13:54:32
|
Albert Graef <Dr....@t-...> writes:
[snip]
> Hmm, that looks like your tak program forces floating point arithmetic in
> any case; otherwise the first result should have been 10, not 10.0. Can you
> post your program here?

Oh, ahh... D'oh!

#!/usr/bin/env q
#! -cmain ARGS || quit

takfp (X, Y, Z) = Z if Y>=X;
                = takfp ( takfp (X-1.0, Y, Z), takfp (Y-1.0, Z, X), takfp (Z-1.0, X, Y)) otherwise;

takstart N = takfp (N*3.0, N*2.0, N*1.0);

main ARGS = writes (str (takstart (val (ARGS!1)))) || writes "\n" || quit;

Yeah. I spot the `-1.0' everywhere. OK, will try again....

~Tim
--
<http://spodzone.org.uk/> |
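For readers without a Q interpreter at hand, the effect Tim spotted can be reproduced in a rough Python sketch. This is only an illustration, not a translation of the Q runtime: the names `tak`, `tak_start_float` and `tak_start_int` are made up here. Python's integers are arbitrary precision, like Q's, but no claim is made about relative timings. The point is only that float constants in the start function push every argument, and hence the result, into floating point.

```python
def tak(x, y, z):
    """Same recursion as the posted takfp; decrementing by the int 1 preserves the argument types."""
    if y >= x:
        return z
    return tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))


def tak_start_float(n):
    # Mirrors takstart in the posted script: the float factors turn every argument into a float.
    return tak(n * 3.0, n * 2.0, n * 1.0)


def tak_start_int(n):
    # Integer-only variant (hypothetical counterpart of an all-integer start function).
    return tak(n * 3, n * 2, n)


if __name__ == "__main__":
    # A small n keeps the run short; the thread used n = 10.
    print(type(tak_start_float(2)).__name__)
    print(type(tak_start_int(2)).__name__)
```

Calling the float variant with an integer argument still yields a float, which matches the symptom in the thread where an integer input printed a result with a trailing `.0`.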
From: Albert G. <Dr....@t-...> - 2005-04-15 14:16:46
|
Tim Haynes wrote: > Yeah. I spot the `-1.0' everywhere. OK, will try again.... And don't forget the *3.0 etc. ;-) -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: Tim H. <q...@st...> - 2005-04-15 14:25:05
|
Albert Graef <Dr....@t-...> writes:
> Tim Haynes wrote:
> > Yeah. I spot the `-1.0' everywhere. OK, will try again....
>
> And don't forget the *3.0 etc. ;-)

| ==> takfp::takintstart 10; stats
| 11
| 563 secs, 278308696 reductions, 78 cells

Int slow, float quite nice.

HTH :)

~Tim
--
<http://spodzone.org.uk/> |
From: Albert G. <Dr....@t-...> - 2005-04-15 19:49:28
|
Tim Haynes wrote: > | ==> takfp::takintstart 10; stats > | 11 > | 563 secs, 278308696 reductions, 78 cells > > Int slow, float quite nice. Yes, that's pretty much in line with my results, thanks. Well, your int/float performance ratio is 10% worse than mine, I wonder why's that, maybe an older gmp version? Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: Tim H. <q...@st...> - 2005-04-15 20:16:59
|
Albert Graef <Dr....@t-...> writes: > Tim Haynes wrote: > > | ==> takfp::takintstart 10; stats > > | 11 > > | 563 secs, 278308696 reductions, 78 cells > > Int slow, float quite nice. > > Yes, that's pretty much in line with my results, thanks. Well, your > int/float performance ratio is 10% worse than mine, I wonder why's that, > maybe an older gmp version? | zsh/scr, straw 9:16PM q-src/ % epm -qaG| grep gmp | dev-libs/gmp-4.1.4 And/or experimental error, maybe? ~Tim -- <http://spodzone.org.uk/> |
From: Tim H. <q...@st...> - 2005-04-15 20:18:54
|
Tim Haynes <q...@st...> writes:
[snip]
> > Yes, that's pretty much in line with my results, thanks. Well, your
> > int/float performance ratio is 10% worse than mine, I wonder why's that,
> > maybe an older gmp version?
>
> | zsh/scr, straw 9:16PM q-src/ % epm -qaG| grep gmp
> | dev-libs/gmp-4.1.4
>
> And/or experimental error, maybe?

Oh, also I've been giving you results from a box:

| vendor_id  : AuthenticAMD
| cpu family : 6
| model      : 10
| model name : AMD Athlon(tm) XP 2500+
| stepping   : 0
| cpu MHz    : 1830.002
| cache size : 512 KB

if that makes any difference. :)

~Tim
--
<http://spodzone.org.uk/> |
From: Albert G. <Dr....@t-...> - 2005-04-15 14:03:06
|
Albert Graef wrote: > Hmm, that looks like your tak program forces floating point arithmetic > in any case; otherwise the first result should have been 10, not 10.0. > Can you post your program here? Of course I meant 11 vs. 11.0. ;-) -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: Albert G. <Dr....@t-...> - 2005-04-15 13:48:25
|
John Cowan wrote: >>Did you use integer or floating point arithmetic? > > On modern hardware the performance is often just about the same. > In certain cases, floating-point is actually faster. Also with double precision floats? Then GMP's bigint implementation must be quite good. (Q uses those for integer arithmetic. So the two test runs I gave were actually GMP vs. machine double precision floats.) Anyway, thanks for the hint. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: John C. <jc...@re...> - 2005-04-15 16:49:35
|
Albert Graef scripsit: > Also with double precision floats? Then GMP's bigint implementation must > be quite good. (Q uses those for integer arithmetic. So the two test > runs I gave were actually GMP vs. machine double precision floats.) Internally it's typical to do everything with extended-precision floats (on Intel machines these are 80-bit objects) and then trim to single or double precision only when storing. If I were implementing a language from scratch today for anything but bare-metal programming, I'd probably leave off machine-size integers altogether, as you have done, and just have doubles and bignums, or even just doubles (like Perl and Lua). -- John Cowan jc...@re... www.reutershealth.com www.ccil.org/~cowan Heckler: "Go on, Al, tell 'em all you know. It won't take long." Al Smith: "I'll tell 'em all we *both* know. It won't take any longer." |
From: Albert G. <Dr....@t-...> - 2005-04-15 20:01:42
|
John Cowan wrote: > If I were implementing a language from scratch today for anything but > bare-metal programming, I'd probably leave off machine-size integers > altogether, as you have done, and just have doubles and bignums, or even > just doubles (like Perl and Lua). Well, I really need the bignums for doing some number-theoretic stuff. Of course, you could reimplement them yourself using lists but this is inconvenient and slow. I also thought about adding arbitrary precision floats, but if I take a look at the bigint vs. float performance, that's probably not a good idea. ;-) Hey, I just noticed that you are the current maintainer of figlet! Good program. I've been playing around with it, to make a nicer sign-on message for the Q interpreter. Maybe in the next version. :) Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: John C. <jc...@re...> - 2005-04-15 21:16:04
|
Albert Graef scripsit: > Well, I really need the bignums for doing some number-theoretic stuff. > Of course, you could reimplement them yourself using lists but this is > inconvenient and slow. Yes, that would be a pain. > I also thought about adding arbitrary precision floats, but if I take a > look at the bigint vs. float performance, that's probably not a good > idea. ;-) Agreed. > Hey, I just noticed that you are the current maintainer of figlet! Good > program. I've been playing around with it, to make a nicer sign-on > message for the Q interpreter. Maybe in the next version. :) Thanks. I am only the maintainer faute de mieux: I haven't done a thing with it for years, and indeed I never did learn how the basic rendering engine works; I am only responsible for the Unicode wrapper. -- "Well, I'm back." --Sam John Cowan <jc...@re...> |
From: Albert G. <Dr....@t-...> - 2005-04-16 07:44:18
|
John Cowan wrote: > Thanks. I am only the maintainer faute de mieux: I haven't done a thing > with it for years, and indeed I never did learn how the basic rendering > engine works; I am only responsible for the Unicode wrapper. Interested in retrofitting unicode support to some strange obscure functional programming language? ;-) I have this unicode stuff on my TODO list for a _very_ long time, but somehow I can't wrap my head around it. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: John C. <co...@cc...> - 2005-04-16 16:33:24
|
Albert Graef scripsit:
> Interested in retrofitting unicode support to some strange obscure
> functional programming language? ;-)

I'd be interested in helping you do it, and maybe peeking at some source code here and there. Overall, I'd say you're 90% of the way thanks to two decisions about typing:

1) You don't have a character type in Q;
2) You already distinguish firmly between strings and byte vectors.

Not having a character type in Q means that you don't have to break any assumptions about how big a character can be: in Unicode there are 0x110000 different potential characters (most of them unassigned), not 128 or 256.

I recommend that Q strings use the UTF-8 encoding internally. The UTF-8 encoding uses 1, 2, 3, or 4 bytes to encode each character depending on the numerical equivalent of the character. In particular, the ASCII subset uses a 1-byte representation, the same as ASCII itself, and the bytes 0x00 through 0x7F are never used for anything else. The rest of the Latin-1 subset (0x80 through 0xFF), however, requires a 2-byte representation.

There will be five places where Unicode has to be addressed: in pulling substrings out of strings, in reading, in writing, in converting from strings to byte strings, and in converting from byte strings to strings. In the last two cases, it is desirable (but not necessary) to provide a method of overriding the system standard external encoding such as Latin-1 which is generated or interpreted respectively.

The iconv_open(), iconv(), and iconv_close() functions do the donkey work of conversion. If they are not available on a system, the GNU iconv library provides a good implementation. It's distributed under the Lesser GPL, so it will not affect the licensing of Q.

> I have this unicode stuff on my TODO list for a _very_ long time, but
> somehow I can't wrap my head around it.

I understand. I hope the above is somewhat helpful; I'll be happy to answer questions either on this list or privately.
-- There is / One art John Cowan <co...@cc...> No more / No less http://www.reutershealth.com To do / All things http://www.ccil.org/~cowan With art- / Lessness -- Piet Hein |
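The byte counts and the two conversion directions described in the message above can be checked with a short Python sketch. It is an illustration only and says nothing about how Q would implement this; the helper names `utf8_width`, `to_byte_string` and `from_byte_string` are invented for the example, and Python's built-in codecs stand in for iconv.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


def to_byte_string(text: str, external: str = "utf-8") -> bytes:
    """String -> byte string, with an overridable external encoding."""
    return text.encode(external)


def from_byte_string(raw: bytes, external: str = "utf-8") -> str:
    """Byte string -> string, with an overridable external encoding."""
    return raw.decode(external)


if __name__ == "__main__":
    # ASCII letter, Latin-1 sharp s, euro sign, musical G clef (beyond 16 bits).
    for ch in ("A", "\u00df", "\u20ac", "\U0001d11e"):
        print(f"U+{ord(ch):04X}: {utf8_width(ch)} byte(s)")

    word = "Ma\u00dfe"
    as_utf8 = to_byte_string(word)
    as_latin1 = to_byte_string(word, "latin-1")
    # Character count versus byte counts in the two external encodings.
    print(len(word), len(as_utf8), len(as_latin1))
    # Pulling out the third character means taking two bytes of the UTF-8 form.
    print(word[2] == from_byte_string(as_utf8[2:4]))
```

The last line is the substring case from the list of five places: indexing by character and indexing by byte stop being the same thing once a string leaves the ASCII range.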
From: Albert G. <Dr....@t-...> - 2005-04-25 11:25:08
|
John Cowan wrote: > I'd be interested in helping you do it, and maybe peeking at some source > code here and there. Overall, I'd say you're 90% of the way thanks to > two decisions about typing: [...] I'm glad you said that. :) > I recommend that Q strings use the UTF-8 encoding internally. [...] Yup, that sounds like a good solution; no or negligible overhead for plain ASCII is a must. I think that's also the way that Python and Tcl do it. > There will be five places where Unicode has to be addressed: [...] Well, we'll also have to check the library modules. I guess that clib will be affected, as well as the GUI interface and some parts of the graphics modules. > I understand. I hope the above is somewhat helpful; I'll be happy to > answer questions either on this list or privately. Thanks for the offer, I'll get back to you on that. Unfortunately, right now I still have some new modules releases on my TODO list and then the 6.1 version of the interpreter, while my intern is busy developing an OpenAL module. So this may take some time. :( Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
From: John C. <jc...@re...> - 2005-04-26 16:19:15
|
Albert Graef scripsit: > >I recommend that Q strings use the UTF-8 encoding internally. [...] > > Yup, that sounds like a good solution; no or negligible overhead for > plain ASCII is a must. I think that's also the way that Python and Tcl > do it. Tcl does, Python does not. C-Python has distinct 8-bit and Unicode strings as a result of legacy considerations. Jython has only Unicode strings as a result of being embedded in Java, which also has only Unicode strings (and octet sequences as a separate type). In both cases, the Unicode strings are implemented as sequences of 16-bit integers. The TCL documentation at http://www.tcl.tk/doc/howto/i18n.html gives a nice overview of the issues in converting to a UTF-8 internal representation. > Thanks for the offer, I'll get back to you on that. Unfortunately, right > now I still have some new modules releases on my TODO list and then the > 6.1 version of the interpreter, while my intern is busy developing an > OpenAL module. So this may take some time. :( Sure. -- John Cowan www.ccil.org/~cowan www.reutershealth.com jc...@re... There are books that are at once excellent and boring. Those that at once leap to the mind are Thoreau's Walden, Emerson's Essays, George Eliot's Adam Bede, and Landor's Dialogues. --Somerset Maugham |
From: Albert G. <Dr....@t-...> - 2005-07-23 13:57:24
|
Ok, I finally decided that it's time to do something about unicode support, so I read the relevant docs (thanks, John, for your pointers, they were quite useful).

John Cowan wrote:
> There will be five places where Unicode has to be addressed: in pulling
> substrings out of strings, in reading, in writing, in converting from
> strings to byte strings, and in converting from byte strings to strings.
> In the last two cases, it is desirable (but not necessary) to provide
> a method of overriding the system standard external encoding such as
> Latin-1 which is generated or interpreted respectively.

AFAICS, here's what would be needed to get at least halfway-decent support for unicode/utf-8 in Q:

- Add UTF-8/multibyte character support to the interpreter. This affects, in particular, runtime string typing (since Char objects might consist of more than one byte) and marshalling (printing string objects in the interpreter), as well as the builtins (#), (!), sub, substr, pos, ord, char, succ, pred, enum.

- Fix the standard library functions chars and split, as well as the isxxx character predicates and toupper/tolower in clib.

- Add the usual localization stuff to clib: setlocale/localeconv/nl_langinfo, strfmon, strcoll/strxfrm, iconv, gettext and friends.

That should be all that is needed to have unicode just working when running on a system which has UTF-8 as the default encoding. But, as John pointed out, on systems using a different encoding there is still the issue of converting strings passed to or obtained from the system (including string constants in the source script and on the command line). There are basically two ways to deal with this:

(1) Add automatic conversions to all operations which read/write strings from/to the system and byte string data. This is what John proposed, and is certainly the most convenient for the programmer. But is this really desirable? Doing the conversions automatically means that you always have to pay for it, even if you use fget to slurp in big 7 bit ascii files. And if you read or write a file which happens to be in an encoding different from the system encoding then the builtin conversion would garble the string data.

(2) Leave it up to the programmer. The interpreter just assumes that all string data already is UTF-8 encoded and by itself doesn't touch the string data it reads/writes. When dealing with text data in other encodings, the programmer would have to use clib::iconv to do the conversion explicitly.

I'm actually leaning towards solution (2) since it gives greater freedom to the programmer and avoids the conversion overhead when it's not needed. Also, it is more in line with the current implementation which doesn't mangle string data behind the scenes either. And it has the added benefit that scripts containing string constants with extended characters would be portable across platforms (which is not the case if you implicitly assume the system encoding, so that scripts always have to be written in the local encoding).

Opinions?

Cheers,
Albert

--
Dr. Albert Gr"af
Dept. of Music-Informatics, University of Mainz, Germany
Email: Dr....@t-..., ag...@mu...
WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
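The garbling risk in (1) and the explicit conversion in (2) can be made concrete with a small Python sketch. It is not Q code, and `read_text` is a hypothetical helper standing in for an explicit call such as clib::iconv; Python's codecs do the actual conversion here.

```python
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Explicit conversion at the boundary; the caller names the encoding."""
    return raw.decode(encoding)


if __name__ == "__main__":
    name = "Gr\u00e4f"
    utf8_file = name.encode("utf-8")      # what a UTF-8 system would hand over
    latin1_file = name.encode("latin-1")  # the same text from a Latin-1 system

    # With the right encoding named explicitly, both yield the same string.
    print(read_text(utf8_file) == read_text(latin1_file, "latin-1"))

    # Applying a Latin-1 conversion to UTF-8 data garbles it: the a-umlaut
    # comes back as two unrelated characters, so the string grows by one.
    garbled = read_text(utf8_file, "latin-1")
    print(len(name), len(garbled))

    # Pure 7-bit ASCII is already valid UTF-8, so no bytes need rewriting.
    ascii_file = b"plain ascii text"
    print(read_text(ascii_file).encode("utf-8") == ascii_file)
```

The garbled case is what a built-in conversion would do silently whenever a file's encoding differs from the system default, which is the objection raised against (1).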
From: John.Cowan <jc...@re...> - 2005-07-24 07:45:21
|
Albert Graef scripsit:
> - Fix the standard library functions chars and split, as well as the
> isxxx character predicates and toupper/tolower in clib.

It's important to be aware that the POSIX model of the isxxx predicates is inadequate for Unicode. Details on request.

In addition, toupper and tolower have to operate at the string level, not merely the character level: they are not mere mappings in Unicode ("Maße" -> "MASSE", for one example).

> (2) Leave it up to the programmer. The interpreter just assumes that all
> string data already is UTF-8 encoded and by itself doesn't touch the
> string data it reads/writes. When dealing with text data in other
> encodings, the programmer would have to use clib::iconv to do the
> conversion explicitly.

The difficulty with this scheme is that you have to make the string-level operators work correctly, for some sense of "correctly", even with arbitrarily malformed UTF-8. If you do the mapping (which is really quite cheap if amortized across a large object) you can guarantee that you always have well-formed UTF-8 in the internals.

-- 
What asininity could I have uttered     John Cowan <jcowan@reutershealth.com>
that they applaud me thus?              http://www.reutershealth.com
        --Phocion, Greek orator         http://www.ccil.org/~cowan |
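Both points in this message can be tried out in Python, whose `str.upper` applies Unicode's full case mapping. The sketch below is illustrative only: `upper_per_char`, `decode_strict` and `decode_lenient` are names invented here, and the two decode policies are just two possible answers to the malformed-input question, not a proposal for Q.

```python
def upper_per_char(text: str) -> str:
    """Naive model: every character maps to exactly one character, as a
    C-style toupper would. Characters whose uppercase form is longer than
    one character are left unchanged."""
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def decode_strict(raw: bytes) -> str:
    """Validate once at the boundary; raises UnicodeDecodeError on malformed UTF-8."""
    return raw.decode("utf-8", errors="strict")


def decode_lenient(raw: bytes) -> str:
    """Alternative policy: replace each malformed sequence with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    word = "Ma\u00dfe"
    # The string-level mapping changes the length; the per-character one cannot.
    print(len(word), len(word.upper()), len(upper_per_char(word)))

    latin1_bytes = b"Ma\xdfe"  # Latin-1 data, which is not well-formed UTF-8
    try:
        decode_strict(latin1_bytes)
    except UnicodeDecodeError as err:
        print("rejected:", err.reason)
    print(len(decode_lenient(latin1_bytes)))
```

Either policy means that everything past the boundary can assume well-formed data, which is the guarantee argued for above.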
From: Albert G. <Dr....@t-...> - 2005-07-24 13:46:39
|
John.Cowan wrote: > It's important to be aware that the POSIX model of the isxxx predicates > is inadequate for Unicode. Details on request. I'm interested in the details, could you please elaborate? > In addition, toupper and tolower have to operate at the string level, > not merely the character level: they are not mere mappings in Unicode > ("Maße" -> "MASSE", for one example). That's no problem, as in Q these function already work on strings of arbitrary lengths anyway. > The difficulty with this scheme is that you have to make the string-level > operators work correctly, for some sense of "correctly", even with > arbitrarily malformed UTF-8. Yes, that's ugly. The more I think about this the more I abhor the idea to load the language itself with such complexities. Maybe it's better to keep the language encoding-agnostic and push all unicode handling into the library (just the way that it is done in C/C++, as opposed to Java/Tcl). To these ends, clib would provide its own set of primitive operations (say, u8length, u8chars, u8sub, etc.) for handling proper utf-8 encoded strings. Then we could add a standard library module unicode.q on top of that, with types UFile and UString and the corresponding operations, which would handle the necessary conversions automatically and transparently, as you suggested. The only actual change to the language itself would then be new escape sequences (\uXXXX) as shortcuts for utf-8 multibyte chars, and maybe a few related fixes in the string printing routine of the interpreter. Of course you could also use utf-8 encoded string literals in a source script (you already can, but they will print correctly only on a system which uses utf-8 as its native encoding). What do you all think about this? I think that this might be the cleanest (if not most convenient) solution, also from a backward compatibility POV. Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr....@t-..., ag...@mu... 
WWW: http://www.musikwissenschaft.uni-mainz.de/~ag |
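To give a feel for what such library-level primitives might look like, here is a Python sketch operating on plain byte strings. The names `u8length`, `u8chars` and `u8sub` are taken from the message, but their semantics are assumptions made for this example: in particular, `u8sub` is assumed to take inclusive start and end indices, and all three assume well-formed UTF-8 input without checking it.

```python
from typing import List


def _is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


def u8length(data: bytes) -> int:
    """Count characters by counting the bytes that start a sequence."""
    return sum(1 for b in data if not _is_continuation(b))


def u8chars(data: bytes) -> List[bytes]:
    """Split a UTF-8 byte string into one byte string per character."""
    chars = []
    start = 0
    for i in range(1, len(data) + 1):
        # A character ends at the end of the data or just before the next lead byte.
        if i == len(data) or not _is_continuation(data[i]):
            chars.append(data[start:i])
            start = i
    return chars


def u8sub(data: bytes, first: int, last: int) -> bytes:
    """Characters first..last (inclusive, zero-based) as a UTF-8 byte string."""
    return b"".join(u8chars(data)[first:last + 1])


if __name__ == "__main__":
    raw = "Ma\u00dfe \u20ac".encode("utf-8")
    print(len(raw), u8length(raw))
    print(u8sub(raw, 2, 3).decode("utf-8") == "\u00dfe")
```

Note that every operation here has to scan the bytes, and that nothing prevents a caller from passing malformed data, which is the difficulty John raised about string operators working on unchecked UTF-8.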
From: John.Cowan <jc...@re...> - 2005-07-24 23:43:30
|
Albert Graef scripsit: > >It's important to be aware that the POSIX model of the isxxx predicates > >is inadequate for Unicode. Details on request. > > I'm interested in the details, could you please elaborate? Quoting myself from the Unicode FAQ: POSIX "ctype.h" knows but two cases, whereas Unicode knows three. In POSIX, only European Arabic digits can pass "isdigit", whereas Unicode has many sets of digits, all putatively equal. In POSIX "ctype.h", that which is "alnum" but not "alpha" must be a "digit", but Unicode is aware that not all numbers are digits, nor are all letters alphabetic. Unicode groks spacing and non-spacing marks, but POSIX comprehends them not. IMHO the most important Unicode character categories are those lumped as the General Category, which divides the entire codepoint space into 30 categories, themselves grouped into 7 supercategories (letter, number, mark, punctuation, symbol, whitespace, other). Relevant Unicode transformations are uppercasing, lowercasing, titlecasing, and case folding, plus the four Unicode normalizations: decomposed, composed, compatibility decomposed, and compatibility composed. The numeric value of Unicode characters that are numbers is also significant. See http://www.unicode.org/versions/Unicode4.0.0/ch04.pdf for information. The ICU library (http://icu.sf.net) is the gold standard C/C++ implementation library for everything Unicode, and I recommend it. It's big, but it's modularizable. > Yes, that's ugly. The more I think about this the more I abhor the idea > to load the language itself with such complexities. Maybe it's better to > keep the language encoding-agnostic and push all unicode handling into > the library (just the way that it is done in C/C++, as opposed to Java/Tcl). That's because C and C++ are stuck with the "character = byte" assumption and have to build higher-level strings (unsigned short or long arrays, typically). You already have separate notions of "[character] string" and "byte string". 
So use byte strings for applications where you don't care what the encoding is, and regular strings where you do. > To these ends, clib would provide its own set of primitive operations > (say, u8length, u8chars, u8sub, etc.) for handling proper utf-8 encoded > strings. Then we could add a standard library module unicode.q on top of > that, with types UFile and UString and the corresponding operations, > which would handle the necessary conversions automatically and > transparently, as you suggested. I think people who pick up a new programming language expect it to handle Unicode as the native type nowadays. You have the opportunity to switch your basic string type to Unicode without getting huge complaints about backward compatibility. I urge you to take it. What do others think? -- And through this revolting graveyard of the universe the muffled, maddening beating of drums, and thin, monotonous whine of blasphemous flutes from inconceivable, unlighted chambers beyond Time; the detestable pounding and piping whereunto dance slowly, awkwardly, and absurdly the gigantic tenebrous ultimate gods -- the blind, voiceless, mindless gargoyles whose soul is Nyarlathotep. (Lovecraft) John Cowan|jc...@re...|ccil.org/~cowan |
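The properties listed in this message (General Category, numeric values, case folding, the four normalization forms) can be inspected with Python's standard `unicodedata` module, which is used here only as a convenient stand-in for a library such as ICU. The helper names `general_category` and `normal_forms` are invented for the sketch.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Two-letter General Category; its first letter is the supercategory."""
    return unicodedata.category(ch)


def normal_forms(text: str) -> dict:
    """The four Unicode normalizations of a string, keyed by form name."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}


if __name__ == "__main__":
    # Upper, lower and title case letters; two kinds of digit; a fraction;
    # a non-spacing mark; a space.
    for ch in ("A", "a", "\u01c5", "7", "\u0663", "\u00bd", "\u0301", " "):
        print(f"U+{ord(ch):04X} {general_category(ch)}")

    # A digit outside ASCII still has a digit value; a fraction is a number
    # with a numeric value but is not a digit.
    print(unicodedata.digit("\u0663"), unicodedata.numeric("\u00bd"))

    # Case folding is a string-level operation as well.
    print("Ma\u00dfe".casefold())

    # The "fi" ligature followed by a precomposed e-acute.
    for form, value in normal_forms("\ufb01\u00e9").items():
        print(form, [f"U+{ord(c):04X}" for c in value])
```

The third case (titlecase) and the number-that-is-not-a-digit category are two of the distinctions that the POSIX ctype model has no slot for.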