Thread: [MOO-Discuss] Non-Printable Characters in Strings
Status: Planning
Brought to you by:
luke-jr
From: Luke-Jr <lu...@us...> - 2004-01-28 01:02:34
How should non-printable characters be represented in strings?

Is C-style escaping of common characters (\n, \r, \t, \v, \x##, etc.) a good idea? Should that include C's \###, which uses octal, or should \### use decimal (or hex)?

Should the following string be permitted (it currently is not), or should the programmer be required to use \n?

"this is a \
multiline string"

Would it be a good or bad idea to parse \^A through \^Z into characters 0x01 to 0x1A? What about parsing \e to 0x1B, and \s1 to \s4 and \sf, \sg, \sr, \su (the separator characters FS, GS, RS and US) to 0x1C to 0x1F?

Anything else? Comments, please.
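To make the named escapes in this proposal concrete, here is a minimal C sketch of a decoder for the character(s) after a backslash. It is an illustration under assumptions, not code from the LambdaMOO server or from the patch later in this thread: the name `decode_named_escape` is hypothetical, numeric escapes are left out, and \s1 to \s4 are assumed to map in order to 0x1C through 0x1F, in the order the message lists them (ISO 646 names these separators IS4 down to IS1, so the opposite order would also be defensible).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  `p` points just past the backslash.  On success the
 * decoded byte is stored in *out and the number of input characters consumed
 * is returned; 0 means "not a named escape".  Numeric escapes (\x##, \###)
 * are intentionally not handled in this sketch. */
static size_t decode_named_escape(const char *p, unsigned char *out)
{
    switch (p[0]) {
    case 'n':  *out = '\n'; return 1;
    case 'r':  *out = '\r'; return 1;
    case 't':  *out = '\t'; return 1;
    case 'v':  *out = '\v'; return 1;
    case 'e':  *out = 0x1B; return 1;           /* ESC */
    case '\\': *out = '\\'; return 1;           /* already valid in MOO */
    case '"':  *out = '"';  return 1;           /* already valid in MOO */
    case '^':
        /* Caret notation as proposed: \^A..\^Z -> 0x01..0x1A */
        if (p[1] >= 'A' && p[1] <= 'Z') {
            *out = (unsigned char)(p[1] - 'A' + 1);
            return 2;
        }
        return 0;
    case 's': {
        /* \sf \sg \sr \su -> FS GS RS US (0x1C..0x1F) */
        static const char letters[] = "fgru";
        const char *hit = strchr(letters, p[1]);
        if (p[1] != '\0' && hit != NULL) {
            *out = (unsigned char)(0x1C + (hit - letters));
            return 2;
        }
        /* \s1..\s4 -> 0x1C..0x1F (ASSUMED in-order mapping) */
        if (p[1] >= '1' && p[1] <= '4') {
            *out = (unsigned char)(0x1C + (p[1] - '1'));
            return 2;
        }
        return 0;
    }
    default:
        return 0;
    }
}
```

A lexer would call this after seeing a backslash and report an error when it returns 0, which matches the server's current behaviour of rejecting unknown escapes.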
From: Luke-Jr <lu...@ar...> - 2004-02-08 06:19:43
On Wednesday 28 January 2004 12:51 am, Luke-Jr wrote:
> Is C-style escaping of common characters (\n, \r, \t, \v, \x##, etc) a good
> idea? Including C-style's \### which uses octal or should \### use decimal
> (or hex?)?

Also, if number-based escaping is used, should it be octet-based or character-based?

Character-based escaping would translate the value to UTF-8 and could accept a variable-length number up to a 16-bit value, for example \xFFFF, \d65535, \o177777, \xFF, \xF, or a bare \x (for a null character). Using variable-length numbers would, of course, be discouraged, since it could easily create bugs if the character following the escape happened to be a valid digit.
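The character-based option can be sketched in C as follows. This is a hypothetical illustration, not the patch's code: `parse_hex_char_escape` greedily reads up to four hex digits after \x, treats zero digits as NUL, and encodes the resulting value as UTF-8 (one to three bytes for values up to 0xFFFF). Surrogate values (0xD800 to 0xDFFF) are not rejected here; a real implementation would have to decide what to do with them.

```c
#include <stddef.h>

/* Value of a hex digit, or -1 if `c` is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Encode a code point up to 0xFFFF as UTF-8; returns the byte count (1-3). */
static size_t utf8_encode(unsigned cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical character-based \x escape.  `p` points just past "\x".
 * Consumes up to 4 hex digits (no digits means NUL), writes the UTF-8
 * bytes to `out` (at least 3 bytes long), stores their count in *out_len,
 * and returns the number of digits consumed. */
static size_t parse_hex_char_escape(const char *p, unsigned char *out,
                                    size_t *out_len)
{
    unsigned cp = 0;
    size_t n = 0;
    int d;

    while (n < 4 && (d = hex_value(p[n])) >= 0) {
        cp = cp * 16 + (unsigned)d;
        n++;
    }
    *out_len = utf8_encode(cp, out);
    return n;
}
```

With this greedy rule, "\x41B" decodes as the single character U+041B rather than "A" followed by "B", which is exactly the digit-after-the-escape bug the message warns about.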
From: Gavin L. <md...@mi...> - 2004-02-09 20:02:08
At 00:51 28/01/2004 +0000, Luke-Jr wrote:
>How should non-printable characters be represented in strings?
>Is C-style escaping of common characters (\n, \r, \t, \v, \x##,
>etc) a good idea? Including C-style's \### which uses octal or
>should \### use decimal (or hex?)?

I've always regarded C-style escaping as a little messy... but then, that's mostly because I need to use the backslash a lot for paths and the like.

One possible alternative (though I don't really know if it's any better) is Delphi-style escaping, a la:

"This is a string with"#10"a newline in the middle."
"This is a string with "" a double-quote in the middle."
"This is a string that " +
"spans two source lines."

Of course, the last example is already valid in MOO code, but the internal compiler/decompiler will merge it back onto one source line.

--
Gavin Lambert, Mirality Systems <http://www.mirality.co.nz/>
----
Pardon me, waiter. I like my water diluted.
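For comparison, the Delphi-style form needs no backslash handling at all. The following C sketch is hypothetical (the name `decode_delphi_literal` and its -1 error convention are assumptions): it walks a run of "..." segments and #<decimal> codes written back to back, treats "" inside a segment as one double quote, and limits # codes to a single byte. Delphi itself uses single-quoted strings; the double quotes here follow the examples in the message.

```c
#include <stddef.h>

/* Hypothetical decoder for a Delphi-style literal such as "a"#10"b".
 * Writes a NUL-terminated result to `out` (assumed large enough) and
 * returns its length, or -1 on malformed input.  A #0 code is stored
 * like any other byte, so callers should rely on the returned length. */
static long decode_delphi_literal(const char *s, char *out)
{
    long n = 0;

    while (*s) {
        if (*s == '"') {
            s++;                                 /* opening quote */
            for (;;) {
                if (*s == '\0')
                    return -1;                   /* unterminated segment */
                if (*s == '"') {
                    if (s[1] == '"') {           /* "" -> one quote */
                        out[n++] = '"';
                        s += 2;
                        continue;
                    }
                    s++;                         /* closing quote */
                    break;
                }
                out[n++] = *s++;
            }
        } else if (*s == '#') {
            int code = 0, digits = 0;
            s++;
            /* Decimal character code; stop once it no longer fits a byte. */
            while (*s >= '0' && *s <= '9' && code <= 255) {
                code = code * 10 + (*s++ - '0');
                digits++;
            }
            if (digits == 0 || code > 255)
                return -1;
            out[n++] = (char)code;
        } else {
            return -1;                           /* junk between segments */
        }
    }
    out[n] = '\0';
    return n;
}
```

The trade-off is visible in the code: quotes are doubled instead of escaped, so backslashes in paths need no special treatment, but every non-printable character forces the string to be split into segments.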
From: Luke-Jr <lu...@ar...> - 2004-02-09 22:45:01
On Monday 09 February 2004 08:01 pm, Gavin Lambert wrote:
> At 00:51 28/01/2004 +0000, Luke-Jr wrote:
> >How should non-printable characters be represented in strings?
> >Is C-style escaping of common characters (\n, \r, \t, \v, \x##,
> >etc) a good idea? Including C-style's \### which uses octal or
> >should \### use decimal (or hex?)?
>
> I've always regarded C-style escaping as a little messy... but
> then, that's mostly because I need to use the backslash a lot for
> paths and the like.

That's probably partly because your operating system doesn't use standard path syntax. Most systems use / as the directory separator, which doesn't have this problem. Further, since MOO already uses the \ character for escaping, I wouldn't suggest breaking it. I was referring to the sequences following the \ character, such as 'n', 'r', 't', 'x__', etc.

> One possible alternative (though I don't really know if it's any
> better) is to use Delphi-style escaping, a la:
> "This is a string with"#10"a newline in the middle."
> "This is a string with "" a double-quote in the middle."
> "This is a string that " +
> "spans two source lines."

Regarding the \<newline> part, I probably wasn't clear: I was thinking of using \<newline> as a way to insert a *literal* newline into the string, not as a way to continue the string on another line. That second possibility is worth considering, though.

> Of course, the latter example is already valid in MOO code, but of
> course the internal compiler/decompiler will merge it back onto
> one source line.

I don't go to extreme trouble to keep compile/decompile formatting the same. With my patch that parses 0xFF as an integer, for example, the decompiler will still output 255. Rendering it as 0xFF would probably require adding a flag to Vars of TYPE_INT, or a new TYPE_INTBASE (which carries the flag instead but is treated as INT within the MOO), and either approach would cost a lot of RAM and/or a lot of changes to existing code, which probably isn't worth it. More important, I think, would be preserving comments in /* this style */, which the MOO currently forgets completely.
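The two readings of \<newline> discussed here differ only in what the lexer emits for the backslash-newline pair. A minimal C sketch, with the enum and function names made up for illustration; it handles only that one sequence and passes every other backslash pair through untouched for a later escape-decoding pass:

```c
#include <stddef.h>

/* Hypothetical: two possible meanings of backslash-newline in a string. */
enum bsnl_mode {
    BSNL_LITERAL_NEWLINE,   /* as proposed: "\<newline>" inserts '\n' */
    BSNL_CONTINUATION       /* alternative: the line break is dropped */
};

/* Copy a string literal body (text between the quotes) into `out`,
 * rewriting only backslash-newline according to `mode`.  Returns the
 * length of the NUL-terminated result. */
static size_t apply_backslash_newline(const char *src, char *out,
                                      enum bsnl_mode mode)
{
    size_t n = 0;

    while (*src) {
        if (src[0] == '\\' && src[1] == '\n') {
            if (mode == BSNL_LITERAL_NEWLINE)
                out[n++] = '\n';
            src += 2;                 /* consume backslash and newline */
        } else if (src[0] == '\\' && src[1] != '\0') {
            /* Copy any other escape pair whole, so "\\" followed by a
             * real line break is not mistaken for \<newline>. */
            out[n++] = *src++;
            out[n++] = *src++;
        } else {
            out[n++] = *src++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Under the literal-newline reading, the example from the first message becomes "this is a \nmultiline string"; under the continuation reading it becomes "this is a multiline string".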
From: Luke-Jr <lu...@ar...> - 2004-02-10 02:38:58
Attachments:
unprintable-strings-proposal-01.patch
Attached is a patch for a vanilla LambdaMOO server that adds support for C-style escape codes, \### (octal), \^@ through \^_ (0x00 to 0x1F; see an ASCII chart for details on the range), \e, \s1 to \s4, \sf, \sg, \sr, and \su. Numeric escapes are octet-based, but variable-length numbers are supported for \x##, \o###, and \d###.

This patch does not address \<newline>; such a sequence will continue to raise an error. It also has the potential to leak memory, since character 0x00 can now be produced and the LambdaMOO server cannot (yet) handle it properly in strings.

\### can easily be changed to use decimal, but unless there are any objections, I think octal is more logical. Note that hex is not an option: hex digits include a through f, so \### in hex would conflict with C-style escape codes such as \a and \b.

As before, comments are more than welcome. :)

On Wednesday 28 January 2004 12:51 am, Luke-Jr wrote:
> Is C-style escaping of common characters (\n, \r, \t, \v, \x##, etc) a good
> idea? Including C-style's \### which uses octal or should \### use decimal
> (or hex?)?
> Should the following string be permitted (it is currently not)? "this is a
> \ multiline string" or require the programmer to use \n?
> Would it be a good or bad idea to parse \^A through \^Z into character 0x01
> to 0x1A? What about parsing \e to 0x1B and \s1 to \s4 and \sf \sg \sr \su
> (separator characters) to 0x1C to 0x1F?
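The octet-based numeric escapes described for the patch could look roughly like the C sketch below. It is not taken from unprintable-strings-proposal-01.patch: `parse_octet_escape` is a hypothetical name, and the variable-length rule (keep consuming digits while the value still fits in one byte) is an assumption about how such escapes might be bounded. A bare \### is read as octal, so \101 is 'A', and a bare \x with no digits yields NUL.

```c
#include <stddef.h>

/* Value of `c` as a digit in `base` (8, 10 or 16), or -1. */
static int digit_value(char c, int base)
{
    int v;

    if (c >= '0' && c <= '9')      v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return v < base ? v : -1;
}

/* Hypothetical octet-based numeric escape.  `p` points just past the
 * backslash and may start with x (hex), o (octal), d (decimal), or an
 * octal digit for a bare \###.  Digits are consumed while the value
 * still fits in one byte.  Stores the byte in *out and returns the
 * number of characters consumed, or 0 if this is not a numeric escape. */
static size_t parse_octet_escape(const char *p, unsigned char *out)
{
    int base = 8;
    size_t i = 0;
    unsigned value = 0;
    int d;

    if (p[0] == 'x')      { base = 16; i = 1; }
    else if (p[0] == 'o') { base = 8;  i = 1; }
    else if (p[0] == 'd') { base = 10; i = 1; }
    else if (digit_value(p[0], 8) < 0)
        return 0;                        /* e.g. \a, \b, \n: not numeric */

    while ((d = digit_value(p[i], base)) >= 0 &&
           value * (unsigned)base + (unsigned)d <= 0xFF) {
        value = value * (unsigned)base + (unsigned)d;
        i++;
    }
    *out = (unsigned char)value;         /* may be 0x00 (see NUL caveat) */
    return i;
}
```

Because a bare \### only accepts the digits 0 to 7, a sequence such as \a falls through to the named-escape handling; if \### were hex, the a would have been consumed as a digit, which is the conflict the message points out.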