
## tack-devel — Discussion for developers of the ACK.



 [Tack-devel] random numbers and time zones From: George Koehler - 2017-11-07 05:11:57 ```I write about random numbers in the Amsterdam Compiler Kit, then about time zones. ack comes with an example in several languages of the game, "See if you can guess my number." The Basic version in examples/hilo.bas is easy, because the computer's number is always zero, so I always guess 0 and win in 1 try. The code to pick the number is 1010 Number% = rnd(1) mod 100 ACK's EM-Basic ignores the 1 in rnd(1); Microsoft's QBasic would check if the 1 is positive. Then, in both EM-Basic and QBasic, RND should return the next random number as a float from 0.0 to 1.0. The MOD in `rnd(1) mod 100` is an integer operation, so Basic rounds the float to an integer 0 or 1. Then the result of 0 mod 100 or 1 mod 100 is 0 or 1. The game should pick 0 or 1. EM-Basic calls C's rand(), but uses the wrong maximum for rand() on platforms with 4-byte int. This causes RND to always pick numbers close to 0. Then the game always rounds the numbers to 0. There are two random number generators in ack. C's rand() uses the linear congruential generator next = next * 1103515245 + 12345; rand() discards the lowest 16 bits, and returns the next 15 bits of that value. Modula-2's random uses an additive generator x[k] = x[k - 55] + x[k - 24] but x is a ring buffer of the last 55 values. Both are from the 1980s. I might try to replace them with a more recent generator. Two recent algorithms are xoroshiro128+ from http://xoroshiro.di.unimi.it/ PCG from http://www.pcg-random.org/ I didn't like xoroshiro128+ because it uses 64-bit integers, and ack lacks 64-bit operations. It only uses bit operations (xor, shift, rotate) and one can easily split them into 32-bit operations, but I didn't want to try. I didn't like PCG because it uses multiplication, which is slow in processors without a multiply instruction, like the i80. 
I am now looking at sfc32 from http://pracrand.sourceforge.net/ sfc32 uses 32-bit integers with bit operations and addition. Modula-2 seems to have no bit operations on integers (both shifts and bitwise logic), but I might write the code in EM. sfc32 has a rotate (barrel shift) and EM code can rotate with ROL or ROR. C code would split the rotate into shifts and bit-or, and ack can't optimize it to ROL or ROR. Back end for PowerPC can't compile ROL or ROR, but I might teach it. QBasic by tradition uses randomize timer to seed the generator. EM-Basic has randomize but is missing timer, to return a float counting seconds since midnight. One would need to know the time zone to calculate midnight. That's a problem because ack's libc can't find the time zone. It wants to call ftime() to get the time zone from the kernel, but that technique is obsolete. If I try to call ctime(), the compile fails as the linker can't find ftime(). The modern way is to open /etc/localtime and read tzfile(5) for time zone, but ack's libc hasn't learned to do that. Also, libc has an outdated default rule for summer time. It doesn't know that the European Union changed the rule in 1996. --George Koehler ```
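As a rough illustration of the two building blocks described in the message above, here is a minimal C sketch of a `rand()`-style linear congruential generator (discard the low 16 bits, keep the next 15) and of a 32-bit rotate split into shifts and a bit-or. The names `lcg_seed`, `lcg_rand` and `rotl32` are hypothetical and are not taken from ack's libc; the seed handling is an assumption.

```c
#include <stdint.h>

/* Hypothetical names; a sketch of the generator described in the mail,
 * not a copy of ack's libc source. */
static uint32_t lcg_state = 1;

void lcg_seed(uint32_t seed)
{
    lcg_state = seed;
}

/* next = next * 1103515245 + 12345, then drop the lowest 16 bits
 * and return the next 15 bits (a value in 0..32767). */
int lcg_rand(void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (int)((lcg_state >> 16) & 0x7fffu);
}

/* A rotate left written as two shifts and a bit-or, which is how C code
 * has to spell it when no rotate operator is available.  The masks keep
 * both shift counts in 0..31, so k == 0 is well defined. */
uint32_t rotl32(uint32_t x, unsigned k)
{
    k &= 31u;
    return (x << k) | (x >> ((32u - k) & 31u));
}
```

A caller would seed once with `lcg_seed()` and then call `lcg_rand()` repeatedly. A generator such as sfc32 would use a helper like `rotl32` where EM code could use ROL or ROR directly.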
 Re: [Tack-devel] a testcase which does not pass in my ack From: David Given - 2017-05-04 22:34:27 Attachments: signature.asc
```
On 27/04/17 14:51, u-rkkt@... wrote:
[...]
> push ebp
> mov ebp,esp
> push esi
> mov esi,1234
> push 1
> call _xx
> pop ecx
> pop ecx
> pop esi
> leave
> ret
[...]
> It is "... ! , ..." being passed as arguments which makes
> the compiler push an insufficient number of items and also generate
> an unsuitable constant there. A presence of an extra argument (not the
> same ) _between_ "! " and "" prevents the insanity.

Okay, so I've had a chance to look at this...

(a) it definitely looks like a bug. (I can duplicate it on my modern
ACK, for all architectures, not just i386.)

(b) it looks like a really, really old bug. (I can duplicate this on
Minix 1.7, released 2005.)

This appears to be an optimiser bug. If you compile with -t and compare
the .k file (before optimisation) with the .m file (after optimisation),
you get this:

Before:

  loc 1234
  stl -2
  lol -2
  lol -2
  loc 0
  cmi 2
  tne
  teq
  cal $xx

After:

  loc 1234
  stl -2
  loc 1
  cal $xx

The last is definitely wrong.

My feeling is that what's happening here is something terrible is
happening in the constant propagation code. But it obviously can't be
happening very *often*, because otherwise someone would have noticed in
the past decade or so...

I've had a quick look through the optimiser patterns in
util/opt/patterns, but don't see anything immediately obvious. It'll
need more investigation.

--
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│ "There is nothing in the world so dangerous --- and I mean *nothing*
│ --- as a children's story that happens to be true." --- Master Li Kao,
│ _The Bridge of Birds_
```
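To make the difference between the two EM listings concrete, the sketch below models the EM operand stack with a toy interpreter in C and runs both sequences up to (but not including) the `cal`. Everything here is hypothetical illustration, not ACK code: the model has a single local, treats all operands as `int`, and implements only the six instructions that appear in the listings, following their usual EM meaning (`cmi` compares two integers, `tne` and `teq` test the result against zero).

```c
#include <stddef.h>

/* Toy model of the EM operand stack (hypothetical, not ACK code). */
enum { STACK_MAX = 16 };

struct em {
    int stack[STACK_MAX];
    size_t depth;
    int local;                 /* the single local at offset -2 */
};

static void push(struct em *m, int v) { m->stack[m->depth++] = v; }
static int pop(struct em *m) { return m->stack[--m->depth]; }

static void loc(struct em *m, int c) { push(m, c); }     /* load constant  */
static void stl(struct em *m) { m->local = pop(m); }     /* store to local */
static void lol(struct em *m) { push(m, m->local); }     /* load local     */
static void cmi(struct em *m)                            /* compare ints   */
{
    int b = pop(m);
    int a = pop(m);
    push(m, (a > b) - (a < b));
}
static void tne(struct em *m) { push(m, pop(m) != 0); }  /* true if != 0   */
static void teq(struct em *m) { push(m, pop(m) == 0); }  /* true if == 0   */

/* The .k listing (before optimisation); returns the number of
 * arguments left on the stack for the call. */
size_t run_before(struct em *m)
{
    m->depth = 0;
    loc(m, 1234); stl(m);
    lol(m);
    lol(m); loc(m, 0); cmi(m); tne(m); teq(m);
    return m->depth;
}

/* The .m listing (after optimisation). */
size_t run_after(struct em *m)
{
    m->depth = 0;
    loc(m, 1234); stl(m);
    loc(m, 1);
    return m->depth;
}
```

Under this model the unoptimised sequence leaves two words for the call (1234 and 0), while the optimised sequence leaves a single 1, which matches the report of too few pushed items and an unsuitable constant.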
 [Tack-devel] a testcase which does not pass in my ack From: - 2017-04-27 13:08:09 ```Hello, I am running natively on i386 Linux ackpack ported from Minix-2, which is more or less identical to the i386 part of the "original" ack. Now I noticed a case of odd behaviour: \$ cat >test.c <, ..." being passed as arguments which makes the compiler push an insufficient number of items and also generate an unsuitable constant there. A presence of an extra argument (not the same ) _between_ "! " and "" prevents the insanity. David, I am curious what your ack does when presented with such input? Do you possibly have a clue of how to fix the problem? Regards, Rune ```
 [Tack-devel] incomplete changes to PowerPC ncg, ego From: George Koehler - 2017-02-20 23:03:18 ```My branch https://github.com/kernigh/ack/tree/kernigh-linuxppc has some incomplete changes to PowerPC ncg and ego. No pull request, because my branch has at least one problem with 4-byte floats. I will not be active for the next several days, so my branch will remain as is. We have two PowerPC back ends, ncg and mcg. The new code generator (ncg) is really the old one. PowerPC ncg existed by 2007, got some important fixes in September 2016, and can now compile some but not all C and Modula-2 programs. David Given's modern code generator (mcg) existed by October 2016, and can now compile most programs, but often emits wrong code. When I last checked, printf() in C worked in ncg but not in mcg. In my branch, I tried to complete the old back end, PowerPC ncg. I added the missing conversions between integers and 4-byte floats. I implemented the EM instruction lxl, for nested procedures in Modula-2. I also made some changes to register allocation. Since 2007, PowerPC ncg had defined an individual register class for each of the 32 general-purpose and 32 floating-point registers. These 64 classes had names like GPR3, FPR3, GPR4, FPR4, and so on. The table used these classes to coerce values from the EM stack into specific registers. For example, the rule for EM instruction aar coerced 3 values into GPR3, GPR4, GPR5. But ncg's register allocator is too slow with so many classes. A rule using 3 GPRs would take about 2 seconds to allocate them. So in October 2016, I added REG_PAIR to speed up some rules. REG_PAIR meant to allocate a pair of GPRs from a list of only 4 pairs. In http://tack.sourceforge.net/olddocs.html, I found warnings against too many register classes. 
Frank Doodeman's m68020 paper said, > Since Hans van Staveren in his document [4] clearly states that *cg* execution time is negatively influenced by the number of properties, only four different properties have been defined. van Staveren's ncg paper said, > Every extra property means the register set is more unorthogonal and *cg* execution time is influenced by that, because it has to take into account a larger set of registers that are not equivalent. So try to keep the number of different register classes to a minimum. When faced with the choice between two possible code rules for a nonfrequent EM sequence, one being elegant but requiring an extra property, and the other less elegant, elegance should probably loose. In my branch, I removed 63 of the 64 individual register classes. (I left a singleton class for register r3.) I also removed REG_PAIR. Register allocation becomes much faster, because each allocation picks from only 1 or 2 classes. Compilation with ack -O1 is quick; compilation with ack -O2 or higher uses most time to run ego, the EM global optimizer. To remove the register classes, I changed libem. When rules in ncg call libem, they can no longer coerce stack values to registers (except r3). So I changed libem to pass most values on the real stack, not in registers. This is slower. (Regular calls to C functions or Modula-2 procedures continue to use the real stack, and are as slow as always.) Because of these libem changes, I needed to delete all my PowerPC .o files. My branch also made changes to register variables. These use a second method of register allocation, where the registers get preserved across function calls. The method, in ncg, simply maps EM local variables into registers. There is an RA phase in ego that rearranges the local variables so ncg can emit better code. In the default branch, PowerPC ncg has regvars only for integers, not for floats. We run the RA phase in ego. 
Platform osxppc runs ego with the descr file, but platform linuxppc runs ego without a descr. (When I wrote powerpc.descr, I enabled it for osxppc but forgot to enable it for other platforms.)

I find that it harms code generation to run the RA phase without a descr file. Each EM local variable has a register score. Before ego runs, this score is about the number of times that the var appears in the code. If the score is bigger than about 3, then ncg would try to allocate a regvar. If ego runs the RA phase, it changes each score to 0 or 10000. The number of registers with score 10000 is never greater than the number of registers in the descr file. But if there's no descr, the phase changes all the scores to zero.

When linuxppc runs ego without a descr, if we run the RA phase, we disable regvars in ncg. So we can emit better code for linuxppc by running ack -O1 or -O2, because -O3 enables the RA phase.

In my branch, I tried to add floating-point regvars to PowerPC ncg. But in ncg, all float regvars must have the same size. I added only 8-byte float regvars, because 8-byte floats seem more common than 4-byte floats. I added the float regvars to ego's descr. But the RA phase assumed that all float regvars hold 4-byte floats. It changed all the scores for 8-byte floats to zero, so ncg never allocated the float regvars! I then changed ego to put both 4-byte floats and 8-byte floats in registers.

My branch has a problem. When the RA phase puts a local in a register, it also frees the stack space for the local. So the RA phase can put a 4-byte float in a register and free its stack space. Then PowerPC ncg refuses to put the 4-byte float in a register (because it only has 8-byte float regvars), and ncg tries to use the stack space that ego freed. This doesn't work, so I am observing corruption of 4-byte floats in programs. I have not fixed this problem.
My branch https://github.com/kernigh/ack/tree/kernigh-linuxppc will remain as is (with the 4-byte float problem) for at least the next several days, while I am not active. -George Koehler ```
 Re: [Tack-devel] change in RELOPPC with hi16, ha16, lo16 From: George Koehler - 2017-02-09 00:25:35 ```After I sent the last mail about the RELOPPC change, I decided to throw it away and do it differently. I did rewind my branch https://github.com/kernigh/ack/tree/kernigh-linuxppc to before the RELOPPC change. I then added a new relocation type RELOLIS. I transplanted my ncg changes from the old branch to the new branch. The old branch is now https://github.com/kernigh/ack/tree/zzz-old-reloppc RELOLIS handles a single instruction, a PowerPC lis using ha16 or hi16: lis RT, ha16[expr] == addis RT, r0, ha16[expr] lis RT, hi16[expr] == addis RT, r0, hi16[expr] The relocated expr is a symbol plus a signed 26-bit offset. The trick is how RELOLIS stores the offset in the program text. The lis instruction takes 32 bits. There are 6 bits for the addis opcode, 5 bits for register RT, 5 bits for register r0, and 16 bits for the immediate value. RELOLIS is only for lis instructions, so it doesn't need to store the addis opcode or register r0. This frees 11 bits. I need 5 bits to store register RT, but I have 27 other bits. I use 1 bit as a flag, set for ha16, clear for hi16. The other 26 bits are the offset. RELOLIS stores the value with the flag in the high bit, the register RT in the next 5 bits, and the offset in the low 26 bits. During relocation, the linker does symbol plus offset. The high 16 bits of the sum become the immediate value of the lis. If the ha16 flag is set, the linker does the sign adjustment. Then the linker assembles the lis instruction, filling in the addis opcode and the register r0. RELOLIS is simple because it handles a single ha16 or hi16 instruction. It doesn't need to find a second instruction with a matching lo16. -George Koehler ```
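The bit layout described for RELOLIS can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical and do not come from the ack assembler or led, and the sketch ignores byte order in the text section. The addis primary opcode (15) and the field positions of RT, RA and the immediate follow the PowerPC instruction format.

```c
#include <stdint.h>

#define RELOLIS_HA_FLAG 0x80000000u   /* high bit: set for ha16, clear for hi16 */
#define RELOLIS_OFFSET_MASK 0x03ffffffu
#define ADDIS_OPCODE 15u              /* lis RT, x == addis RT, r0, x */

/* Assembler side (hypothetical helper): store the flag in the high bit,
 * register RT in the next 5 bits, and the offset in the low 26 bits. */
uint32_t relolis_pack(int is_ha, unsigned rt, int32_t offset)
{
    return (is_ha ? RELOLIS_HA_FLAG : 0u)
         | ((uint32_t)(rt & 31u) << 26)
         | ((uint32_t)offset & RELOLIS_OFFSET_MASK);
}

/* Recover the signed 26-bit offset by sign-extending from bit 25. */
int32_t relolis_offset(uint32_t word)
{
    uint32_t off = word & RELOLIS_OFFSET_MASK;
    return (int32_t)((off ^ 0x02000000u) - 0x02000000u);
}

/* Linker side (hypothetical helper): add symbol and offset, apply the
 * ha16 sign adjustment if flagged, and assemble the final lis. */
uint32_t relolis_apply(uint32_t word, uint32_t symbol_address)
{
    uint32_t rt = (word >> 26) & 31u;
    uint32_t value = symbol_address + (uint32_t)relolis_offset(word);

    if (word & RELOLIS_HA_FLAG)
        value += 0x8000u;   /* a later signed lo16 will subtract this back */

    return (ADDIS_OPCODE << 26) | (rt << 21) | (0u << 16) | (value >> 16);
}
```

For example, with a symbol at 0x10008000 and register r3, the ha16 form yields the instruction word 0x3C601001 and the hi16 form yields 0x3C601000.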
 [Tack-devel] change in RELOPPC with hi16, ha16, lo16 From: George Koehler - 2017-02-07 04:01:14 ```This mail describes a change in my branch https://github.com/kernigh/ack/tree/kernigh-linuxppc It isn't ready for a pull request. In my branch, ncg can generate code like lis r12,ha16[__II0] lis r11,ha16[_f] lfs f1,lo16[_f](r11) lfs f2,lo16[__II0](r12) fadds f13,f2,f1 stfs f13,lo16[_f](r11) Here ncg has allocated r11 for ha16[_f] and used r11 in both lfs and stfs. I changed the assembler and linker to accept this code. Before my changes, each ha16[_f] needs a matching lo16[_f] in the next instruction, and each lo16[_f] needs a previous ha16[_f]. After my changes, each ha16[_f] needs a matching lo16[_f] in the next 1 to 63 instructions, and lo16[_f] doesn't need ha16[_f]. A lonely lo16 gets a simple RELO2 relocation. I change the format of a RELOPPC relocation. Before my change, RELOPPC has a 32-bit offset. After my change, it has a signed 26-bit offset. I use the high 6 bits to encode the distance from the ha16 (or hi16) instruction to the matching lo16 instruction. Each ack.out relocation has a symbol + offset, and stores the offset inside the instructions. This is why RELOPPC must pair ha16 with lo16. Each instruction has room for only 16 bits of an offset, so RELOPPC pairs 2 instructions to get 32 bits. The above code has a gap of distance 3 from ha16[__II0] to lo16[__II0]. The gap exists because ncg emits the lis as soon as it allocates the register. When ncg loads a 4-byte value, I don't know whether to emit lwz (to a general-purpose register) or lfs (to a floating-point register). So, I yield a token. Later, ncg coerces the token to a register. This coercion emits lwz or lfs. A coercion in ncg may not allocate more than one register. The coercion that emits lfs f2,lo16[__II0](r12) may allocate f2, but not r12. So I allocate r12 earlier, in pat lae. So ncg emits lis r12,ha16[__II0] early. 
I changed the format of RELOPPC to allow the gap between the lis from the early allocation, and the lwz or lfs from the coercion. There is a disadvantage. The assembler fails if the offset is too big for signed 26-bit. This can happen if a module has more than 32 MB of data in a section. The program would use ha16 and lo16 to reference this data. The relocation would be offset from the beginning of the section. An offset of at least 32 MB would fail. Before my change, RELOPPC uses a signed 26-bit offset for branches. My change is to also use a signed 26-bit offset for ha16 and hi16. -George Koehler ```
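The 32 MB limit follows from the field width: a signed 26-bit offset covers -2^25 to 2^25 - 1, and 2^25 bytes is 32 MB. The C sketch below checks that range and packs a distance into the high 6 bits. The helper names are hypothetical, and the exact way the branch encodes the distance is an assumption; the mail only says that the high 6 bits hold it and that the matching lo16 lies 1 to 63 instructions later. The combined 32-bit value would then be split, 16 bits each, across the two instructions.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if offset fits in a signed 26-bit field: -2^25 .. 2^25 - 1. */
bool fits_signed26(int64_t offset)
{
    return offset >= -(INT64_C(1) << 25) && offset < (INT64_C(1) << 25);
}

/* Hypothetical packing: distance (1..63 instructions from the ha16 or
 * hi16 to its lo16) in the high 6 bits, offset in the low 26 bits.
 * Returns false where the assembler would have to report an error. */
bool reloppc_pack(unsigned distance, int64_t offset, uint32_t *out)
{
    if (distance < 1 || distance > 63 || !fits_signed26(offset))
        return false;
    *out = ((uint32_t)distance << 26) | ((uint32_t)offset & 0x03ffffffu);
    return true;
}
```

An offset of exactly 32 MB (33554432) is the first value that fails, which is the failure case the mail describes for very large sections.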
 Re: [Tack-devel] weaknesses in our PowerPC assembler From: George Koehler - 2017-01-18 22:45:10
```
New version of my incomplete diff to teach the PowerPC assembler about
extended mnemonics:
https://gist.github.com/kernigh/4e05d2a84a34c676237a7bad6259171a

The diff requires my assembler changes from
https://github.com/davidgiven/ack/pull/44

I am adding simplified mnemonics for branching, subtraction,
comparison, and traps, and am partway into adding them for rotation.
I have not tested whether the added names assemble correctly.

I stopped trying to add "eq", "gt", and such as register names, and I
stopped trying to allow register names in expressions. I am only
adding names for instructions.

The 0 in "sc 0" and the branch hint in "bcctr" and "bclr" are
optional. The cr0 in "beq cr0, label" should be optional, but I
require it. To make it optional, I might use a rule in yacc like

  /* optional condition register, comma */
  opt_cr
      : /* nothing */  { $$ = 0; }
      | CR ','         { $$ = $1; }

This rule might cause a problem later. Now, CR must be a register
name. Later, we might allow that CR is a number. If so, the parser
can't decide if the number is an optional CR or the required next
operand. The parser is deciding whether the first operand is a CR
before it knows the number of operands. I might add the optional cr0
feature. Then if anyone wants numeric CR, one must deal with this
problem.

Another missing feature is static branch prediction. I don't allow
"beq+" or "bgt-" because the assembler doesn't take + or - in
instruction names. I'm not adding this feature because I probably
won't use static branch prediction in programs.

-George Koehler
```
 Re: [Tack-devel] Bad symbol tables generated by aelflod From: David Given - 2017-01-17 23:09:23 Attachments: signature.asc ```On 17/01/17 23:32, George Koehler wrote: > On Tue, Jan 17, 2017 at 3:55 PM, David Given wrote: >> Do you want to send me a PR or shall I just commit this? > > Just commit it. Done --- ta. -- ┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ───── │ "There is nothing in the world so dangerous --- and I mean *nothing* │ --- as a children's story that happens to be true." --- Master Li Kao, │ _The Bridge of Birds_ ```
 Re: [Tack-devel] Bad symbol tables generated by aelflod From: George Koehler - 2017-01-17 22:32:24 ```On Tue, Jan 17, 2017 at 3:55 PM, David Given wrote: > Do you want to send me a PR or shall I just commit this? Just commit it. ```
 Re: [Tack-devel] Bad symbol tables generated by aelflod From: David Given - 2017-01-17 20:55:56 Attachments: signature.asc ```On 17/01/17 03:09, George Koehler wrote: [...] > Try the attached diff (also in > https://gist.github.com/kernigh/0521b3e18c9b14aeb4d5d47fab2cdabb). Works fine, thanks! Do you want to send me a PR or shall I just commit this? -- ┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ───── │ "There is nothing in the world so dangerous --- and I mean *nothing* │ --- as a children's story that happens to be true." --- Master Li Kao, │ _The Bridge of Birds_ ```
 Re: [Tack-devel] Bad symbol tables generated by aelflod From: George Koehler - 2017-01-17 02:09:10 Attachments: ack-elf-fix.diff ```The ELF spec at http://www.sco.com/developers/gabi/ says, "In each symbol table, all symbols with STB_LOCAL binding precede the weak and global symbols," and that sh_info is the index of the first non-local symbol. I was mixing local and global symbols and setting sh_info to zero. I also forgot to set the type of the .shstrtab section. Try the attached diff (also in https://gist.github.com/kernigh/0521b3e18c9b14aeb4d5d47fab2cdabb). ```
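The ordering rule quoted from the ELF spec can be sketched in C as a stable partition that moves local symbols to the front and returns the index of the first non-local symbol, which is the value `sh_info` should hold for `.symtab`. The `struct sym` type and the function name are hypothetical and much simpler than what aelflod.c works with; only the `STB_*` binding values come from the ELF specification.

```c
#include <stddef.h>

/* Binding values as defined by the ELF specification. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };

/* Hypothetical, simplified symbol record for illustration only. */
struct sym {
    const char *name;
    unsigned char bind;
};

/* Move all STB_LOCAL symbols ahead of the global and weak ones while
 * keeping the relative order inside each group.  Returns the index of
 * the first non-local symbol, i.e. the value for .symtab's sh_info. */
size_t order_symbols(struct sym *syms, size_t n)
{
    size_t first_nonlocal = 0;

    for (size_t i = 0; i < n; i++) {
        if (syms[i].bind != STB_LOCAL)
            continue;
        struct sym local = syms[i];
        /* Shift the non-local run one slot right to make room. */
        for (size_t j = i; j > first_nonlocal; j--)
            syms[j] = syms[j - 1];
        syms[first_nonlocal++] = local;
    }
    return first_nonlocal;
}
```

In a real symbol table the reserved null symbol at index 0 also has local binding, so it counts toward the returned index.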
 Re: [Tack-devel] Bad symbol tables generated by aelflod From: George Koehler - 2017-01-16 23:40:42
```
On Mon, Jan 16, 2017 at 6:15 PM, David Given wrote:
> $ readelf --symbols
> /tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin
>
> Symbol table '.symtab' contains 147 entries:
> Num: Value Size Type Bind Vis Ndx Name
> 0: 10000054 0 FUNC LOCAL DEFAULT 1 begtext
> readelf: Warning: local symbol 0 found at index >= .symtab's sh_info
> value of 0

Looks like I made a mistake in util/amisc/aelflod.c, where I set
sh_info to zero in every section, "emit32(0); /* info */". I guess
that .symtab needs a nonzero value. I don't get an error from readelf
in OpenBSD 6.0 or Debian 8.5, so I don't know how to fix the error.

-George Koehler
```
 [Tack-devel] Bad symbol tables generated by aelflod From: David Given - 2017-01-16 23:15:51 Attachments: signature.asc
```
It looks like, at least for Linux (I haven't tried OpenBSD yet), aelflod
is producing invalid symbol tables. This appears to happen for at least
linuxppc and linux386:

$ objdump -d
/tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin
objdump:
/tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin:
attempt to load strings from a non-string section (number 7)
objdump:
/tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin:
Bad value

$ gdb
/tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin
BFD:
/tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin:
attempt to load strings from a non-string section (number 7)
"/tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin":
not in executable format: Bad value

$ readelf --symbols
/tmp/ack-build/obj/plat/linuxppc/tests/pascalsets_p_bin/pascalsets_p_bin

Symbol table '.symtab' contains 147 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 10000054     0 FUNC    LOCAL  DEFAULT    1 begtext
readelf: Warning: local symbol 0 found at index >= .symtab's sh_info
value of 0
     1: 10001714     0 OBJECT  LOCAL  DEFAULT    3 begdata
readelf: Warning: local symbol 1 found at index >= .symtab's sh_info
value of 0
     2: 10001340     0 OBJECT  LOCAL  DEFAULT    2 begrom
readelf: Warning: local symbol 2 found at index >= .symtab's sh_info
value of 0
     3: 100018ac     0 OBJECT  LOCAL  DEFAULT    4 begbss
[...etc...]

--
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│ "There is nothing in the world so dangerous --- and I mean *nothing*
│ --- as a children's story that happens to be true." --- Master Li Kao,
│ _The Bridge of Birds_
```
 Re: [Tack-devel] weaknesses in our PowerPC assembler From: David Given - 2017-01-15 11:05:51 Attachments: signature.asc ```On 15/01/17 03:38, George Koehler wrote: [...] > The code hasn't learned to handle low half >= 0x8000. Some > instructions, like lwz, take a signed 16-bit integer, so the low half > becomes negative, so we must add 1 to the high half as adjustment. > Other instructions, like ori, take an unsigned 16-bit integer, so we > must not add 1. Oh, d'oh, I forgot about that. Fixed. Or at least, I hope --- I'm not confident I got the sign adjustments the right way round (we have no tests for this yet). I've also changed the syntax to more match Apple's, as their syntax seems closer to the ACK's, and .powerpcfixup is no more, as I have realised that we can rely on hi() or ha() to generate the relocation for us. So it's now just: addis r3, r0, hi16[sym] ori r3, r3, lo16[sym] vs addis r3, r0, ha16[sym] /* note ha */ lwz r3, lo16[sym] (r3) It's important to note that lo16 doesn't generate a relocation, so if the instruction after the hi16 doesn't have one, then you'll get a linker failure at best and gibberish at worst. The diff's now: https://github.com/davidgiven/ack/compare/dtrg-fixups?expand=1&w=1 -- ┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ───── │ "There is nothing in the world so dangerous --- and I mean *nothing* │ --- as a children's story that happens to be true." --- Master Li Kao, │ _The Bridge of Birds_ ```
 Re: [Tack-devel] weaknesses in our PowerPC assembler From: George Koehler - 2017-01-15 02:38:22
```
On Sat, Jan 14, 2017 at 6:26 PM, David Given wrote:
> The syntax is pretty ugly, and ended up looking like this:
>
> .powerpcfixup sym
> addis r3, r0, hi sym
> lwz r3, [lo sym] (r3)

For comparison, GNU syntax (for Linux or BSD) would be

  addis %r3, %r0, sym@...
  lwz %r3, sym@...(%r3)

Apple syntax (for Mac OS X) would be

  addis r3, r0, ha16(sym)
  lwz r3, lo16(sym)(r3)

> https://github.com/davidgiven/ack/commit/8edbff9795ef559eaefefefdf1760f57b3a926c8?w=1

The code hasn't learned to handle low half >= 0x8000. Some
instructions, like lwz, take a signed 16-bit integer, so the low half
becomes negative, so we must add 1 to the high half as adjustment.
Other instructions, like ori, take an unsigned 16-bit integer, so we
must not add 1.

If low half >= 0x8000, then the linker needs to check the second
instruction, and decide whether or not to add 1 to the high half. The
assembler needs syntax to decide whether to add 1. GNU uses sym@...
or sym@..., Apple uses ha16(sym) or hi16(sym). I like GNU syntax, but
our assembler might not like that @ sign.

I did not build or run your dtrg-fixups branch, because I have not
finished my current project to teach some extended mnemonics to the
assembler.

-George Koehler
```
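The sign adjustment discussed above can be written out in a few lines of C. The function names mirror the Apple-style operators `hi16`, `ha16` and `lo16`, but the code is only a sketch of the arithmetic, not taken from the ack assembler or linker; the two `rebuild_*` helpers are hypothetical and model what an `addis`/`ori` pair and an `addis`/`lwz` pair compute.

```c
#include <stdint.h>

/* Low and high halves of a 32-bit value. */
uint16_t lo16(uint32_t v) { return (uint16_t)(v & 0xffffu); }
uint16_t hi16(uint32_t v) { return (uint16_t)(v >> 16); }

/* "High adjusted": adds 1 to the high half when the low half is
 * >= 0x8000, to cancel the sign extension of a signed low half. */
uint16_t ha16(uint32_t v) { return (uint16_t)((v + 0x8000u) >> 16); }

/* addis + ori: ori takes an unsigned 16-bit immediate, so use hi16. */
uint32_t rebuild_unsigned(uint32_t v)
{
    return ((uint32_t)hi16(v) << 16) | lo16(v);
}

/* addis + lwz: the displacement is a signed 16-bit value, so the low
 * half is sign-extended and the high half must come from ha16. */
uint32_t rebuild_signed(uint32_t v)
{
    int16_t low = (int16_t)lo16(v);
    return ((uint32_t)ha16(v) << 16) + (uint32_t)(int32_t)low;
}
```

For 0x12348000, `hi16` gives 0x1234 and `ha16` gives 0x1235; both rebuild helpers return the original value, which is the property the linker has to preserve.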
 Re: [Tack-devel] weaknesses in our PowerPC assembler From: David Given - 2017-01-14 23:26:49 Attachments: signature.asc ```On 14/01/17 03:05, George Koehler wrote: [...] > I also can't require that the addend is zero. If our symbol isn't > global, the relocation must refer to a section symbol. This is because > led can only find global symbols or section symbols. So instead of our > symbol plus zero, we have a section symbol plus a probably non-zero > addend. I was intending that the .powerpcfixup pseudo emits a relocation that uses the next *two* instructions for the offset, so there's still 32 bits available. The only difference between this and what we're currently doing with li32 is that it allows you to use any two instructions (whether they make sense or not...). I've actually implemented this, and converted a couple of ncg instructions to use it; I can't comment on correctness, but the tests pass. The syntax is pretty ugly, and ended up looking like this: .powerpcfixup sym addis r3, r0, hi sym lwz r3, [lo sym] (r3) The second two instructions don't generate relocations at all, and rely on the .powerpcfixup. It's in the dtrg-fixups --- PTAL. https://github.com/davidgiven/ack/commit/8edbff9795ef559eaefefefdf1760f57b3a926c8?w=1 I haven't figured out how to extract the raw symbol/offset information from ncg, so if you refer to a symbol with an offset you get this: .powerpcfixup sym+8 addis r3, r0, hi sym+8 lwz r3, [lo sym+8] (r3) .powerpcfixup ignores the offset part, and hi and lo ignore the symbol part, so it all works out. But it still looks terrible. However, fixing that requires understanding ncg and ncgg, and understanding the assembler is bad enough. -- ┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ───── │ "There is nothing in the world so dangerous --- and I mean *nothing* │ --- as a children's story that happens to be true." --- Master Li Kao, │ _The Bridge of Birds_ ```
 Re: [Tack-devel] weaknesses in our PowerPC assembler From: George Koehler - 2017-01-14 02:05:49 ```On Sun, Jan 8, 2017 at 7:08 AM, David Given wrote: > Suggestion: we could add a .powerpc_relocation pseudoinstruction which > generates a RELOPPC relocation, but doesn't emit any code. That would > allow this: > > .powerpc_relocation some_symbol > lis r3, 0 > lwz r3, 0(r3) This wouldn't work, I guess, because of how ack.out(5) format stores the relocation info. It packs the addend in the instruction. So we can't have a RELOPPC unless we also emit the code with the addend. I also can't require that the addend is zero. If our symbol isn't global, the relocation must refer to a section symbol. This is because led can only find global symbols or section symbols. So instead of our symbol plus zero, we have a section symbol plus a probably non-zero addend. There isn't enough room for the 32-bit addend in the 16-bit field of an lis instruction. This is why RELOPPC needs a pair of instructions like lis, ori. This is different from ELF format used by gcc for Linux. ELF has a 32-bit addend in the relocation struct, so ELF doesn't pack the addend in the instruction. An ELF relocation for lis would have type R_PPC_ADDR16_HI or type R_PPC_ADDR16_HA. Our assembler would need to emit the 32-bit addend in the pair of instructions. Perhaps like this: lis r3, sym@... lwz r3, sym@...(r3) For lis, the assembler would create the RELOPPC and emit the high half of the addend. Then it would remember this lis in some global variable. For lwz, it would check the remembered lis and emit the low half of the addend. -George Koehler ```
 Re: [Tack-devel] New B compiler frontend From: George Koehler - 2017-01-13 23:25:45
```
The new B compiler seems to have exposed a bug in ncg for linux68k.
See my report in https://github.com/davidgiven/ack/issues/36

B seems to be a smaller C. No types in B. Everything is an int, which
is the same size as a pointer. No for (...) loop. Return value must
have parentheses, as in return (0); Assignment operators are
backwards; i += 2 in C is i =+ 2 in B. Tiny standard library.

I wrote this program to print the word size:

  main() {
      auto i, w;
      i = 1;
      w = 0;
      while (i) {
          i =<< 8;
          w++;
      }
      printf("word size = %d*n", w);
      return (0);
  }

A character literal like 'ABCD' can have multiple characters, up to
the word size. The first word of string "ABCD" is 'DCBA' with the
big-endian PowerPC. This is done so a plain 'A' is simply 65, the
ASCII value of A, as I expect.

By default, a program can't have more than one .b file. The compiler
makes a procedure "bmodule_main" for each .b file, and a program can't
have more than one such procedure. The ack(1) manual describes a way
to link multiple .b files, but that way is a mess. It needs to be
simplified, somehow.

Pointers in B count words, not bytes. The EM code from the compiler
always does a left shift before dereferencing these pointers. All
function calls are indirect through left-shifted pointers.
Unfortunately, each function must align its code to the beginning of a
word.

Commit 73922f1 changed the back ends for i86 and i386 to align all
procedures to 2 bytes for i86 or 4 bytes for i386. This happens even
in programs that don't use B. So the alignment has been increased only
to satisfy the B compiler.

-George Koehler

On Sun, Jan 8, 2017 at 6:31 AM, David Given wrote:
> Merry Christmas, and have a new compiler: I took some time off from
> thinking about register allocation (ugh) and ported the ABC B compiler
> to the ACK. It's now integrated into the system and everything.
>
> B is Ken Thompson and Dennis Ritchie's untyped programming language
> which later acquired types and turned into K&R C. Everything's a machine
> word, and pointers are *word* addresses, not byte addresses.
>
> The port's a bit clunky and doesn't generate good code, but it works and
> it passes its own tests. It runs on all supported backends. There's not
> much standard library, though.
>
> Example:
>
> https://github.com/davidgiven/ack/blob/default/examples/hilo.b
>
> (Also, in the process it found lots of bugs in the PowerPC mcg backend,
> now fixed, as well as several subtle bugs in the PowerPC ncg backend; so
> that's good. I'm pretty sure that this is the only B compiler for the
> PowerPC in existence.)
```
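The word-pointer scheme described in the message above can be sketched in C. This is only an illustration, assuming 4-byte words (as on i386 or PowerPC); the names `WORD_SHIFT`, `word_to_byte`, `byte_to_word` and `word_size_bytes` are made up for the example and do not come from the ACK sources.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD_SHIFT = 2 }; /* assumption: 4-byte machine words */

/* A B pointer counts words, so it is shifted left to get a byte address
 * before it is dereferenced or called through. */
static uint32_t word_to_byte(uint32_t word_ptr)
{
    return word_ptr << WORD_SHIFT;
}

/* Going the other way only works for word-aligned byte addresses, which
 * is why every function has to start at the beginning of a word. */
static uint32_t byte_to_word(uint32_t byte_addr)
{
    assert((byte_addr & ((1u << WORD_SHIFT) - 1u)) == 0);
    return byte_addr >> WORD_SHIFT;
}

/* C version of the B word-size loop: shift a 1 left by 8 bits at a time
 * until it falls off the top of the word, counting the shifts. */
static int word_size_bytes(void)
{
    unsigned i = 1;
    int w = 0;
    while (i) {
        i <<= 8;
        w++;
    }
    return w;
}
```

A function placed at byte address 0x1002 would have no exact word pointer under this scheme, whereas one at 0x1004 maps to word pointer 0x401 and back again without loss.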
 Re: [Tack-devel] weaknesses in our PowerPC assembler From: David Given - 2017-01-08 12:08:39 Attachments: signature.asc
```
On 05/01/17 06:28, George Koehler wrote:
[...]
> I have an incomplete diff to add some extended mnemonics:
> https://gist.github.com/kernigh/4e05d2a84a34c676237a7bad6259171a

Excellent --- thank you! I've just been doing some work on the PowerPC
backends and not having the extended mnemonics is a total pain.

Yes, the assembler's rather primitive. I don't know why you can't use
pointer differences as rvalues; I don't think there's any technical
reason why it couldn't, as it's a three-pass assembler, but it doesn't.
It may just be a matter of using an expr rather than an absexp in the
grammar, but there could be more to it than that. The code is pretty
cryptic and I've avoided looking at it too hard.

Regarding adding names for condition registers: I think that's an
excellent idea, but I'm not sure about adding a new type for them. Most
PowerPC assemblers treat everything, including register names, as raw
integers, and then add numeric identifiers for them. This would allow
using them in arbitrary expressions --- e.g. I wasted at least an hour
the other day trying to extract condition code bits from cr0 using
mfcr/rlwinm; being able to use [1 + cr0 + eq] as the left shift would be
really convenient.

I believe this should just be a matter of adding lines like this to the
mach3.c symbol table:

  0, NUMBER, 2, "eq",
  0, NUMBER, 5, "r5",

Of course, that way there's no type safety, but it is easy and makes the
ACK assembler more like others.

> The last form causes a conflict in yacc. The assembler must continue
> to accept numbers. If it also accepts names, then a 4 * ... can either
> be a number or a name. The parser can use only two tokens to make the
> decision, so a 4 * ... would be ambiguous.

The assembler has a hard time accepting raw expressions --- they need to
be bracketed (with square brackets). e.g. mcg generates this for right
shifts:

  rlwinm r3, r3, [32-5], 31, 31

[...]
> Our linker has a weakness. Suppose that I want to load a 4-byte word
> from address 0x1234abcd. I would normally do
>
>   lis r3, 0x1235
>   lwz r3, 0xabcd(r3)
>
> I split the address into two 16-bit values in a pair of instructions.
> The first sets r3 = 0x12350000 and the second loads the word at offset
> 0xabcd from there. I adjusted 0x1234 to 0x1235 because 0xabcd becomes
> negative.

Yes, the way this is currently done is really ugly (and inefficient).

Suggestion: we could add a .powerpc_relocation pseudoinstruction which
generates a RELOPPC relocation, but doesn't emit any code. That would
allow this:

  .powerpc_relocation some_symbol
  lis r3, 0
  lwz r3, 0(r3)

Now we can use any combination of instructions with 16-bit payloads
immediately following. The linker, in get_powerpc_valu() and
put_powerpc_valu(), is still reading and writing the payload of both
instructions in a single operation, so overflow isn't a problem any
more. The linker still needs to be able to distinguish branch
instructions from 16+16 address instructions, but that's
straightforward.

It's not ideal, but it does give us a general solution producing
efficient code, and requires a minimum of assembler hacking.

--
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│ "There is nothing in the world so dangerous --- and I mean *nothing*
│ --- as a children's story that happens to be true." --- Master Li Kao,
│ _The Bridge of Birds_
```
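For readers unfamiliar with rlwinm, here is a small C model of what the instruction computes: rotate left, then AND with a mask that runs from bit MB to bit ME in IBM numbering (bit 0 is the most significant bit). The function names are illustrative only. The shift expression `1 + cr * 4 + bit` used in the checks follows the "cr6 * 4 + eq" convention from the thread and assumes the usual bit numbers lt = 0, gt = 1, eq = 2, so = 3.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by sh bits (sh is reduced modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31u;
    return (x << sh) | (x >> ((32u - sh) & 31u));
}

/* Mask with ones from bit mb through bit me, IBM numbering (bit 0 is
 * the most significant bit). If mb > me the mask wraps around. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;         /* IBM bits mb..31 */
    uint32_t to_me = 0xffffffffu << (31u - me);   /* IBM bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of "rlwinm rA, rS, SH, MB, ME": rotate left, then AND with mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    assert(sh < 32 && mb < 32 && me < 32);
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

With this model, `rlwinm(x, 32 - 5, 31, 31)` gives `(x >> 5) & 1`, which is the right shift from the message. Applied to a value read with mfcr, `rlwinm(cr, 1 + 0 * 4 + 2, 31, 31)` isolates the eq bit of cr0 in the lowest bit of the result.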
 [Tack-devel] New B compiler frontend From: David Given - 2017-01-08 11:32:04 Attachments: signature.asc
```
Merry Christmas, and have a new compiler: I took some time off from
thinking about register allocation (ugh) and ported the ABC B compiler
to the ACK. It's now integrated into the system and everything.

B is Ken Thompson and Dennis Ritchie's untyped programming language
which later acquired types and turned into K&R C. Everything's a machine
word, and pointers are *word* addresses, not byte addresses.

The port's a bit clunky and doesn't generate good code, but it works and
it passes its own tests. It runs on all supported backends. There's not
much standard library, though.

Example:

https://github.com/davidgiven/ack/blob/default/examples/hilo.b

(Also, in the process it found lots of bugs in the PowerPC mcg backend,
now fixed, as well as several subtle bugs in the PowerPC ncg backend; so
that's good. I'm pretty sure that this is the only B compiler for the
PowerPC in existence.)

--
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│ "There is nothing in the world so dangerous --- and I mean *nothing*
│ --- as a children's story that happens to be true." --- Master Li Kao,
│ _The Bridge of Birds_
```
 [Tack-devel] weaknesses in our PowerPC assembler From: George Koehler - 2017-01-05 05:28:09
```
ack in git (default branch from https://github.com/davidgiven/ack)
can now compile simple programs to PowerPC for both Linux
(ack -mlinuxppc) and Mac OS X (ack -mosxppc). There remain several
bugs and missing features. I can't run ack on OS X, but I can
cross-compile to OS X from OpenBSD.

ack uses its own assembler and linker. It has some weaknesses with
PowerPC code. I have an incomplete diff to add some extended
mnemonics:
https://gist.github.com/kernigh/4e05d2a84a34c676237a7bad6259171a

Extended mnemonics, or simplified mnemonics, are simpler ways to write
some PowerPC instructions. For example, the extended "blr" (branch to
link register) is the same instruction as "bclr 20, 0, 0" (branch
conditional to link register, 20 to branch always, condition 0, branch
hint 0).

Our assembler is now missing most of the extended mnemonics from IBM's
and Freescale's docs. So when I want to return from a function, I
can't write "blr", I must write "bclr 20, 0, 0". My diff adds "blr"
and some others, but I have not checked if they assemble correctly.

My diff also tries to add names for bits in the condition registers.
Our assembler usually requires register names. For example, it takes
"addi r3, r3, 1" but not "addi 3, 3, 1". Our assembler doesn't know
the names of cr bits, so it takes numbers. It takes "bc 12, 2, label"
but not "bc 12, eq, label".

The 4 bits are named "lt", "gt", "eq", "so" (bits 0 to 3). Also, "un"
is the same bit as "so". These bits are in field cr0. To reach fields
cr0 to cr7, IBM and Freescale use an expression syntax like
"cr6 * 4 + eq". This is awkward because our assembler doesn't allow
register names in numeric expressions. It seems that I can add rules
for the forms

  cr6 * 4 + eq
  eq + cr6 * 4
  eq + 4 * cr6

but not

  4 * cr6 + eq

The last form causes a conflict in yacc. The assembler must continue
to accept numbers. If it also accepts names, then a 4 * ... can either
be a number or a name. The parser can use only two tokens to make the
decision, so a 4 * ... would be ambiguous.

The assembler has some problems with symbols. I tried to get the
length of a string with

  .sect .text
      addi r5, r0, len_ok
  .sect .rom
  str_ok:
      .ascii "ok "
      len_ok = . - str_ok

but this causes an error "must be absolute". Our assembler doesn't
take my len_ok symbol in an addi instruction.

Our linker has a weakness. Suppose that I want to load a 4-byte word
from address 0x1234abcd. I would normally do

  lis r3, 0x1235
  lwz r3, 0xabcd(r3)

I split the address into two 16-bit values in a pair of instructions.
The first sets r3 = 0x12350000 and the second loads the word at offset
0xabcd from there. I adjusted 0x1234 to 0x1235 because 0xabcd becomes
negative.

But if 0x1234abcd is in a symbol, then our linker requires me to do

  lis r3, 0x1234
  ori r3, r3, 0xabcd
  lwz r3, 0(r3)

because the linker can only relocate the symbol in an lis/ori pair. So
I must have an extra ori instruction for the linker. Our assembler
syntax for the lis/ori pair is now "li32 label". This is not
documented.

Perhaps the linker should learn about lis/lwz pairs, and lis/stw,
lis/lbz, lis/lfd, and so on, but there are too many possible pairs,
and each pair would need a new name in the assembler.

--George Koehler
```
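The adjustment from 0x1234 to 0x1235 in the lis/lwz example follows from lwz sign-extending its 16-bit displacement. A minimal C sketch of the split, with illustrative function names (`lo16`, `ha16`, `rejoin`, `split_example`) that are not taken from the ACK sources:

```c
#include <assert.h>
#include <stdint.h>

/* Low half: the 16-bit displacement, which lwz will sign-extend. */
static uint16_t lo16(uint32_t addr)
{
    return (uint16_t)(addr & 0xffffu);
}

/* High-adjusted half for lis: add 0x8000 first, so that a low half with
 * its top bit set (a negative displacement) bumps the high half by one. */
static uint16_t ha16(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* What the lis/lwz pair computes: (ha << 16) plus the sign-extended low half. */
static uint32_t rejoin(uint16_t ha, uint16_t lo)
{
    int32_t disp = (lo & 0x8000u) ? (int32_t)lo - 0x10000 : (int32_t)lo;
    return ((uint32_t)ha << 16) + (uint32_t)disp;
}

/* The numbers from the message: 0x1234abcd splits into 0x1235 and 0xabcd. */
static void split_example(void)
{
    assert(ha16(0x1234abcdu) == 0x1235u);
    assert(lo16(0x1234abcdu) == 0xabcdu);
    assert(rejoin(0x1235u, 0xabcdu) == 0x1234abcdu);
}
```

An lis/ori pair needs no such adjustment because ori treats its immediate as unsigned, which is why the symbol version above uses 0x1234 rather than 0x1235.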
 Re: [Tack-devel] (no subject) From: George Koehler - 2016-12-07 04:35:20
```
On Sun, Dec 4, 2016 at 7:07 PM, Daniel V wrote:
> I'm new here. The reason for checking this out, is to get some more
> languages working for a VM that I've made. And I wonder if this can be
> suitable for this.

ACK is old. We have a C89 compiler with a libc. It works in part. I
can't call ctime() because I get a linker error; the code in libc to
get the time zone is obsolete. I can only compile a few simple
programs (like the ones in ack/examples). If you choose ACK and not
some other compiler, you get an old compiler.

For the ACK to target your VM, you would need to

 1. adapt the assembler (as)
 2. figure out the converter (cv)
 3. adapt the new code generator (ncg)
 4. write a libsys

Our C compiler translates C code into EM instructions. The ncg
translates EM code into assembly (.s) text files. So first, you would
need an assembler. If you adapt our assembler, it emits .o files in
ack.out format. We have an ack.out linker, but you need a converter to
go from the final ack.out file to a format that your VM runs. If you
already have your own assembler and linker, you might be able to use
them instead of adapting ours.

Our EM is a stack language. For example, the EM instruction "adi 4"
adds a pair of 4-byte integers. It pops 2 integers from the stack, and
pushes the result. Our ncg needs the target machine's stack to look
like the EM stack. The stack needs to grow down in memory. If your
VM's stack grows up, or if your VM can't make pointers to the stack,
then ACK might be a bad choice.

Suppose that the target machine has a 6502-ish "adc address"
instruction. The pattern for "adi 4" in the ncg table might look like

  pat adi $1==4
  with ACCUM STACK
  gen
      sta {address, ".temp"}
      pla
      adc {address, ".temp"}
  yields %1

This is a pattern for adi 4, with the top integer in a register %1 of
class ACCUM, and the second integer on the machine stack. We store the
first integer in label ".temp". We pop the second integer from the
machine stack to the accumulator. We do the add. We yield the sum in
register %1.

This isn't so efficient. The EM code "loe y" "loe x" "adi 4" might
become "lda _y" "pha" "lda _x" "sta .temp" "pla" "adc .temp" and not
the simpler "lda _y" "adc _x". One can write more rules to emit the
simpler code, but the less efficient rule is enough to start compiling
programs that use "adi 4".

Our documentation is outdated.
http://tack.sourceforge.net/olddocs.html has some old papers about
"em" and "ncg" and such. Our best backend might be linux386 (files in
plat/linux386 and mach/i386). You can get examples of EM by writing
code in C and using "ack -mlinuxppc -c.e file.c" to translate it to
EM.

The currently working targets in mach/ are i386, i80, i86, m68020. You
might find hints in mach/*/as and mach/*/*ncg. Beware of mach/powerpc
and mach/vc4; these are new and contain some mistakes. EM has its own
interpreter or virtual machine in util/int, but it doesn't work now.

--George Koehler
```
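The stack behaviour of "adi 4" described above can be modelled in a few lines of C. This is only a toy model for illustration, not the ACK's EM interpreter: the `em_stack` type and the `em_*` functions are invented for the example, and the stack is a small byte array whose top-of-stack offset decreases on push to mimic a stack that grows down in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* Toy EM-style stack: sp is the byte offset of the top of the stack.
 * It starts at STACK_BYTES and decreases, so the stack grows down. */
struct em_stack {
    unsigned char mem[STACK_BYTES];
    size_t sp;
};

static void em_init(struct em_stack *s)
{
    s->sp = STACK_BYTES;
}

/* Push a 4-byte integer: move sp down, then store the value. */
static void em_push4(struct em_stack *s, uint32_t v)
{
    assert(s->sp >= 4);
    s->sp -= 4;
    memcpy(&s->mem[s->sp], &v, 4);
}

/* Pop a 4-byte integer: load the value, then move sp back up. */
static uint32_t em_pop4(struct em_stack *s)
{
    uint32_t v;
    assert(s->sp + 4 <= STACK_BYTES);
    memcpy(&v, &s->mem[s->sp], 4);
    s->sp += 4;
    return v;
}

/* "adi 4": pop two 4-byte integers and push their sum. */
static void em_adi4(struct em_stack *s)
{
    uint32_t b = em_pop4(s);
    uint32_t a = em_pop4(s);
    em_push4(s, a + b);
}
```

Pushing 40 and 2 and then calling `em_adi4` leaves a single 4-byte value, 42, on the stack. The ncg pattern in the message does the same work with the top operand held in the accumulator instead of in memory.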
 [Tack-devel] (no subject) From: Daniel V - 2016-12-05 00:07:07
```
Hi!

I'm new here. The reason for checking this out, is to get some more
languages working for a VM that I've made. And I wonder if this can be
suitable for this.

That VM has only one 32-bit register and a stack. It has op-codes with
a fixed-length 32-bit operand, and an optional 32-bit operator. And it
has instructions a bit like a 6502 processor.

I'm not that concerned with speed or efficiency. I only need an easy
way to write C programs for that VM.

Is ACK a good match for this, or should I look elsewhere for an easier
way to compile C code to that VM?

And is there any documentation on how to make a new backend? Or which
backend would be a good starting point to change for this?

// Daniel V.
```
 Re: [Tack-devel] new PowerPC tests: ack-ptest From: David Given - 2016-11-13 13:26:24 Attachments: signature.asc
```
On 12/11/16 22:17, David Given wrote:
[...]
> I haven't figured out how to make qemu shut down yet after the program
> completes --- neither the OpenFirmware 'poweroff' command nor the 'boot'
> client service appear to do anything --- which means it's not suitable
> for testing yet. Any ideas?

Sorted --- I parse the test output and kill qemu when I see that the
test has finished (with a timeout). It'd be nicer if we could terminate
qemu from inside the test itself, but this works fine.

There's now a very hacked together test framework in plat/qemuppc/tests
in the dtrg-experimental-qemuppc branch. Tests are run on every build
which modifies the qemuppc plat, but skipped if you don't have
qemu-system-ppc installed.

And they run on Travis!

The test framework needs to be abstracted but it'll do fine for now.

--
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│ "There is nothing in the world so dangerous --- and I mean *nothing*
│ --- as a children's story that happens to be true." --- Master Li Kao,
│ _The Bridge of Birds_
```
36 messages have been excluded from this view by a project administrator.
