tack-devel Mailing List for The Amsterdam Compiler Kit (obsolete)
Moved to https://github.com/davidgiven/ack
Brought to you by: dtrg
From: tim k. <gt...@di...> - 2007-05-25 12:09:25
At 12:24 PM -0400 5/24/07, David Given wrote:
>For parameters-in-registers: (...)
>...I get about 2.3s, or 23ns per iteration of 15 instructions, or 1.5ns per
>instruction.
>
>For parameters-in-memory: (...)
>...I get 4.3s, or 43ns per iteration of 22 instructions, or 2.0ns per
>instruction.

I looked at this again and I think I understated the extent of the differences. Under the previous assumption of lwz executing in a single cycle, the additional 7 instructions should have caused the algorithm to take only about 50% longer, or roughly 3.5 seconds. Instead, it took 87% longer (4.3 seconds). The extra time is almost certainly due to memory accesses (there are five lwz and five stw instructions per loop).

Additionally, the function itself had five instructions (including blr) when done in registers, and eleven instructions when done in memory. Over the 100,000,000-iteration loop, there are 700,000,000 additional instructions. That is power consumed and time used.

I started looking at where the differences in the number of instructions were coming from. In the memory-based example, all the values in struct s are updated each loop, even though only the first element (j) changes. There is a lot more overhead in the memory-centric example, too, which in this example doesn't cascade the way it will in a real executable. The memory-centric example uses eight local registers in main, while the register-centric one uses three. Save and restore in the prolog/epilog cycle is 2.5 times larger. I can easily see memory-centric code taking twice as long as register-centric code in real-world operations.

The combination of the extra time to access memory and the additional instructions necessary for a memory-centric model would inevitably lead to very bad performance on PowerPC, to the point that I think it would lack credibility. Although you (David) are reluctant to tinker with EM, I am coming to the conclusion that it needs revision. How modularized are the front-ends from the EM intermediate layer?

tim

Gregory T. (tim) Kelly
Owner
Dialectronics.com
P.O. Box 606
Newberry, SC 29108

"Anything war can do, peace can do better." -- Bishop Desmond Tutu
From: tim k. <gt...@di...> - 2007-05-24 17:34:43
At 1:16 PM -0400 5/24/07, tim kelly wrote:
>1.5ns vs. 2.0ns per instruction is a 25% difference in performance. That's
>huge.

And actually, to be more precise, it is 25% slower to do the operation in memory and 33% faster to do it with registers (someone else running four miles in the time it takes me to run three has whupped me pretty good). That sort of performance difference is what I am saying is at the root of statements about poor performance on PowerPC vs. x86.

I had a chance once to ask two IBM XL C compiler engineers about the effects of virtualizing CPUs to the point that programmers never need to know what CPU they are programming for: if the virtualization were absolutely perfect, what would be the point in having more than one CPU design on the market? They didn't/couldn't/wouldn't answer the question, which probably speaks volumes about the future of CPUs. (I wouldn't run the same gears on a Wankel engine as on an Otto engine, as doing so will kill performance on the Wankel. Drivers might not care, but engineers should.)

tim
From: tim k. <gt...@di...> - 2007-05-24 17:16:17
At 12:24 PM -0400 5/24/07, David Given wrote:
>Passing in memory is slower than using registers, but not painfully so; this
>kind of performance strikes me as being entirely reasonable and probably not
>worth spending much effort on optimising unless there's an actual need. ACK's
>output won't be as good as this, because it's not designed to do the sort of
>optimisations that gcc is, but I would be very surprised if it was
>substantially different.

1.5ns vs. 2.0ns per instruction is a 25% difference in performance. That's huge. And there is an actual need. We're trying to use ACK for an end-to-end BSD-licensed solution, including the operating system itself, which is being designed as an exokernel with applications pretty much running on bare metal (with a library OS for BSD compatibility). Gutting performance by 25% because of an enforced memory-centric programming model is antithetical to our goals (one of which is maximum performance).

I'm not sure what to say at this point. We're not thrilled with gcc: there are many conflicts of interest among the developers, some of whom work for companies that hold patents on compiler technologies necessary for optimal performance, and there's the license (GPL). We'd like to move away from it, but ACK seems willing to compromise overall performance for portability. I've seen this before, with NetBSD. Sure, it can be ported to almost any platform, but the compromises are absolutely horrible (and it only beats OpenBSD in performance because OpenBSD's extent manager remaps memory a second time). NetBSD also applies an x86-centric model, at the expense of performance on other platforms. The examples you are presenting as required by EM make a good case that optimizing for a register-centric model is not possible, or is only possible with Herculean effort. I guess I'm back to thinking about what tools we need :-(

By the way, Prime-mover rocks! I'm pretty sure that will be our make replacement :-)

tim
From: David G. <dg...@co...> - 2007-05-24 16:24:55
tim kelly wrote:
[...]
> This is much like what I was suggesting in treating the registers.
> Positive registers are passed parameters, negative registers are local
> variables, we have two register files of infinite numbers in positive and
> negative directions and only have to actually specify the physical
> registers at the very last instant.

Unfortunately not. This:

    lol 4          ; load local @4
    loc 1          ; load constant 1
    adi EM_WSIZE   ; add

...is *precisely* equal to this:

    *sp-- = fp[4];
    *sp-- = 1;
    *sp-- = *++sp + *++sp;

The code generator may, if it chooses, cache locals or the working stack in registers for performance reasons, but it's got to make sure that it all gets flushed out to memory when necessary. The working stack must get flushed when a subroutine call occurs, because the working stack is going to form part of the callee's stack frame; locals need to get flushed if something is going to refer to them via memory.

(I'm actually a little worried as to what happens if someone does this:

    lal 0          ; load address of local @0
    loc 1          ; load constant 1
    stl 0          ; store into local @0
    loi EM_WSIZE   ; dereference

If local @0 is cached in a register, then the value in memory may not match the value in the register when the loi comes along. I have a bit of a suspicion that it doesn't work very well; none of this has come up with the Z80 code generator, of course, because it doesn't use regvars.)

[...]
> That's the trap I've been warning about - using x86 models for PowerPC.
> lwz won't run in a single tick, at all. It can take many cycles and even
> stall. IBM won't publish publicly the number of cycles each instruction
> takes, but load and store operations are terrible. add is probably four
> cycles, lwz seven or more.

I persuaded the testing department at work to crank up the old iMac and did some simple benchmarks.

For parameters-in-registers:

    int testfunc(int x1, int x2, int x3, int x4, int x5)
    {
        return x1+x2+x3+x4+x5;
    }
    ...
    for (i=0; i<100000000; i++)
        j += testfunc(j, 2, 3, 4, 5);

...I get about 2.3s, or 23ns per iteration of 15 instructions, or 1.5ns per instruction.

For parameters-in-memory:

    int testfunc(struct s* s)
    {
        return s->x1+s->x2+s->x3+s->x4+s->x5;
    }
    ...
    for (i=0; i<100000000; i++)
    {
        struct s s = {j, 2, 3, 4, 5};
        j += testfunc(&s);
    }

...I get 4.3s, or 43ns per iteration of 22 instructions, or 2.0ns per instruction.

The machine is an elderly iMac with a 740/750 "Arthur" processor running at 400MHz, which means a 2.5ns cycle time (which means it's getting, on average, more than one instruction of work done per cycle). It's got 32kB of L1 data cache. The code is compiled with gcc 3.3.6 and is decently optimised, but nothing fancy seems to be happening; it's exactly as we hypothesised earlier. (I've enclosed the actual source and assembly.)

Passing in memory is slower than using registers, but not painfully so; this kind of performance strikes me as being entirely reasonable and probably not worth spending much effort on optimising unless there's an actual need. ACK's output won't be as good as this, because it's not designed to do the sort of optimisations that gcc is, but I would be very surprised if it was substantially different.

-- 
dg@cowlark.com --- http://www.cowlark.com
Uglúk u bagronk sha pushdug Internet-glob bbhosh skai.
From: tim k. <gt...@di...> - 2007-05-23 21:14:21
At 2:56 PM -0400 5/23/07, David Given wrote:
>So the register allocation *algorithm* is in mach/proto/ncg, and is the same
>for all architectures. It just needs to know what registers it can allocate,
>which information is described in the table.

OK, I'll examine this thoroughly. I have been reading the ncg doc, but I can't shake the feeling it expects me to know something beyond what is explained in the doc itself.

>However, as part of the mapping from the stack machine to a register machine,
>the code generator is at liberty to defer that write until such point as it
>has to sync the stack into memory. Most of the time this means that it won't
>get written at all, because the value can normally be used and consumed before
>a sync point happens.

Right, so can't we treat EM as a virtual machine until it is time to actually spell out the machine-dependent opcodes? I'm looking at ncg as basically a substitution mechanism, using the tables to translate EM to machine-dependent assembly. Perhaps I am mistaken here?

This is much like what I was suggesting in treating the registers. Positive registers are passed parameters, negative registers are local variables; we have two register files of infinite size in the positive and negative directions and only have to actually specify the physical registers at the very last instant.

>Subroutine calls are sync points.

Which is great, because I've been looking at solving issues for a single function call and then scaling that to the hundreds within an application. I do think if we solve this once it can be applied homogeneously.

>The fact that this is in memory is important to the way EM works. It's *legal*
>to access local 3 by taking the address of local 2, adding 4, and
>dereferencing --- that's how varargs in C works.

In PowerPC, IIRC, the arguments are placed on the stack and put into registers through inlined code. It is really messy and very slow.

>[...]
>> One aspect that comes to mind is that under the stack model as described it
>> appears functions can access parameters not local to themselves, simply by
>> reading further down the stack. That would violate local scope rules. Am
>> I misunderstanding this?
>
>Yes, this is entirely correct. There's an EM opcode 'lxl', which gives you
>access to your caller's stack. 'lxa 0' returns you your frame pointer; 'lxl 1'
>returns you your caller's; 'lxl 2' *its* caller, and so on.
>
>I don't think this is used with an argument greater than 0 anywhere in the
>existing libraries, but it does exist.

Can a backend choose to not implement an EM opcode?

>However, I don't think it's as bad as you make out --- remember, this is all
>on the hot tip of the stack, which is going to be in Level 1 cache all the
>time; this is all the same technology that makes the very stack-centric Intel
>chips run fast. The lwz itself will most likely run in a single tick, and the
>data will most likely be available in the following tick; you're probably
>looking at 2 or 3 ticks for the lwz, at the very most, as opposed to the
>1 tick for an add.

That's the trap I've been warning about - using x86 models for PowerPC. lwz won't run in a single tick, at all. It can take many cycles and even stall. IBM won't publish publicly the number of cycles each instruction takes, but load and store operations are terrible. add is probably four cycles, lwz seven or more. Compiler writers who have signed the NDA and gotten access to Book E/Book IV have the cycle times, and their optimized code shows gaps of four instructions or more for various operations, where a register won't be referenced again for several instructions. For the longest time I couldn't figure out why the optimized code looked absolutely nothing like the source code, until I found out how badly PowerPC needs to be pipelined in order to be fast with memory operations.

The key to speed on PowerPC is all registers, all the time (it is even better to bloat code with inlines than to write to memory, up to a point). Wicked fast, but not really understood by the masses used to x86 models. It really comes out when you look at the Cell and POWER6 designs, which run parallel execution units, one of which handles memory accesses. For example, on Cell's SPUs you'll schedule a memory access _16_ instructions before you need it, or else you stall. Admittedly, much of this is due to the boundary constraints of Cell's SPUs, but it really hints at how critical it is to avoid memory accesses unless absolutely necessary, and then to schedule them well in advance.

>I strongly suspect that the ACK's output will be good enough for most
>purposes. At the very least, the first law of profiling applies ('you don't
>know where it's slow until you've measured it').

We know PowerPC is slow in accessing memory :-)

tim
From: tim k. <gt...@di...> - 2007-05-23 20:54:45
(I'm separating a section to isolate it.)

At 2:56 PM -0400 5/23/07, David Given wrote:
>> What would be the EM code for the x function, including the MES notes?
>
>It looks like this:
>
> mes 2,4,4       word = 4 bytes, ptr = 4 bytes
> exp $x          x is an exported symbol
> pro $x,0        begin function x, local size 0
> mes 3,20,4,0,1  local at 20 is of size 4, type any, used once
> mes 3,16,4,0,1
> mes 3,12,4,0,1
> mes 3,4,4,0,1
> mes 3,0,4,0,1
> mes 3
> mes 9,24        specifies the size of the parameter block (!)
...
>Hey, look at that! There *is* a way of getting the number of parameters for a
>subroutine. Sorry, I hadn't noticed that until now (I'm still learning about
>this stuff too!)

Yes, and I've found the parts of the references I remembered that led me to believe it _is_ possible to extract the number of passed parameters, in section 11.1.4.2 of EM (and noted in your email). Local Base (LB) variables are denoted by negative numbers in MES 3 notes; Argument Base (AB) variables are denoted by positive numbers in MES 3 notes. Additionally, in 4.2 of EM, see the paragraph that begins "Third, the amount of local storage needed...Negative offsets are used for access to local variables." So above, the parameter block is 24 bytes, all passed as parameters, and 24/4 = 6, so six parameters.

However, this line

>lal 8            push address of local #8 (i3)

is where things get really odd. The stack-centric model allows this, but the results are ambiguous at best for a register-centric model. You passed an integer as a parameter, but then referenced the address the integer resided at. In a register-centric model, the integer was passed in a register, and no local stack frame space is needed to hold it.

I can say that while in theory it is great to get rid of stack frames and hold everything in volatile registers, it is almost impossible to stick with this in practice. If a function calls a function, the caller _has_ to create a stack frame. Ergo, I think what gets settled is that _all_ function calls will create a stack frame, and while the instruction lal will not fail, it will give ambiguous results.

>Unfortunately, some experimentation reveals that it's not actually very
>accurate --- it doesn't take into account C varargs, for example. I suspect
>that even if it were accurate it still may not help; the caller doesn't know
>how many parameters the callee is expecting.

varargs is handled very, very poorly on PowerPC. I recognize its usefulness, but I'd also cheer if it went away...

As for how many parameters the callee is expecting, I think that should fall back on the frontend to match function declarations. The backend can reasonably expect to recover values from the registers the called function is expecting to use, whether or not the callee has initialized them properly.

tim
From: David G. <dg...@co...> - 2007-05-23 18:56:43
tim kelly wrote:
[...]
> :-O...the platform-independent part determines what registers to use for
> locals? Pardon my ignorance on this, but all I've seen so far is that
> registers are described in generic terms, but the specifics of the
> registers are left to the backend.

The code generator is made up of a big chunk of C code in mach/proto/ncg, which is common for all architectures, a table file (e.g. mach/i80/ncg/table), which is processed through ncgg to produce another C file, and some platform-specific files (e.g. mach/i80/table/mach.c). All this lot gets compiled --- separately for each architecture --- and produces a single binary.

So the register allocation *algorithm* is in mach/proto/ncg, and is the same for all architectures. It just needs to know what registers it can allocate, which information is described in the table.

[...]
> I don't understand this at all, because my understanding is that EM isn't
> executable except through an interpreter (therefore doesn't do any actual
> "passing"). EM still needs to be taken to machine code before the
> application can execute. The stack is a "virtual" stack during this
> intermediate stage. Again, it could be my ignorance on the matter.

EM is only not executable because it hasn't been implemented in silicon. There is, I'm afraid, nothing abstract about it. EM actually *specifies* that the stack lives in memory. A 'loc' instruction (push constant) actually does do a memory write.

However, as part of the mapping from the stack machine to a register machine, the code generator is at liberty to defer that write until such point as it has to sync the stack into memory. Most of the time this means that it won't get written at all, because the value can normally be used and consumed before a sync point happens.

Subroutine calls are sync points. The EM spec defines an explicit stack frame layout that must be in memory on entry to a subroutine, which looks like this:

    ...
    local 3        <- input parameters
    local 2
    local 1
    local 0
    return block   <- usually 2 words wide, containing LP and old FP
    local -1       <- frame pointer here
    local -2
    local -3       <- function temporaries
    ...

The fact that this is in memory is important to the way EM works. It's *legal* to access local 3 by taking the address of local 2, adding 4, and dereferencing --- that's how varargs in C works.

[...]
> One aspect that comes to mind is that under the stack model as described it
> appears functions can access parameters not local to themselves, simply by
> reading further down the stack. That would violate local scope rules. Am
> I misunderstanding this?

Yes, this is entirely correct. There's an EM opcode 'lxl', which gives you access to your caller's stack. 'lxa 0' returns you your frame pointer; 'lxl 1' returns you your caller's; 'lxl 2' *its* caller, and so on.

I don't think this is used with an argument greater than 0 anywhere in the existing libraries, but it does exist.

(See section 4.2 of the EM white paper; it calls the frame pointer 'LB' (Local Base), and also refers to 'AB' (Argument Base), which is simply the address of local 0, the first argument. This is typically LB + sizeof(word)*2 on most systems.)

[...]
> Except the above is really bad code (no insult intended, you are giving a
> concrete depiction of a typical output) and would be unbelievably slow on
> PowerPC. Even if you manage to get all of the stack on the same cache
> line, you will almost certainly stall significantly at some point - like
> the next time the routine was called with the stack in a different
> location. If the parameters/stack span a cache line, stalling will be
> enormously painful to performance.

Yup, it's not great.

However, I don't think it's as bad as you make out --- remember, this is all on the hot tip of the stack, which is going to be in Level 1 cache all the time; this is all the same technology that makes the very stack-centric Intel chips run fast. The lwz itself will most likely run in a single tick, and the data will most likely be available in the following tick; you're probably looking at 2 or 3 ticks for the lwz, at the very most, as opposed to the 1 tick for an add.

I strongly suspect that the ACK's output will be good enough for most purposes. At the very least, the first law of profiling applies ('you don't know where it's slow until you've measured it').

> What would be the EM code for the x function, including the MES notes?

It looks like this:

    mes 2,4,4       word = 4 bytes, ptr = 4 bytes
    exp $x          x is an exported symbol
    pro $x,0        begin function x, local size 0
    mes 3,20,4,0,1  local at 20 is of size 4, type any, used once
    mes 3,16,4,0,1
    mes 3,12,4,0,1
    mes 3,4,4,0,1
    mes 3,0,4,0,1
    mes 3
    mes 9,24        specifies the size of the parameter block (!)
    lol 0           push local #0 (i1)
    lol 4           push local #4 (i2)
    adi 4           add word (the 4 is the size)
    lol 8
    adi 4
    lol 12
    adi 4
    lol 16
    adi 4
    lol 20
    adi 4
    lal 8           push address of local #8 (i3)
    adi 4
    ret 4           return 4-byte value on top of stack
    end 0           end function

Hey, look at that! There *is* a way of getting the number of parameters for a subroutine. Sorry, I hadn't noticed that until now (I'm still learning about this stuff too!)

Unfortunately, some experimentation reveals that it's not actually very accurate --- it doesn't take into account C varargs, for example. I suspect that even if it were accurate it still may not help; the caller doesn't know how many parameters the callee is expecting.

I shall investigate, but right now I have to go shopping.

-- 
dg@cowlark.com --- http://www.cowlark.com
From: tim k. <gt...@di...> - 2007-05-23 16:58:30
|
At 10:49 AM -0400 5/23/07, David Given wrote: >Well... not necessarily. The convention is that the caller is responsible for >both pushing parameters onto the stack and then popping them off again >afterwards, so the callee doesn't need to know about such things. All it gets >is, in effect, a pointer to the first parameter. Yeah, this is one noticeable difference between Forth and EM (if I'm understanding your statement above accurately). AFAIK, in Forth parameters pushed onto a stack are consumed by the callee, and the caller is responsible for popping the result back off. If you need the original parameters, the caller must do "Ndup" which duplicates N number of values on the stack. Movement of the stack pointer is transparent to the code, though. >[...] >> Therefore, >> the only missing link is the MES statement letting the backend know how >> many parameters are passed to the EM-based subroutine/function call. (The >> prolog and epilog then prepare the local variables in registers and save >> and restore non-volatile registers.) > >That is, in fact, exactly what happens when using the regvars extension. The >compilers generate hints to say 'Local #n is used X times, it'd be nice if you >could optimise that a bit'. The platform-independent part of the code >generator figures out which locals go in which registers, and then the >platform-dependent part prepares the registers by copying the values out of >the stack frame into the registers. (See MES 3.) :-O...the platform-independent part determines what registers to use for locals? Pardon my ignorance on this, but all I've seen so far is that registers are described in generic terms, but the specifics of the registers are left to the backend. If this isn't true, then I would argue this has spanned true platform-independence. Beyond a few bookkeeping registers (which arguably could reside in memory), EM shouldn't have any requirements on the final machine-dependent code. 
>(Of course, this will only work if the code isn't referring to the address of >those stack slots.) > >But this doesn't affect the calling convention; the parameters still get >*passed* in memory. I don't understand this at all, because my understanding is that EM isn't executable except through an interpreter (therefore doesn't do any actual "passing"). EM still needs to be taken to machine code before the application can execute. The stack is a "virtual" stack during this intermediate stage. Again, it could be my ignorance on the matter. Every function call after main() is a subroutine, so in theory this only has to be solved for one subroutine and then applied to all of the others. In the environment I am developing, main() doesn't get passed parameters (POSIX is not a concern, everything is done with message passing). Regardless, though, if registers are initialized before entering main() and that convention is kept, there shouldn't be any overlap. One aspect that comes to mind is that under the stack model as described it appears functions can access parameters not local to themselves, simply by reading further down the stack. That would violate local scope rules. Am I misunderstanding this? >You can see the EM bytecode if you compile with -c.e. The EM white paper >contains a reasonably complete description of what they all do... Yes; however, I've been trying to find that tie between EM and ncg. > >Typically, I'd expect ACK on a register-centric architecture like the PowerPC >to reserve, say, eight registers for expression evaluation, have a few for >housekeeping, and to use all the rest for local storage. 
So: > >int x(int i1, int i2, int i3, int i4, int i5, int i6) >{ > i1 = i1+i2+i3+i4+i5+i6 + (int)&i3; >} > >becomes (hand-compilation, omitting the prologue and epilogue boilerplate): > >; preload registers >lwz r8, 4(sp) ; r8 = local #0 = i1 >lwz r9, 8(sp) ; i2 >lwz r10, 16(sp) ; i4 >lwz r11, 20(sp) ; i5 >lwz r12, r4(sp) ; i6 >; perform calculation >add r1, r8, r9 ; x = i1 + i2 >lwz r2, 12(sp) ; load i3, not cached in register >add r1, r1, r2 >add r1, r1, r10 ; x += i4 >add r1, r1, r11 ; x += i5 >add r1, r1, r12 ; x += i6 >addi r2, sp, 12 ; get address of i3 >add r8, r1, r2 ; result goes directly into the i1 register > >The prologue and epilogue would need to save and reload r8-r12, of course. By >carefully tweaking how the registers are used you may be able to do this in >one instruction. r1-r7 are scratch and don't need saving. Except the above is really bad code (no insult intended, you are giving a concrete depiction of a typical output) and would be unbelievably slow on PowerPC. Even if you manage to get all of the stack on the same cache line, you will almost certainly stall significantly at some point - like the next time the routine was called with the stack in a different location. If the parameters/stack span a cache line, stalling will be enormously painful to performance. gcc with -O3 could quite likely produce something like (i3 is in r3 and i6 is in r8) stwu r5, 0(sp) add r8, r7, r8 ; i5+i6 to r8 add r3, r4, r3 ; i1+i2 to r3 add r3, r6, r3 ; i4+(i1+i2) to r3 add r3, r3, sp ; adding the address of i3, which was stored on the local stack add r3, r8, r3 ; (i5+i6) to everything else The result is already in r3, and memory was never accessed. Granted, most likely gcc would choke and shove some stuff into non-volatile registers, but sometimes the optimizations are pretty decent. (And of course, the results are going to be highly irregular and differ depending on optimization levels and stack location.) 
What would be the EM code for the x function, including the MES notes? >[...] >> Isn't EM basically a representation of logic? Although an interpreter can >> take EM opcodes and convert them on the fly, the representation of the >> programming logic isn't going to be affected during EM generation, and EM >> generation doesn't affect the final object code. Therefore, the backend is >> still responsible for the realities of the underlying architecture. EM >> might represent values being on a stack, but that's still just a "virtual" >> stack. > >Unfortunately, not always. EM specifies a particular format for the stack >frame, and there's a magic EM pseudo-register that points to it. Parameters >are then defined at particular offsets from this stack frame. There are EM >opcodes that will either read or write single or double-word values, or else >take the address of a particular frame slot --- there's no difference between >'lol 3' (load word local #3) or 'lal 3; loi 4' (load address of word local #3; >dereference word). What's more, there's no information about types, either; a >double-word local simply occupies two frame slots, and it's possible to read >or write the high and low words separately. > >(32 bit words, here. Also, EM uses 'local' to refer to function parameters and >function temporaries.) I still don't see where this implies or requires adherence to accessing the parameters in memory. The MES notes state what size the local variables are and where on the (virtual) stack they sit. If a parameter is half a word, it requires opcodes that only fill half the register. This can be done by the caller before jumping. EM lays out a roadmap, but the backend does the actual translation to something appropriate to the machine. It does, in essence, posit an ABI that every function call will recognize, from top to bottom, one determined and implemented by the backend. 
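The 'lal 3; loi 4' point in the quote above is the crux of the memory-frame question: once code can take the address of a parameter, that parameter needs a memory home, because a register has no address. A small C sketch of the effect (illustrative only — this is not ACK output, and the function names are made up):

```c
#include <assert.h>

/* Writes through the pointer must be visible when the parameter is
 * reloaded afterwards, so a compiler cannot keep i3 purely in a
 * register across this call. */
static int bump(int *p)
{
    *p += 1;
    return *p;
}

static int x(int i1, int i2, int i3)
{
    int r = bump(&i3);      /* i3's address escapes here */
    return i1 + i2 + i3 + r;
}
```

A parameter whose address never escapes, by contrast, is exactly the case the regvars hints target.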
The compromise/solution might resemble something like SPARC's register windows (similar to what you described above), but I think enforcing a memory-centric model on a register-centric CPU is not really portable. I've already seen too much of forcing x86-centric models onto PowerPC, and the resulting devastating effects on PowerPC performance, to go down that road again. >The ARM code generator is probably the best one to look at, but it's a bit >cryptic (there's a lot of support for the ARM's odd addressing modes, which is >all entirely irrelevant for the simple PowerPC). The SPARC code generator >actually uses an entirely different and unhelpful code generator mechanism >that I haven't bothered to make work (because it makes lousy code). Ah. I was hoping for something that had the opcodes all ready to go so I could focus on the optimizations. Then I could take the optimization for register-centric CPUs and write the tables for the PowerPC opcodes. That way I don't have to try both at the same time. >Anything in the mach directory with a ncg/table file is a new-style code >generator. > >...incidentally, you may want to investigate using qemu as a testbed; it >supports ARM, i386, MIPS, PowerPC, x86_64 and sparc and will allow 'hardware' >debugging of the emulated machine (clunkily, via gdb). Good point. Someone else I know had suggested that as well, some time ago, for a different project. tim Gregory T. (tim) Kelly Owner Dialectronics.com P.O. Box 606 Newberry, SC 29108 "Anything war can do, peace can do better." -- Bishop Desmond Tutu |
From: David G. <dg...@co...> - 2007-05-23 14:50:46
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 tim kelly wrote: [...] > Several points come to mind. First, there almost has to be a way for EM to > know the number of parameters a subroutine/function call is expecting, in > order to properly match the code being jumped to (I'm still getting up to > speed on pattern matching and CAL calls). Well... not necessarily. The convention is that the caller is responsible for both pushing parameters onto the stack and then popping them off again afterwards, so the callee doesn't need to know about such things. All it gets is, in effect, a pointer to the first parameter. [...] > Therefore, > the only missing link is the MES statement letting the backend know how > many parameters are passed to the EM-based subroutine/function call. (The > prolog and epilog then prepare the local variables in registers and save > and restore non-volatile registers.) That is, in fact, exactly what happens when using the regvars extension. The compilers generate hints to say 'Local #n is used X times, it'd be nice if you could optimise that a bit'. The platform-independent part of the code generator figures out which locals go in which registers, and then the platform-dependent part prepares the registers by copying the values out of the stack frame into the registers. (See MES 3.) (Of course, this will only work if the code isn't referring to the address of those stack slots.) But this doesn't affect the calling convention; the parameters still get *passed* in memory. You can see the EM bytecode if you compile with -c.e. The EM white paper contains a reasonably complete description of what they all do... ... Typically, I'd expect ACK on a register-centric architecture like the PowerPC to reserve, say, eight registers for expression evaluation, have a few for housekeeping, and to use all the rest for local storage. 
So: int x(int i1, int i2, int i3, int i4, int i5, int i6) { i1 = i1+i2+i3+i4+i5+i6 + (int)&i3; } becomes (hand-compilation, omitting the prologue and epilogue boilerplate): ; preload registers lwz r8, 4(sp) ; r8 = local #0 = i1 lwz r9, 8(sp) ; i2 lwz r10, 16(sp) ; i4 lwz r11, 20(sp) ; i5 lwz r12, 24(sp) ; i6 ; perform calculation add r1, r8, r9 ; x = i1 + i2 lwz r2, 12(sp) ; load i3, not cached in register add r1, r1, r2 add r1, r1, r10 ; x += i4 add r1, r1, r11 ; x += i5 add r1, r1, r12 ; x += i6 addi r2, sp, 12 ; get address of i3 add r8, r1, r2 ; result goes directly into the i1 register The prologue and epilogue would need to save and reload r8-r12, of course. By carefully tweaking how the registers are used you may be able to do this in one instruction. r1-r7 are scratch and don't need saving. [...] > Isn't EM basically a representation of logic? Although an interpreter can > take EM opcodes and convert them on the fly, the representation of the > programming logic isn't going to be affected during EM generation, and EM > generation doesn't affect the final object code. Therefore, the backend is > still responsible for the realities of the underlying architecture. EM > might represent values being on a stack, but that's still just a "virtual" > stack. Unfortunately, not always. EM specifies a particular format for the stack frame, and there's a magic EM pseudo-register that points to it. Parameters are then defined at particular offsets from this stack frame. There are EM opcodes that will either read or write single or double-word values, or else take the address of a particular frame slot --- there's no difference between 'lol 3' (load word local #3) or 'lal 3; loi 4' (load address of word local #3; dereference word). What's more, there's no information about types, either; a double-word local simply occupies two frame slots, and it's possible to read or write the high and low words separately. (32 bit words, here. 
Also, EM uses 'local' to refer to function parameters and function temporaries.) [...] > I suspect my next step is going to be to understand the ARM and SPARC (RISC > based with different approaches to registers) backend tables from 5.6, and > perhaps attempt to bring this into 6.0. Of course, I don't have a testbed > for either architecture, and I'll see about getting ACK 6.0 to compile on > OS X (with OpenBSD as a fall back). The ARM code generator is probably the best one to look at, but it's a bit cryptic (there's a lot of support for the ARM's odd addressing modes, which is all entirely irrelevant for the simple PowerPC). The SPARC code generator actually uses an entirely different and unhelpful code generator mechanism that I haven't bothered to make work (because it makes lousy code). Anything in the mach directory with a ncg/table file is a new-style code generator. ...incidentally, you may want to investigate using qemu as a testbed; it supports ARM, i386, MIPS, PowerPC, x86_64 and sparc and will allow 'hardware' debugging of the emulated machine (clunkily, via gdb). -- dg@cowlark.com --- http://www.cowlark.com | Uglúk u bagronk sha pushdug Internet-glob bbhosh skai. |
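The caller-pushes/caller-pops convention described in the message above — the callee sees only a pointer to its first parameter and never needs to know how many there are — can be modelled in a few lines of C. This is a toy sketch under the assumption of word-sized parameters only, not ACK code:

```c
#include <assert.h>

/* Toy software stack modelling the EM calling convention: the caller
 * pushes arguments, the callee indexes upward from a pointer to the
 * first parameter, and the caller retracts the stack afterwards. */
enum { STACK_WORDS = 64 };
static int stack[STACK_WORDS];
static int sp = STACK_WORDS;            /* the stack grows downward */

static void push(int v) { stack[--sp] = v; }

/* Callee: has no idea how many arguments exist; further parameters
 * simply sit at higher indices, which is also how varargs works. */
static int sum3(const int *params)
{
    return params[0] + params[1] + params[2];
}

static int call_sum3(int a, int b, int c)
{
    push(c); push(b); push(a);          /* caller pushes, last arg first */
    int r = sum3(&stack[sp]);           /* callee gets ptr to 1st param */
    sp += 3;                            /* caller pops its own arguments */
    return r;
}
```

The caller-cleanup step is what lets the callee stay ignorant of the argument count.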
From: David G. <dg...@co...> - 2007-05-23 13:50:26
|
texts writer wrote: > I want to thank you for continuing this product and keeping it open > I think that tack is the only C compiler for Linux which produces > glibc-independent binaries Thanks! > I am reading documentation and trying to understand how tack works. > I am good in 6502 programming and know well one particular platform, > so in the nearest future I may have few questions and we may have > another target There's actually a 6502 code generator deep within the CVS source (targeting the BBC Micro, though it looks fairly portable). Naturally, it produces lousy code, simply because you *can't* compile C for the 6502 with any degree of success. Possibly changing it to generate Sweet16 code might help --- slower, but much smaller. If you're interested in poking around the innards, you may want to get a CVS snapshot. What's in the release is simply a vastly-sanitised subset of what's really there --- the full ACK codebase is huge, and contains at least three (possibly four) complete compiler frameworks. The documentation is also terribly ancient and not terribly complete... |
From: David G. <dg...@co...> - 2007-05-23 13:40:42
|
texts writer wrote: > is it possible? > I noticed that object file format differs. At least "file" utility > doesn't recognize object files produced by tack and archive files as > well Unfortunately GNU binutils can't read ACK object files, and ACK can't write GNU ELF object files. One day I want to do a proper ack.out-to-ELF converter, but it's way down on the list of priorities right now. |
From: David G. <dg...@co...> - 2007-05-23 13:34:59
|
texts writer wrote: > Do you aware of any function in tack c library similar to fstat ? The C library supplied is limited to ANSI, pretty much, because implementing and supporting all the Posix syscalls is an awful lot of work and I was wanting to focus on getting other things done. If you want to roll your own on Linux, then this *may* work: extern int _syscall(int op, int p1, int p2, int p3); int fstat(int fd, struct stat* buf) { int i = _syscall(108, fd, (int)buf, 0); if (i < 0) { errno = -i; /* the Linux kernel returns a negated errno value */ return -1; } return 0; } (Totally untested!) |
From: David G. <dg...@co...> - 2007-05-23 13:29:35
|
texts writer wrote: > Do you aware that when running "file" on resulting binary it reports: > ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), > statically linked, corrupted section header size This is because aelflod generates ELF executables with a program header table but no section header table. I don't know why file reports the executables as corrupt, but objdump is happy with them (and they work). |
From: David G. <dg...@co...> - 2007-05-23 13:25:29
|
texts writer wrote: [...] > I am new to tack, so excuse me for that question. > I didn't find a way to link together generated .o files > It seems ack doesn't accept syntax like ack -o out 1.o 2.o 3.o 4.o 5.o 6.o 7.o > On the other hand I cannot find linker. If it is only em_led then its > output couldn't be given to aelflod You're right, that doesn't work. That's bizarre. I'll work on getting that fixed; ta. As a workaround, it seems to get it right provided there's at least one .c file on the command line. (There doesn't have to be anything in it.) So: ack -o out empty.c 1.o 2.o 3.o 4.o... ...ought to work. |
From: texts w. <tex...@go...> - 2007-05-23 12:42:25
|
Is it possible? I noticed that the object file format differs. At least the "file" utility doesn't recognize object files produced by tack, nor archive files. Cheers Norayr |
From: texts w. <tex...@go...> - 2007-05-23 12:29:27
|
I want to thank you for continuing this product and keeping it open. I think that tack is the only C compiler for Linux which produces glibc-independent binaries. I am reading the documentation and trying to understand how tack works. I am good at 6502 programming and know one particular platform well, so in the near future I may have a few questions, and we may have another target. Cheers Norayr |
From: texts w. <tex...@go...> - 2007-05-23 12:24:28
|
Are you aware of any function in the tack C library similar to fstat? |
From: texts w. <tex...@go...> - 2007-05-23 12:23:09
|
Are you aware that when running "file" on the resulting binary it reports: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, corrupted section header size Thanks |
From: texts w. <tex...@go...> - 2007-05-23 12:16:03
|
Hello I am new to tack, so excuse me for that question. I didn't find a way to link together generated .o files. It seems ack doesn't accept syntax like ack -o out 1.o 2.o 3.o 4.o 5.o 6.o 7.o. On the other hand, I cannot find the linker. If it is only em_led, then its output can't be given to aelflod. Thanks Norayr |
From: tim k. <gt...@di...> - 2007-05-23 11:47:02
|
At 5:53 PM -0400 5/22/07, David Given wrote: >[...] >> > It seems to me that much of the stack to register problem just means a >> > highly optimized prolog and epilog, but isn't really a huge obstacle. > >I'm not sure it's that easy. I don't believe EM routines know how many >parameters they've got --- parameters are simply accessed by number, -1 being >the first parameter, -2 being the second, etc. You're perfectly at liberty to >keep indexing parameters until you run out of address space; that's how >varargs functions work. This means there's no way to know which parameters are >in registers and which ones are in memory when the function is called. Several points come to mind. First, there almost has to be a way for EM to know the number of parameters a subroutine/function call is expecting, in order to properly match the code being jumped to (I'm still getting up to speed on pattern matching and CAL calls). Also, although perhaps not currently in EM, there should be a mechanism tying the function declarations through to the backend from the source code, for debugging and other reasons. I cannot find the list of parameters MES 10 is expecting, but this certainly seems like a logical place to add this feature. The process of converting to EM should not affect the logic of the code that was converted, so a function that sums two numbers should expect two parameters to be passed to it. From there I would (perhaps irrationally) think that as long as the stack loading convention is consistent, the backends should know how to retrieve the items from the stack and there won't be any non-homogeneous conditions (it will be uniform, all or nothing, from the entry point into the executable code). Therefore, the only missing link is the MES statement letting the backend know how many parameters are passed to the EM-based subroutine/function call. (The prolog and epilog then prepare the local variables in registers and save and restore non-volatile registers.) 
>Basically, EM wants the canonical storage for a parameter or a temporary to be >*memory*, not a register. I don't think this can be changed without >substantially changing the way EM works, and I really don't want to do that >- --- I'm still struggling to understand the bits I'm working on as it is. Isn't EM basically a representation of logic? Although an interpreter can take EM opcodes and convert them on the fly, the representation of the programming logic isn't going to be affected during EM generation, and EM generation doesn't affect the final object code. Therefore, the backend is still responsible for the realities of the underlying architecture. EM might represent values being on a stack, but that's still just a "virtual" stack. It certainly makes backend tables for stack-based CPUs much easier to write, but nothing I've seen in the explanations of the logic behind ACK suggest an exclusion of register-based CPUs. Quite the opposite, I regularly see references to register-based CPUs. Certainly if I have misunderstood something I expect (and prefer) to be corrected :-) I suspect my next step is going to be to understand the ARM and SPARC (RISC based with different approaches to registers) backend tables from 5.6, and perhaps attempt to bring this into 6.0. Of course, I don't have a testbed for either architecture, and I'll see about getting ACK 6.0 to compile on OS X (with OpenBSD as a fall back). thanks, tim Gregory T. (tim) Kelly Owner Dialectronics.com P.O. Box 606 Newberry, SC 29108 "Anything war can do, peace can do better." -- Bishop Desmond Tutu |
From: David G. <dg...@co...> - 2007-05-22 21:53:19
|
tim kelly wrote: [...] > > I also suspect this can be handled in the object file format. My thoughts > > have been to use Motorola's Preferred Executable Format (PEF), which is > > publicly available. I will have to look into patents to make sure there > > won't be a licensing issue. The same approach to string handling may be > > present in other formats. Actually dealing with that is easy --- li compiles to 'addi <target>, r0, <low16>; addis <target>, <target>, <high16>'. Just replace the r0 with RTOC and it'll work fine. [...] > > Are there any rules regarding how many arguments on the stack a call can > > return? Can a function call return ten or more values on the stack? Subroutine calls don't use the stack, actually (because the stack frame gets in the way and makes things complicated). There's a special area known as the 'function return area' which is used for this. The EM 'ret' opcode pops a value off the stack and saves it into the FRA; then there's another EM opcode ('lfr') that fetches the value out of the FRA. Most platforms implement the FRA in registers; it can be 8 bytes long at most. For example, my Z80 code generator uses DE as the FRA for 2-byte returns, and an external memory location for 8-byte returns. (The 'lfr' instruction is defined to only be valid immediately after a subroutine call). [...] > > It seems to me that much of the stack to register problem just means a > > highly optimized prolog and epilog, but isn't really a huge obstacle. I'm not sure it's that easy. I don't believe EM routines know how many parameters they've got --- parameters are simply accessed by number, -1 being the first parameter, -2 being the second, etc. You're perfectly at liberty to keep indexing parameters until you run out of address space; that's how varargs functions work. 
This means there's no way to know which parameters are in registers and which ones are in memory when the function is called. Basically, EM wants the canonical storage for a parameter or a temporary to be *memory*, not a register. I don't think this can be changed without substantially changing the way EM works, and I really don't want to do that --- I'm still struggling to understand the bits I'm working on as it is. (Note that this only applies to the function call API. *Inside* functions, values are cached in registers when possible.) [...] > > Any thought to moving to a BSD platform for development? :-) When Canonical start producing a BSD-based version of Ubuntu, I'll switch over like a shot... unfortunately, until then, Linux it is. However, I do have an elderly laptop running OpenBSD 4.1 that's used as a test machine, and you'll be pleased to know that ACK 6.0pre3 builds cleanly on it... -- dg@cowlark.com --- http://www.cowlark.com | "Parents let children ride bicycles on the street. But parents do not allow children to hear vulgar words. Therefore we can deduce that cursing is more dangerous than being hit by a car." --- Scott Adams |
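The function-return-area mechanism described in the message above — 'ret' pops a value off the stack into the FRA, and 'lfr' is only valid immediately afterwards — can be sketched as a toy interpreter fragment. The `em_*` names are made up for illustration, and only word-sized returns are modelled:

```c
#include <assert.h>
#include <string.h>

static unsigned char fra[8];      /* the FRA is at most 8 bytes long */
static int eval_stack[32];
static int sp = 0;                /* evaluation-stack pointer */

static void em_push(int v) { eval_stack[sp++] = v; }

/* 'ret': pop one word off the evaluation stack into the FRA. */
static void em_ret(void)
{
    memcpy(fra, &eval_stack[--sp], sizeof(int));
}

/* 'lfr': fetch the word back out of the FRA; meaningful only
 * immediately after a call whose body executed 'ret'. */
static int em_lfr(void)
{
    int v;
    memcpy(&v, fra, sizeof(int));
    return v;
}
```

A backend is free to map `fra` onto registers (as the Z80 generator does with DE), which is why the "valid only immediately after the call" restriction exists.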
From: tim k. <gt...@di...> - 2007-05-22 21:19:30
|
Hi David, >I'm actually currently working on a brand new Z80 code generator, so I'm >getting to know a fair bit about the way code generators work in the ACK. Great! >So, in general, I can now confidently state that writing a PowerPC code >generator for the ACK would not, in fact, be particularly difficult. OK, I should also ask about an opcode generator. How hard will it be to write a parser? > In fact, >there are a number of features in the PowerPC instruction set that make things >simpler. However: > >- there is absolutely no chance that the ACK can be made to conform to any >standard PowerPC ABI. That's not a requirement at my end, but I do need to preserve the concept of volatile and non-volatile registers. Memory accesses on PowerPC are particularly slow, and there are way more registers on PPC than on most CPUs, so there still has to be a register-centric programming model. >- the code won't be terribly fast (but won't be as bad as I initially >feared, >either). This might be solved by doing optimizations in multiple passes, or perhaps by an ability to "de-optimize" EM in order to take advantage of the PPC approaches. >Basically, the ACK wants to pass all parameters to functions on the stack, >where the standard ABI wants them in registers. Right. Seems like I should spend a significant amount of time examining this problem. I believe the solution comes from an approach I learned when interviewing two of IBM's XL C compiler software engineers (for IBM's developerWorks). Instead of worrying about how many registers an architecture has from the start, they make the final register allocation after all of the algorithm has been generated. This would come under the philosophy of "infinite registers," so I suspect the problem can be solved by delaying the final register selection until we know exactly how many we need. >So if you're willing to live with that, I don't think there's much of a >problem. 
What is the best way to present a solution, in code or in detailed documentation? >Well, good to hear from you again! Thanks. Apparently I can ignore my spots all I want, but they don't change. >I don't know if you've noticed, but we actually have some real releases now: > >http://tack.sourceforge.net > >Currently there's only a limited set of architectures and platforms but at >least there's enough to run and play with. Good stuff! Any thought to moving to a BSD platform for development? :-) tim Gregory T. (tim) Kelly Owner Dialectronics.com P.O. Box 606 Newberry, SC 29108 "Anything war can do, peace can do better." -- Bishop Desmond Tutu |
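The "infinite registers" idea raised in this exchange — generate code against unlimited virtual registers, then bind them to real ones in a final pass — can be illustrated with a deliberately naive final-pass mapper. This is a sketch only (real allocators work from liveness ranges and handle spilling); all names are hypothetical:

```c
#include <assert.h>

#define NPHYS 4                   /* pretend machine register file */

/* Map each distinct virtual register, in order of first use, onto a
 * physical register. Returns the number of physical registers used,
 * or -1 when the code would need a spill. phys_of[] must be sized to
 * the largest virtual register number + 1 and initialised to -1. */
static int allocate(const int *vregs, int n, int *phys_of)
{
    int next = 0;
    for (int i = 0; i < n; i++) {
        int v = vregs[i];
        if (phys_of[v] == -1) {
            if (next == NPHYS)
                return -1;        /* out of real registers */
            phys_of[v] = next++;
        }
    }
    return next;
}
```

The point of deferring the decision is exactly what tim describes: only after the whole function is generated do you know how many registers it actually needs.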
From: David G. <dg...@co...> - 2007-05-22 20:28:43
|
tim kelly wrote: [...] > > We are still wanting to compile with a BSD-licensed PowerPC compiler. If I > > give you various pieces of assembly level PowerPC code, are you able to > > write a parser for the intermediate stages of ACK? I find it interesting > > that POWER6 has moved to an in-order architecture. This may make some of > > the concerns about a stack-based compiler less of an issue. On the other > > hand, PASemi's really cool chip is pushing out-of-order execution even > > farther. I'm actually currently working on a brand new Z80 code generator, so I'm getting to know a fair bit about the way code generators work in the ACK. So, in general, I can now confidently state that writing a PowerPC code generator for the ACK would not, in fact, be particularly difficult. In fact, there are a number of features in the PowerPC instruction set that make things simpler. However: - there is absolutely no chance that the ACK can be made to conform to any standard PowerPC ABI. - the code won't be terribly fast (but won't be as bad as I initially feared, either). Basically, the ACK wants to pass all parameters to functions on the stack, where the standard ABI wants them in registers. This means that calling functions involves hitting memory. This: write(0, "Hello, world!\n", 14); ...would compile into: li r1, 14 stwu r1, -4(sp) ; push 14 li r1, _string ; becomes two instructions, remember stwu r1, -4(sp) ; push string li r1, 0 stwu r1, -4(sp) ; push 0 bl _write ; do the call addi sp, sp, 12 ; retract stack over pushed parameters (It *may* be possible to persuade the ACK to combine the three pushes into a single stswi instruction, but I can't guarantee it.) (While stwu is useful as a push instruction, unfortunately lwzu can't be used as a pop, because it does the memory dereference and the add in the wrong order. Luckily the ACK uses pushes more than pops. Go figure.) 
So if you're willing to live with that, I don't think there's much of a problem. > > In any event, I wanted to get back in touch and open up a dialog again > > about writing a PowerPC compiler layer for ACK. Well, good to hear from you again! I don't know if you've noticed, but we actually have some real releases now: http://tack.sourceforge.net Currently there's only a limited set of architectures and platforms but at least there's enough to run and play with. |
From: tim k. <gt...@di...> - 2007-05-21 19:12:30
|
Hi David, I've been away for a while but it appears sometime in the next six months or so I may be able to return to some operating system design projects I had been working on. Certain people won't leave me alone so apparently I am going to have to prove the project can not possibly be done (or I'll succeed with the project, proving me wrong). We are still wanting to compile with a BSD-licensed PowerPC compiler. If I give you various pieces of assembly level PowerPC code, are you able to write a parser for the intermediate stages of ACK? I find it interesting that POWER6 has moved to an in-order architecture. This may make some of the concerns about a stack-based compiler less of an issue. On the other hand, PASemi's really cool chip is pushing out-of-order execution even farther. In any event, I wanted to get back in touch and open up a dialog again about writing a PowerPC compiler (backend) layer for ACK. thanks, tim Gregory T. (tim) Kelly Owner Dialectronics.com P.O. Box 606 Newberry, SC 29108 "Anything war can do, peace can do better." -- Bishop Desmond Tutu |
From: David G. <dg...@co...> - 2007-05-01 09:43:57
|
Gerald Murray wrote: > Hello, > Using ack-6.0pre3. A broken link is created during the make. [...] > /tmp/ack-temp/staging/lib/i80/descr -> TOPSOURCE/lib/i80/descr <--broken Yes, indeed. Ta. That file's actually no longer used, so the error should be harmless --- I moved the descr file from the architecture-specific directory (mach/i80) to the platform-specific one (plat/cpm) and then forgot to remove the line that tries to install it. There's also a bug in the build system in that it allows you to install nonexistent files, but that's another matter... Fixed. -- dg@cowlark.com --- http://www.cowlark.com | "This is the captain. We have a little problem with our reentry sequence, so we may experience some slight turbulence and then explode." --- Mal Reynolds, _Serenity_ |