sablevm-user Mailing List for SableVM (Page 6)
From: Jim J. <res...@ve...> - 2001-10-11 23:23:23
|
e -g -O2 -c acos.c -fPIC -DPIC -o .libs/acos.lo acos.c: In function `Java_java_lang_Math_acos': acos.c:108: Unable to find a register to spill in class `AREG'. acos.c:108: confused by earlier errors, bailing out make[2]: *** [acos.lo] Error 1 I am running RH Linux 7.1 with gcc 2.96 on an i386. Any suggestions? The same error occurs when attempting to compile sablepath-libs-0.1.3 |
From: Ian R. <ir...@cs...> - 2001-08-27 10:26:17
|
Hi, I'm Ian; my SourceForge user name is captain5050. I've been working on a Java dynamic compiler for the last 3 years as part of my Ph.D. I hope to be using sablepath as the native library for running some SpecJVM benchmarks. As a student, my work is owned by the University, which has a responsibility (at Manchester University, anyway) to ensure that I benefit from any invention of mine. The right to benefit from my inventions has been taken away from me by a company in Manchester called Transitive Technologies Ltd (TTL). I have confirmed with both the University and TTL that when contributing patches/additions to a project, I own the copyright. I am also clean in the sense that I'm not part of Sun's community source program or any other JVM and class library project. In essence, to the best of my knowledge, I can work on this project without any copyright infringement issues. I look forward to working with you, Ian Rogers |
From: Etienne M. G. <eg...@j-...> - 2000-11-30 21:12:45
|
xli wrote: > Do you have any clue how the SUN team implements this? Their source code license is too restrictive for me to look at. In JITs or in compiled code (e.g. HotSpot, Jalapeno), this is probably not a problem. But for interpreters (the JDK interpreter, the HotSpot interpreter) it is a problem. I expect them to do as I suggested, as interpreters, by definition, do not rewrite bytecode, and maintaining longs/doubles separately is pretty tedious, unless I missed some implementation trick. One such trick would be to detect non-aligned longs/doubles in the verification step, and reserve space for them explicitly, replacing the stack operations by special ones. But this is tricky in the presence of untyped stack instructions (e.g. dup2_x2). > I tried to find other open source JVM, but failed. Try: http://www.kaffe.org/ http://www.japhar.org/ http://www.intel.com/research/mrl/orp/ http://www.openjit.org/ http://latte.snu.ac.kr/ http://sources.redhat.com/java/ Etienne PS: To subscribe to the SableVM-user mailing list, you can visit: http://lists.sourceforge.net/mailman/listinfo/sablevm-user -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableCC: http://www.sable.mcgill.ca/sablecc/ and SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-11-30 20:49:59
|
xli wrote: > >union _svmt_stack_frame_entry_union > >{ > > jint jint; > > jfloat jfloat; > > void *addr; > >#ifdef DOUBLE_ALIGNMENT > > jdouble dummy_jdouble; > >#endif > >}; > > > It sounds OK, but the consumption of stack will be doubled. Yes, but other than rewriting the bytecode so that no odd stack location is used for doubles/longs, I see no solution which won't hurt either performance or memory consumption. (Or alternatively, reject such bytecode, but this would be in violation of the JVM specification). The uglier problem here is that we must also change the object layout offset assignment code for long/double, so that alignment is (conditionally) provided. @#$%*&!@#$*&%! (this is what I think;-) Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableCC: http://www.sable.mcgill.ca/sablecc/ and SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-11-30 20:28:22
|
xli wrote: > > Thank you for your reply. > > > On a 64 bit platform, sizeof(void *) will be 64 bits. All we have to > > do, now, is make sure the rest of the stack is aligned (which I > > have to double check). > > I think you are right for a 64 bit platform. However, I tried your idea > on a SPARC architecture, which is a 32 bit platform but requires 64 bit > alignment for doubles (longs). For example, the following code: > > #define alignment() {if ((int)(p)&0x00000007) p++;} > > main(){ > int *p; > double d1, d2, d3; > > d1 = 123.5; > d2 = 45.7892; > > p = (int*) malloc(20*sizeof(void*)); > > p++; > > // alignment(); > > *((double*) p) = d1; > > p += 2; > > *((double*) p) = d2; > > d3 = *(double*) p + *(double*) (p-2); > > printf(" %f + %f = %f \n", d1, d2, d3); > } > > will cause a bus error if the alignment macro is omitted. > I didn't know about that. In this case, there's an ugly solution (maybe not that ugly): union _svmt_stack_frame_entry_union { jint jint; jfloat jfloat; void *addr; #ifdef DOUBLE_ALIGNMENT jdouble dummy_jdouble; #endif }; > I also tried the above code on my PC; it works because the ix86 architecture > does not require this kind of alignment (I guess). > > >This is an important point to check. I will need a lot of feedback from > >people like you to make sure SableVM is easily portable to various > >systems with minimal effort. > > This is also an important point for me, as I am now working on a > Prolog virtual machine which should be portable to various platforms. > > Cheers. > > Li -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableCC: http://www.sable.mcgill.ca/sablecc/ and SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-11-30 19:48:39
|
Hi there! I am currently working on SableVM, completing the base functionality of the interpreter (to support all bytecodes, and exceptions), and cleaning up a few things (mostly related to portability and flexibility). I need some help with the GNU configure magic. There are a few system properties that I can deduce using simple C programs running on the target platform. Other things I do not know exactly how to automate. But mostly, I do not know how to get GNU configure to run these programs, extract the results, and put them into config.h. Here is an example of a property I can deduce easily with a program: support for labels as values (e.g. "&&label"). If the program compiles, it is supported; if not, it isn't. Something harder to deduce: what is the name of the signed 32 bit integer type? (is it "signed int", "signed long", "signed short", ...) Does anybody know about this stuff? [I have no experience with M4, and writing configure macros,...] Thanks in advance, Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableCC: http://www.sable.mcgill.ca/sablecc/ and SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-11-30 19:33:26
|
Hi Mr. Li, [I would rather have this discussion on the sablevm-user mailing list, as it gets automatically archived. So, I am CC'ing my reply there.] See answer below: xli wrote: > ... > I have a question about 64-bit alignment for double and long values. > For example, when we load a double value, such as 0.0, to the stack, > the following code will be executed: > > DCONST_0: > { > *((jdouble *) &stack[frame->stack_size]) = 0.0; > frame->stack_size += 2; > } > goto *((frame->pc++)->addr); > > How do we know that the current stack address is 64-bit aligned? It will probably be aligned, because of the definition of "stack": _svmt_stack_frame_entry *volatile stack = NULL; where typedef union _svmt_stack_frame_entry_union _svmt_stack_frame_entry; and union _svmt_stack_frame_entry_union { jint jint; jfloat jfloat; void *addr; }; On a 64 bit platform, sizeof(void *) will be 64 bits. All we have to do, now, is make sure the rest of the stack is aligned (which I have to double check). This is an important point to check. I will need a lot of feedback from people like you to make sure SableVM is easily portable to various systems with minimal effort. -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableCC: http://www.sable.mcgill.ca/sablecc/ and SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-11-27 01:22:40
|
John Leuner wrote: > > My first comment is on the use of a threaded interpreter. As I understand it, > the bytecode array will no longer be one-byte instructions followed by > their operands, but instead will consist of 4 byte pointers to the > implementation for that instruction. > > Isn't this terribly space inefficient? Wouldn't this require 64bits per > pointer on a 64-bit machine? I say this in light of the pressure put on > memory by the JIT and optimising process; this would make the pressure on > memory even worse. The pressure on memory is sizeof(void *) × bytecode buffer size. This is greater on a 64 bit platform. But presumably, a 64 bit platform has plenty of memory, and this is likely to be much less than the JIT compiled size of bytecode. Some of this space could be saved using simple compression techniques, but you should also take into account the importance of data alignment in memory. If you misalign data, your programs are likely to suffer from the overhead of extracting it from memory. On some RISC platforms, this overhead can be significant. I think that a compromise, on 64 bit platforms, is to group bytecode parameters within words, if possible (as you can fit 2 Java "int"s into a single word). But, addresses are 64 bits, so every bytecode will be effectively represented by a whole word. > You also say "The advantage of unmapping memory, relative to simply > freeing memory (using malloc/free) is that unmapped pages need not be > dumped to disk by the VMM system when they are to be ejected from RAM". > > I don't really understand this line? Do you mean that instead of > malloc'ing a 5M heap (which will be regarded by the VMM as useful data), > you can selectively allocate the pieces being used by the heap? I have had discussions about this with some people. 
It might not be such a good idea, after all, as (1) unmapping memory is an overhead if no memory is being swapped to disk, and (2) if the VMM starts swapping memory to disk, then it is already too late, i.e. your application becomes dead slow. So, the basic principle here is to keep a heap size that fits your RAM, in which case unmapping is unnecessary. Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: John L. <je...@pi...> - 2000-11-27 00:15:42
|
Hello Etienne It's been a while since I spent time on my JIT and JVM; I've been working on porting the JVM to run in the linux kernel, in preparation for an experimental Operating System. But I read your paper and would like to comment / ask questions. I found many of the ideas very stimulating; I would love to work on their implementation at some stage (either in SableVM or my JVM / JIT). My first comment is on the use of a threaded interpreter. As I understand it, the bytecode array will no longer be one-byte instructions followed by their operands, but instead will consist of 4 byte pointers to the implementation for that instruction. Isn't this terribly space inefficient? Wouldn't this require 64bits per pointer on a 64-bit machine? I say this in light of the pressure put on memory by the JIT and optimising process; this would make the pressure on memory even worse. You also say "The advantage of unmapping memory, relative to simply freeing memory (using malloc/free) is that unmapped pages need not be dumped to disk by the VMM system when they are to be ejected from RAM". I don't really understand this line? Do you mean that instead of malloc'ing a 5M heap (which will be regarded by the VMM as useful data), you can selectively allocate the pieces being used by the heap? John Leuner |
From: Etienne M. G. <eg...@j-...> - 2000-11-23 01:03:59
|
Hi Feng, "Etienne M. Gagnon" wrote: > For now, all I can do is try helping you resolve your compilation > problem. I assume you are doing a local user installation. Have you set your environment variables correctly for dynamic library loading? i.e. LD_LIBRARY_PATH (and for gcc: LIBRARY_PATH, C_INCLUDE_PATH)? These little environment settings are a pain, which is why I would normally recommend asking your system administrator to simply install the binary packages. But I know that your sysadmin is pretty busy these days... :-) So, just let me know if this helps. Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
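For a local user installation under a hypothetical prefix such as $HOME/local, the variables mentioned above would typically be set along these lines (the paths are illustrative; adjust them to wherever the libraries were actually installed):

```shell
# Runtime dynamic-library search path.
export LD_LIBRARY_PATH="$HOME/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Link-time library search path for gcc.
export LIBRARY_PATH="$HOME/local/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
# Preprocessor include search path for gcc.
export C_INCLUDE_PATH="$HOME/local/include${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}"
```

The `${VAR:+:$VAR}` idiom appends the old value only if it was non-empty, avoiding a stray leading or trailing colon.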
From: Etienne M. G. <eg...@j-...> - 2000-11-22 23:20:54
|
Hi Feng. Feng QIAN wrote: > It is my first time trying SableVM, and I got some installation problems. > > First of all, it needs the popt package, so I downloaded it. > Then 'popt' needs 'gettext', so I downloaded it. > Now, for some reason, 'gettext' is not installed correctly, but I will > look at it. Can you describe your problem more precisely? Which version of gettext are you trying to install? [Can't you simply install precompiled packages? On a Debian system, these packages are installed using the command "apt-get install apt". On RedHat, these packages are part of the base install.] > At this stage, what about making things simpler for first-time > users by providing packaged 'popt' and 'gettext', rather than spending > time figuring out how to set them up? Eventually, there will be binary packages for SableVM (at least for Debian). This will simplify the dependency issue. For now, all I can do is try helping you resolve your compilation problem. Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Feng Q. <fq...@cs...> - 2000-11-22 17:50:58
|
Hi, Etienne It is my first time trying SableVM, and I got some installation problems. First of all, it needs the popt package, so I downloaded it. Then 'popt' needs 'gettext', so I downloaded it. Now, for some reason, 'gettext' is not installed correctly, but I will look at it. At this stage, what about making things simpler for first-time users by providing packaged 'popt' and 'gettext', rather than spending time figuring out how to set them up? Regards ================================================== Feng Qian |
From: Etienne M. G. <eg...@j-...> - 2000-11-04 17:45:20
|
Hi! I have been quiet for a little while, as I was participating in OOPSLA 2000; then I spent some time writing about SableVM. I have made the resulting document public; it is Sable Technical Report number 2000-3. You can find the postscript version of this report at: http://www.sable.mcgill.ca/publications/#report2000-3 If you prefer an html version, I have made one available at: http://www.j-meg.com/~egagnon/sable-report_2000-3/ The document explains the objectives and design of SableVM. Based on this document, it is now possible to share tasks in the completion of the Core SableVM engine (if any of you are interested). Please let me know of any comments you have. Thanks, Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Marcel A. <ma...@ca...> - 2000-08-29 16:27:22
|
On Tue, Aug 29, 2000 at 11:06:52AM -0400, Etienne M. Gagnon wrote: > Marcel Ammerlaan wrote: > > Ok. I've retested it and the results are much better now for the goto case > > (around 3 times faster) with a real-life example (e.g. newtest1 && newtest2). > > Ah! This is much more in line with the material on > "threaded interpretation" that can be found in the literature. > > > I will get my own VM up & running again and use that as a testbed instead > > of simple programs so the compiler won't fool me again:) > > > > I hate assembly and usually avoid it but in a case like this I should have > > checked what GCC did to the code... > > > > > I might still be wrong, so please continue testing and keep us updated > > > on your findings. > > > > I will (for now I'm focussing on the pre-interpretation bit as I've got > > better results when skipping this part and using a lookup table. I will > > investigate:) > > Do not forget that the pre-interpretation phase is a linear phase. > While running a real interpreter, you usually have many loops and > recursion that will make the pre-interpretation phase overhead pretty > insignificant. This phase does nothing as complex as a non-naive JIT > would do. I'm aware of that; I've measured running the code about 40,000 times (versus translating once). The difference between a real VM and the test code is the static nature of the list of instructions. I'll let you know what the results are (tonight or tomorrow) Marcel Ammerlaan |
From: Etienne M. G. <eg...@j-...> - 2000-08-29 15:07:02
|
Marcel Ammerlaan wrote: > Ok. I've retested it and the results are much better now for the goto case > (around 3 times faster) with a real-life example (e.g. newtest1 && newtest2). Ah! This is much more in line with the material on "threaded interpretation" that can be found in the literature. > I will get my own VM up & running again and use that as a testbed instead > of simple programs so the compiler won't fool me again:) > > I hate assembly and usually avoid it but in a case like this I should have > checked what GCC did to the code... > > > I might still be wrong, so please continue testing and keep us updated > > on your findings. > > I will (for now I'm focussing on the pre-interpretation bit as I've got > better results when skipping this part and using a lookup table. I will > investigate:) Do not forget that the pre-interpretation phase is a linear phase. While running a real interpreter, you usually have many loops and recursion that will make the pre-interpretation phase overhead pretty insignificant. This phase does nothing as complex as a non-naive JIT would do. Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-08-29 14:04:41
|
Hi Marcel. Here's another set of benchmarks that you can use as a basis for your tests. It contains enough bytecodes so that gcc uses a table to encode the switch statement, yet it is small enough that inspecting the assembly code is simple. I have included the "gcc -O2 -S" assembly output. You will see how the switch based approach has significant overhead over the goto based one. If you do time measurements, you should replace the "DIV" opcode by something that compiles to less code. Maybe replace it by an "ADD7" opcode. Just make sure that gcc doesn't get too clever and optimize out your opcodes (which is unlikely to happen in a java interpreter). Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-08-29 13:12:53
|
Here's the promised attachment. Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Etienne M. G. <eg...@j-...> - 2000-08-29 13:10:04
|
Marcel Ammerlaan wrote: > I've experimented with pre-interpretation but it doesn't seem to matter > much (ie. same results). I modified your benchmark to include time measurements (using the clock() function). My tests indicate that test3.c is a little faster than test.c. This means that, using these tests, the goto approach is faster, but just a little. This was compiled using the following gcc options (CFLAGS environment variable being unset): gcc -O2 -o test test.c gcc -O2 -o test3 test3.c Now, I used ddd to get a first feel of why the generated machine code for the switch was so efficient (to be so close to the goto approach), and it became much easier to understand what's happening... I have included, in attachment, the assembly code for the two programs, generated with: gcc -O2 -S test.c gcc -O2 -S test3.c You will notice that gcc is quite clever (sometimes;-), and it has detected that your "case" statements are all alike! So, gcc took advantage of this. I have not checked carefully, but you should also be aware that gcc could detect that it does not need to do the assignment "var = i;", as "var" is not volatile, and only the last iteration has any real side effect... In other words, your while/switch is simple for gcc to optimize, and this optimization wouldn't happen in a real interpreter, as each bytecode has a different body (there's no point in having duplicate bytecodes in an interpreter). I'm sure that, if you test with a somewhat more complex example, you will soon discover that the goto approach is faster. Just look at the assembly code. What to look for: gcc does encode a test on the value of the switch, to test if it is in bound of its encoded array of destination addresses. Gcc has no (theoretical) basis to be able to soundly remove such a test. 
Also, you (at least) get two branches for every iteration: (1) a jump from the switch to the appropriate "case", and (2) a jump from the end of the "case" back to the loop head (possibly the "switch"). This is already assuming an optimizing compiler that has removed spurious jumps (as you would probably get with gcc -O0). I might still be wrong, so please continue testing and keep us updated on your findings. Cheers, Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Marcel A. <ma...@ca...> - 2000-08-29 08:14:08
|
Etienne M. Gagnon wrote (eg...@j-...) > Hi! > > Marcel Ammerlaan wrote: > > Name: benchmark.tar.bz2 > > benchmark.tar.bz2 Type: unspecified type (application/octet-stream) > > Encoding: base64 > > > I've looked at your test2.c. You wrote: > > goto *code_addr[data[i]]; > > This should have been: > > goto *datap++; > > to correspond to your switch based test.c. > > Mainly, you forgot the bytecode->address pre-interpretation phase. > Mainly, you don't have bytecodes anymore; each bytecode is directly > replaced by its address, which is why it is much quicker (no loop, no > bound check [included in a switch], etc.) I've experimented with pre-interpretation but it doesn't seem to matter much (ie. same results). A testing version of that code is in test2.c. I've included the new version (test3.c) which uses the same testdata.h. Maybe I'm wrong (could be, haven't slept) but I don't think using goto is going to be faster than the switch. As for the bounds check: I remember that C doesn't guarantee anything when there is no default label, i.e., the bounds check is optional. I'd like to be proven wrong, because using the goto table makes some things possible in a JVM that I'd like to try in my own VM. I've tried both gcc 2.95.2 and gcc 2.7.2.3 and both give comparable results. Marcel ps. The var=i statement in the interpreter loop becomes a NOP because 'i' isn't changed anymore. GCC doesn't seem to see this. -- Don't let people drive you crazy when you know it's in walking distance |
From: Etienne M. G. <eg...@j-...> - 2000-08-29 02:32:10
|
Hi! Marcel Ammerlaan wrote: > Name: benchmark.tar.bz2 > benchmark.tar.bz2 Type: unspecified type (application/octet-stream) > Encoding: base64 I've looked at your test2.c. You wrote: goto *code_addr[data[i]]; This should have been: goto *datap++; to correspond to your switch based test.c. Mainly, you forgot the bytecode->address pre-interpretation phase. As a result, you don't have bytecodes anymore; each bytecode is directly replaced by its address, which is why it is much quicker (no loop, no bound check [included in a switch], etc.) ... I feel much better already;-) Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Marcel A. <ma...@ca...> - 2000-08-29 01:40:44
|
Hi, Etienne M. Gagnon wrote (eg...@j-...) > Hi Marcel. > > Marcel Ammerlaan wrote: > > My question is: why use goto instead of switch()? > > The concept of using goto's in the interpreter loop is usually referred > to as "threaded" interpretation. This name originates, if I am right, > from implementations of the FORTH language. > > The short answer to your question is: because it's (usually) much > faster. But, your experiment suggests that this is not the case... I > would be very interested in looking at your small benchmark program, to > see if there's something wrong with it. Would it be possible to send a > copy of it on the list? (This shouldn't be a long program, I guess). > > The reason "labels as values" were added in gcc was explicitly for cases > like interpreter loops. Here's some excerpt from gcc's documentation: > > "Another use of label values is in an interpreter for threaded code. > The labels within the interpreter function can be stored in the threaded > code for super-fast dispatching." Yeah, I read that too; that's what made me wonder in the first place. Although a C-compiler could probably better optimize a plain switch() statement (it's a higher level description of the same thing). Here is the result from a test run I got comparing a switch() vs. goto implementation: rincewind:~/src/pleurvm$ gcc -O2 -o test test.c rincewind:~/src/pleurvm$ time ./test switch! real 0m6.798s user 0m6.770s sys 0m0.030s rincewind:~/src/pleurvm$ gcc -O2 -o test2 test2.c rincewind:~/src/pleurvm$ time ./test2 goto! real 0m27.194s user 0m27.020s sys 0m0.010s I've included the source code (it's just a little interpreter which executes some random noise from /dev/random, hence the benchmark size. Opcode 0 means exit the loop). Marcel Ammerlaan ps. Does anyone know of an alternative to GCC? It seems to mishandle my own little JVM, making it crash in the long run. I want to double-check with another C compiler before jumping to conclusions. 
-- Don't let people drive you crazy when you know it's in walking distance |
From: Etienne M. G. <eg...@j-...> - 2000-08-29 00:35:54
|
Hi Marcel. I'm sorry for the delay; I had a pretty hectic day. Marcel Ammerlaan wrote: > My question is: why use goto instead of switch()? The concept of using gotos in the interpreter loop is usually referred to as "threaded" interpretation. This name originates, if I am right, from implementations of the FORTH language. The short answer to your question is: because it's (usually) much faster. But, your experiment suggests that this is not the case... I would be very interested in looking at your small benchmark program, to see if there's something wrong with it. Would it be possible to send a copy of it to the list? (This shouldn't be a long program, I guess). The reason "labels as values" were added in gcc was explicitly for cases like interpreter loops. Here's an excerpt from gcc's documentation: "Another use of label values is in an interpreter for threaded code. The labels within the interpreter function can be stored in the threaded code for super-fast dispatching." Cheers, Etienne -- ---------------------------------------------------------------------- Etienne M. Gagnon, M.Sc. e-mail: eg...@j-... Author of SableVM: http://www.sablevm.org/ ---------------------------------------------------------------------- |
From: Marcel A. <ma...@ca...> - 2000-08-28 08:05:26
|
Hello, I've been researching some clean-room JVM implementations (Kaffe, Japhar, SableVM) and have a question regarding the interpreter loop in SableVM. You use the GCC-extended goto mechanism to jump to the instruction implementations while the other two use a switch() statement. I did a very quick benchmark between these two methods and it seems the switch() based approach is much quicker than using goto with a lookup table. My question is: why use goto instead of switch()? Thanks in advance, Marcel Ammerlaan |
From: John L. <je...@pi...> - 2000-08-02 01:12:53
|
There are many jalapeno papers on this page: http://www.research.ibm.com/jalapeno/publication.html Mostly related to run-time optimisation. John Leuner |
From: John L. <je...@pi...> - 2000-08-01 06:26:44
|
> > I'm a bit sceptical about making the JIT compatible with a range of > > VMs (like the classic VM), because there are so many VM-specific > > optimisations to be made. But I love the idea of modularisation and > > flexibility. > > I think the essential idea is a framework where one can write a "plug > in" transformation that can be run on a number of VMs, if it is > something that can be performed at a high enough level. It is true that > those kinds of things alone are not enough to produce the best > attainable code for a particular environment, but that is not a problem > as a particular VM's JIT (hopefully) would know a few tricks about > optimizing for the local environment. Well there will be things that will be portable, but there will also be things that need to be customized for the VM. > >> I'm investigating if that status also applies to work with Java in > >> general. > > > > I'm not sure I understand the last line? > > My employer (NAI Labs) cannot claim "clean room" status and I need to > determine if there is a legal distinction between my role as an > employee of the company and my personal projects. So have you worked on tainted code yourself? > > I initially wanted to do a little bit of optimisation (such as register > > allocation), but was so depressed with the IA32's pitiful registers and > > different instructions for addressing the different sets of registers (MMX > > etc) that I decided to leave that for later. > > Yes, that is why it is much more fun to work with the Alphas, even > though producing code for them is actually harder (w.r.t. instruction > scheduling). I would love to get my hands on an Alpha. I'll have to start saving up for one. > I think most JITs are invoked indiscriminately and are therefore > limited by the need for speed to variations on simple bytecode to > native code mapping. Etienne would like to see a VM invoke a JIT for > a particular method only if there is a clear benefit. 
> I think that's
> the only way to go if the JIT will be spending time aggressively
> optimizing the result.

Yes, but to do useful profiling it might be beneficial to do a minimal
translation first.

> > > - Code straightening and jump threading.
> >
> > What is this?
>
> Code straightening and jump threading both have to do with improving
> how branches are taken through the code. Code straightening takes
> the code over what appears to be the most likely path of execution
> through a procedure and arranges the corresponding basic blocks
> sequentially -- so the most likely case is the fall-through case.
> This means that it is more likely than not that prefetched
> instructions can be executed rather than discarded. Jump threading
> looks at branch targets. For example, if the target of an
> unconditional branch is another unconditional branch, then the
> natural thing to do is make the first branch point to the second
> branch's target. As another example, branches which point to the
> very next instruction (commonly a result of other optimizations) can
> be simply removed. Cases involving conditional branches are more
> interesting.

Would it be useful to actually record which branch is taken and use
that for a later optimisation? Or is this too low-level? How does the
CPU do branch prediction?

> > Yes, the whole issue of having the compiler substitute special code for
> > marked methods is also something I'm very keen on. Specifically I want to
> > avoid the overhead of native method calls for doing things like file IO
> > etc.
>
> You should look at the work the Jaguar project has done in this area:
> http://www.cs.berkeley.edu/~mdw/proj/jaguar/

Interesting. I'm thinking more along the lines of an interface to a
general-purpose kernel (Linux syscalls). One of my other projects is to
create an OS based around the JVM as the kernel. The compiler will
obviously help a lot with accessing raw hardware and compiling device
drivers. (See www.jos.org.)
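The jump-threading idea described above (retargeting an unconditional branch whose target is itself an unconditional branch) can be sketched in a few lines. This is a hypothetical toy model, not any real JIT's representation: instructions are an `int[]` where `target[i] >= 0` means "unconditional jump to `target[i]`" and `-1` means an ordinary instruction.

```java
// Toy sketch of jump threading: each jump is retargeted past any
// chain of unconditional jumps it points to, so control transfers
// go directly to the final destination.
public class JumpThreading {
    static int[] thread(int[] target) {
        int[] out = target.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] < 0) continue;          // not a jump
            int t = out[i];
            int hops = 0;
            // follow the chain of unconditional jumps,
            // bounded to avoid looping forever on a jump cycle
            while (t < out.length && out[t] >= 0 && hops++ < out.length) {
                t = out[t];
            }
            out[i] = t;                        // point directly at the end
        }
        return out;
    }

    public static void main(String[] args) {
        // jump at 0 -> 2, jump at 2 -> 4, ordinary instruction at 4:
        int[] code = { 2, -1, 4, -1, -1 };
        System.out.println(thread(code)[0]);   // prints 4
    }
}
```

A real optimizer works on basic blocks rather than single slots, and has to be more careful with conditional branches, as the quoted text notes.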
Recently I started compiling parts of kissme into the Linux kernel as an
experiment. I've also tried out the OSKit (see cs.utah.edu), a project
aimed at facilitating OS research.

> > Yes, again there is a need for an initial translator which is very fast,
> > but which will be superseded by a more advanced version at a later stage.
>
> Yes, or a VM with a fast interpreter which invokes the JIT
> selectively.

Well, in my arrangement I currently wait for invokevirtual to be
replaced by invokevirtual_quick before I think of optimising. But this
is just a single method call. The problem is that it is probably common
to have a thread like this:

    public void run() {
        while (true) {
            socket.accept();
            // service the request
        }
    }

So how do you replace a running thread with native code?

> > Remember that objects in the Java heap move around. (I suppose you
> > could pin these down.)
>
> Hmm. Marking code object storage as non-moving is one option. Creatively
> fixing up what code can't be made position-independent as required
> is another. I haven't thought too much about this yet, but I certainly
> will be. :)

Another issue I'm dealing with is that I want to make some of the
compiled code persistent (i.e. reusable when the VM fires up again).
But currently my native code stores direct pointers to things like the
JNIEnv environment pointer and to class structures. It would be nice to
be able to patch these when restarting the VM. (This would enable parts
of the JVM to be rewritten in Java, something I'd really like to do.)
Of course, it gets complicated when you start having absolute jumps to
other functions, etc.

> > Yip, and if you're willing to bend the rules a bit you can also cache
> > field accesses. I haven't looked much at what optimisation is possible
> > from an OO point of view, but there is definitely great potential for
> > speedup in this area.
>
> Etienne worked in the past on a Java optimizer that did various OO
> optimizations -- e.g. inlining, transforming virtual calls to static
> ones, etc. (http://www.sable.mcgill.ca/soot/) I'm looking at this as
> a starting point.

I'll have a look at it too.

John
|
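The devirtualization mentioned above can be illustrated with a toy before/after pair. This is a hypothetical source-level sketch of the idea (class and method names are made up); Soot itself performs the transformation on bytecode, after proving the receiver's exact type:

```java
// Sketch: a virtual call whose receiver type is known exactly can be
// statically bound and then inlined into the caller.
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    int norm1() { return Math.abs(x) + Math.abs(y); }  // a virtual method
}

public class Devirtualize {
    // Before: norm1() compiles to an invokevirtual, dispatched
    // through the receiver's vtable at run time.
    static int before(Point p) {
        return p.norm1();
    }

    // After: if the optimizer proves p is exactly a Point (no loaded
    // subclass overrides norm1), the call can be bound directly and
    // its body inlined, removing the dispatch entirely.
    static int after(Point p) {
        return Math.abs(p.x) + Math.abs(p.y);
    }

    public static void main(String[] args) {
        Point p = new Point(-3, 4);
        System.out.println(before(p) == after(p)); // prints true
    }
}
```

The payoff is less the saved dispatch than the inlining it enables, which opens the method body to the caller's other optimizations.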