From: Tim H. <tim...@co...> - 2004-07-13 22:43:17
Someone on python-list claimed that 486 floating point is not as horrible as I had thought, as long as you stick to the simple stuff. I poked around, and it seems grokkable, so I'm going to attempt to fully psycoize pfloatobject. It'll take a while and may never get done, but it'll probably be educational at least. I'm going to sketch out my initial strategy here; any feedback on the practicality of this approach is welcome.

First off, as far as I can tell, I need to create the equivalent of integer_add, integer_mul, etc. from codeobject.c for the floating point objects. For simplicity, I plan to start by having all of the compile-time branches return NULL, in other words punting and letting the Python interpreter handle those cases. That'll be a horrible speed hit, but once I get the basic case going I think (hope, anyway) that those'll be easy to add.

This means I need to come up with an equivalent to BINARY_INSTR_ADD. Tracing back through the various defines leads to bininstrgrp. Very roughly, I expect that the modified version of that will look something like:

    DEFINEFN
    vinfo_t* float_binop(PsycoObject* po, int op, bool nonneg,
                         vinfo_t* v1, vinfo_t* v2)
    {
        reg_t rg;
        BEGIN_CODE
        NEED_CC();    // Maybe replace with clearing FP flags.
        // Emit appropriate machine code for +-*/ based on op here
        END_CODE
        if ( /*floating point error flag set*/)
            return NULL;    /* if overflow */
        return new_rtvinfo(po, rg, false, nonneg);
    }

Does that sound plausible?

-tim
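For reference, the run-time behaviour that the emitted code is meant to have can be sketched in plain C. This is only an illustration, not Psyco code: `checked_float_binop` and the `FLOAT_OP_*` names are made up for the example, and it detects failure by inspecting the operands and the result rather than by reading the FPU status flags as the plan above proposes.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical operation codes; the 0..3 numbering is an assumption
   chosen to match an integer `op` argument. */
enum { FLOAT_OP_ADD = 0, FLOAT_OP_SUB = 1, FLOAT_OP_MUL = 2, FLOAT_OP_DIV = 3 };

/* Computes a <op> b. Returns 0 and stores the result in *out on success,
   or -1 (the analogue of float_binop returning NULL) when the case
   should be left to the interpreter. */
int checked_float_binop(int op, double a, double b, double *out)
{
    double r;

    assert(op >= FLOAT_OP_ADD && op <= FLOAT_OP_DIV);
    switch (op) {
    case FLOAT_OP_ADD: r = a + b; break;
    case FLOAT_OP_SUB: r = a - b; break;
    case FLOAT_OP_MUL: r = a * b; break;
    default:
        if (b == 0.0)
            return -1;      /* division by zero */
        r = a / b;
        break;
    }
    if (!isfinite(r))
        return -1;          /* overflow or invalid operation */
    *out = r;
    return 0;
}
```

A caller would fall back to the generic path whenever the function returns -1, which is the same contract as returning NULL from the compile-time function.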
From: Tim H. <tim...@co...> - 2004-07-14 23:10:57
Some progress and some questions.

I've made pretty good progress in roughing out the relevant assembly code and it's pretty much done. Well, except for testing, so maybe half done. The two problematic bits are getting data in and out of the function. The whole function is below, although it's still half pseudocode.

I may have a solution for the problem of getting the floating point (FP) data into the function. I use LOAD_REG_FROM to load the address of v->array into a register. I then add iFLOAT_OB_FVAL, and use the resulting address to push the FP value onto the FP stack. Like so:

    LOAD_REG_FROM(v1->array, rg);
    emit: add(iFLOAT_OB_FVAL, rg);
    emit: fld(rg);    // Note: needs to be the 64 bit fld

Does that make sense? I'm unclear on what magic LOAD_REG_FROM performs, so I'm guessing a bit at its proper use.

The second problem, getting the data out of the function, looks harder. Currently I'm leaving the resulting FP value on the stack. The question is how to turn that into a vinfo_t and return it. It seems that I might need to build a new vinfo_t from scratch and leave its address in rg. That seems hard to do from scratch in assembler. Are there convenience functions or macros that would help with this? I think I need to do the equivalent of PsycoFloat_FROM_DOUBLE in assembler, replacing vdouble1 and vdouble2 with the first two values in the stack. However, if I do need to build a vinfo_t from scratch, I could store the FP value directly to the array. Here's PsycoFloat_FROM_DOUBLE:

    inline vinfo_t* PsycoFloat_FROM_DOUBLE(vinfo_t* vdouble1, vinfo_t* vdouble2)
    {
        vinfo_t* result = vinfo_new(VirtualTime_New(&psyco_computed_float));
        result->array = array_new(FLOAT_TOTAL);
        result->array->items[iOB_TYPE] =
            vinfo_new(CompileTime_New((long)(&PyFloat_Type)));
        result->array->items[iFLOAT_OB_FVAL+0] = vdouble1;
        result->array->items[iFLOAT_OB_FVAL+1] = vdouble2;
        return result;
    }

Any helpful suggestions, particularly on this last issue, would be greatly appreciated.
I'm still pretty fuzzy on how the whole code-emitting bit works. Is the return value for a function implicitly in rg when the emitted code is called? If so, I will probably have to recreate PsycoFloat_FROM_DOUBLE in assembly. Ouch. Anyway, I'm hoping you'll tell me that it's easy.

Thanks,

-tim

    DEFINEFN
    vinfo_t* float_binop(PsycoObject* po, int op, bool nonneg,
                         vinfo_t* v1, vinfo_t* v2)
    {
        reg_t rg;
        psyco_assert(0 <= op && op < 4);
        BEGIN_CODE
        NEED_CC();
        NEED_FREE_REG(rg);
        emit: fclex();    // Clear exception bits

        // Push the two float values onto the FP stack. First
        // we grab the address of v->array then add iFLOAT_OB_FVAL
        // to get the address of the float for fld.
        LOAD_REG_FROM(v1->array, rg);
        emit: add(iFLOAT_OB_FVAL, rg);
        emit: fld(rg);    // Note: needs to be the 64 bit fld
        LOAD_REG_FROM(v2->array, rg);
        emit: add(iFLOAT_OB_FVAL, rg);
        emit: fld(rg);    // Note: needs to be the 64 bit fld

        // Perform the operation. We use faddP, fsubP, etc., so that
        // the leftover operand is popped and only the result
        // is left on the FP stack.
        switch (op) {
        case 0: emit: faddp(); break;
        case 1: emit: fsubp(); break;
        case 2: emit: fmulp(); break;
        case 3: emit: fdivp(); break;
        }

        // Check for error. We copy the FP status register into ax and test
        // that there are no(*) errors.
        //
        // (*) We check for invalid(0), divide-by-zero(2) and overflow(3). This
        //     matches PyFPE_PROTECT. Bits 0, 2 and 3 give the mask 0x0D.
        emit: fstsw(ax);
        emit: test(ax, 0x0D);
        emit: jnz(end);

        // No errors. Push the result onto the stack. We do this manually,
        // since as far as I know there is no push (float) instruction.
        emit: sub(8, ESP);
        emit: fst(ESP);

        // Pop data off stack by way of cleaning up.
        emit: end: fstp( st0 );
        END_CODE

        // Return NULL if NE (NZ) set. This will have been set,
        // if appropriate, by the test instruction.
        if (runtime_condition_f(po, CC_NE))
            return NULL;

        // OK, here's the part I don't know how to do. This currently grabs the
        // value from the register. Instead I need it to create a new float value
        // based on the floating point value at the top of the stack.
        return new_rtvinfo(po, rg, false, nonneg);    // WRONG!
    }
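The mask passed to the `test` instruction can be cross-checked against the x87 status-word layout, where the exception flags occupy the low bits: invalid operation is bit 0, denormal operand bit 1, zero divide bit 2, overflow bit 3, underflow bit 4 and precision bit 5. A small sketch in C follows; the `X87_SW_*` names, `fp_error_mask` and `fp_status_has_error` are invented here for illustration and are not part of Psyco.

```c
#include <assert.h>

/* Exception flag bits in the low byte of the x87 status word. */
enum {
    X87_SW_INVALID   = 1 << 0,
    X87_SW_DENORMAL  = 1 << 1,
    X87_SW_ZERODIV   = 1 << 2,
    X87_SW_OVERFLOW  = 1 << 3,
    X87_SW_UNDERFLOW = 1 << 4,
    X87_SW_PRECISION = 1 << 5
};

/* Mask covering invalid operation, divide-by-zero and overflow. */
unsigned fp_error_mask(void)
{
    return X87_SW_INVALID | X87_SW_ZERODIV | X87_SW_OVERFLOW;
}

/* Returns nonzero if the status word has any of those three flags set. */
int fp_status_has_error(unsigned status_word)
{
    assert(status_word <= 0xFFFFu);   /* the status word is 16 bits wide */
    return (status_word & fp_error_mask()) != 0;
}
```

With this layout the mask for the three named conditions comes out as 0x0D. A mask of 0x15 would instead select invalid operation, zero divide and underflow.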
From: Tim H. <tim...@co...> - 2004-07-15 01:38:55
Replying to myself one more time:

After thinking about it some more, I decided it probably wasn't worth the trouble trying to recreate PsycoFloat_FROM_DOUBLE in assembly; any speed increase would probably be negligible. Instead, I just call PsycoFloat_FROM_DOUBLE. This is particularly attractive since I'm already putting the double result onto the stack.

The two parts that are still shaky are the cleanup after CALL_C_FUNCTION and the return value. I think that I should be cleaning up the stack after CALL_C_FUNCTION, but it doesn't seem to happen in psyco_generic_call, so I could be all wet.

The other issue is what to return. At the end of the function I have a vinfo_t* in rg. This is what I need to return, but the number is not directly in rg, and to get it, it looks like I need to use REG_NUMBER. Does that fly? I still haven't worked out how the emitted code returns values, so I'm not sure if I need to do more on that front as well.

That's it for now. Probably my next step is to try to turn all the emit pseudocode into real instructions.

-tim

-------------------------------------------------------------------------------------------------

    DEFINEFN
    vinfo_t* float_binop(PsycoObject* po, int op, bool nonneg,
                         vinfo_t* v1, vinfo_t* v2)
    {
        reg_t rg;
        psyco_assert(0 <= op && op < 4);
        BEGIN_CODE
        NEED_CC();
        NEED_FREE_REG(rg);
        emit: fclex();    // Clear exception bits

        // Push the two float values onto the FP stack. First
        // we grab the address of v->array then add iFLOAT_OB_FVAL
        // to get the address of the float for fld.
        LOAD_REG_FROM(v1->array, rg);
        emit: add(iFLOAT_OB_FVAL, rg);
        emit: fld(rg);    // Note: needs to be the 64 bit fld
        LOAD_REG_FROM(v2->array, rg);
        emit: add(iFLOAT_OB_FVAL, rg);
        emit: fld(rg);    // Note: needs to be the 64 bit fld

        // Perform the operation. We use faddP, fsubP, etc., so that
        // the leftover operand is popped and only the result
        // is left on the FP stack.
        switch (op) {
        case 0: emit: faddp(); break;
        case 1: emit: fsubp(); break;
        case 2: emit: fmulp(); break;
        case 3: emit: fdivp(); break;
        }

        // Check for error. We copy the FP status register into ax and test
        // that there are no(*) errors.
        //
        // (*) We check for invalid(0), divide-by-zero(2) and overflow(3). This
        //     matches PyFPE_PROTECT. Bits 0, 2 and 3 give the mask 0x0D.
        emit: fstsw(ax);
        emit: test(ax, 0x0D);
        emit: jnz(end);

        // No errors. Push the result onto the stack. We do this manually,
        // since as far as I know there is no push (float) instruction.
        emit: sub(8, ESP);
        emit: fst(ESP);
        po->stack_depth += 8;    // mimic CALL_SET_ARG_

        // Call PsycoFloat_FROM_DOUBLE to convert the double into a float.
        CALL_C_FUNCTION(PsycoFloat_FROM_DOUBLE, 1);
        emit: move(rg, eax);    // Put result back into rg.

        // Clean?
        emit: add(8, ESP);
        po->stack_depth -= 8;

        // Pop data off FP stack by way of cleaning up.
        emit: end: fstp( st0 );
        END_CODE

        // Return NULL if NE (NZ) set. This will have been set,
        // if appropriate, by the test instruction.
        if (runtime_condition_f(po, CC_NE))
            return NULL;

        // A pointer to the new vinfo_t is sitting in rg; return it.
        return REG_NUMBER(po, rg);
    }
From: Armin R. <ar...@tu...> - 2004-07-23 13:14:32
Hello Tim,

Oops, sorry for not having answered more quickly. I'm more or less on holiday.

You are making a mistake in all your e-mails: you are assuming that the assembly code deals with pointers to vinfo_t. It never does. vinfo_t and PsycoObject are two structures that only exist during compilation. They are just like the structures in a compiler that hold information like "this local variable will be stored in register EAX during execution".

In the C code of Psyco, most functions have a PsycoObject* as their first argument and vinfo_t* as other input arguments and return value. The purpose of all these functions is to generate some assembly code. The assembly code reads the register or stack positions that were described by the vinfo_t, and puts the result somewhere; the original C function returns a new vinfo_t that describes where the result will be put.

Example:

    vinfo_t* pfloat_add(PsycoObject* po, vinfo_t* v, vinfo_t* w)

generates assembly code that assumes that there is a PyObject* at the place described by 'v' and another one at the place described by 'w', reads their ob_fval fields (maybe performing a conversion to PyFloatObject), and adds these two doubles. After this assembly code has been generated by pfloat_add, the latter returns a vinfo_t* that says "the result will be a virtual PyFloatObject whose ob_fval field is here in the stack".

So functions like pfloat_add(), PsycoFloat_FROM_DOUBLE(), etc. all deal with PsycoObject, vinfo_t, and these ->array->items[STRANGE_CONSTANT_NAME], which are how a vinfo_t is built as a pseudo-structure with an array of fields that are vinfo_ts too. The generated assembler code doesn't know about any of this.

Occasionally, this generated assembler code calls a C function because it's more convenient to do it this way than to generate the whole content of the C function ourselves. All the cimpl_*() functions are like this. They are just standard C functions that don't know about PsycoObject or vinfo_t either.
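The split between the two levels can be illustrated with a toy code generator in C. Everything here is hypothetical and far simpler than Psyco: `toy_vinfo_t`, `toy_operand` and `toy_emit_add` are invented names, the "machine code" is just text, and the register and stack syntax is made up. The point is only that the compile-time function handles descriptors of locations, while the emitted text never mentions those descriptors.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a compile-time descriptor: it records WHERE a value
   will live at run time, never the value itself. Not Psyco's vinfo_t. */
typedef enum { IN_REGISTER, IN_STACK } toy_place_t;

typedef struct {
    toy_place_t place;
    int index;              /* register number or stack offset */
} toy_vinfo_t;

/* Writes the run-time location described by v as operand text. */
static void toy_operand(toy_vinfo_t v, char *buf, size_t size)
{
    if (v.place == IN_REGISTER)
        snprintf(buf, size, "r%d", v.index);
    else
        snprintf(buf, size, "[sp+%d]", v.index);
}

/* Compile-time function: writes the "machine code" for a + b into `code`
   and returns a descriptor saying where the sum will be at run time. */
toy_vinfo_t toy_emit_add(char *code, size_t size,
                         toy_vinfo_t a, toy_vinfo_t b, int result_reg)
{
    char sa[32], sb[32];
    toy_vinfo_t result;

    assert(code != NULL && size > 0);
    toy_operand(a, sa, sizeof sa);
    toy_operand(b, sb, sizeof sb);
    snprintf(code, size, "mov r%d, %s\nadd r%d, %s\n",
             result_reg, sa, result_reg, sb);

    result.place = IN_REGISTER;
    result.index = result_reg;
    return result;
}
```

Calling `toy_emit_add` with one operand in register 1 and one at stack offset 8 produces two lines of text and a descriptor for register 0; no addition is performed at that moment, and no descriptor pointer appears in the generated text.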
So I cannot really make sense of your code:

> LOAD_REG_FROM(v1->array, rg);
> emit: add(iFLOAT_OB_FVAL, rg);
> emit: fld(rg); // Note: needs to be the 64 bit fld

I think that it confuses the two levels: the compilation and the generated code.

Let me try to explain how pfloat_add() works. Currently in pfloatobject.c, all the manipulation of in-heap PyFloatObjects is removed from the assembler code, but individual operations between doubles are done by calling the cimpl_*() helpers. We could optimize first by generating the equivalent of the cimpl_*() body directly into the assembler code (I think it's what you're trying to do). This can be done by copying the way integer_add() is done, but the small problem is that doubles require 2x32 bits, so all arguments and return values must be duplicated; in other words, you must send twice as many input arguments and arrange to return two vinfo_t's.

This is probably what caused the confusion from pfloat_add(): currently, in the call to psyco_generic_call(), this is done by passing 4 input args (a1, a2 are from v and b1, b2 are from w) and a vinfo_array_t of size 2 where psyco_generic_call() can put the two output vinfo_t's. The explicit manipulation of a vinfo_array_t here is *only* so that psyco_generic_call can have, so to say, two return values.

Note that there is a better solution, which is to make a new abstraction for 'a double' just like vinfo_t is 'a 32-bit value'. A vinfo_t can mean 'the 32-bit value in register XXX' or 'the 32-bit value in the stack at position XXX'. It cannot refer to more than one 32-bit value (apart from as a pointer to some structure), and it cannot refer to a FP value in the FP stack. So just by doing what I described above we'd get code that still has to move the FP value from the regular stack to the FP stack, perform the operation, and move the result back to the regular stack where a pair of vinfo_ts can describe its location.
This is not easy because it involves changing a couple of obscure places...

A bientot,

Armin.
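The remark that a double takes 2x32 bits, and therefore needs a pair of 32-bit descriptors, can be made concrete with a short C sketch. The function names `double_to_words` and `words_to_double` are illustrative only; the sketch assumes an 8-byte double, as on x86, and keeps the two words in memory order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Splits a double into two 32-bit words, in memory order. */
void double_to_words(double d, uint32_t words[2])
{
    assert(sizeof(double) == 2 * sizeof(uint32_t));
    memcpy(words, &d, sizeof d);
}

/* Reassembles the double from the same two words. */
double words_to_double(const uint32_t words[2])
{
    double d;

    memcpy(&d, words, sizeof d);
    return d;
}
```

Each of the two words is the kind of value a single 32-bit descriptor could track, which is why every double argument and result has to be passed as a pair.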