From: Torsten J. <t....@gm...> - 2012-06-12 19:55:08
Attachments:
greedy2frame_fix.diff
Hi again.

I knew this migration thing was a bad idea. No, honestly.

1.2.2 does not compile on my box. I guess this is because I run a 32-bit system, and asm() runs out of general-purpose registers.

This patch fixes compilation. It is still untested, though, since I don't know how to activate the plugin at runtime.

Torsten
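For context, here is a minimal sketch (not code from the patch) of why operand constraints decide register pressure: every `"r"` operand of a GCC inline asm statement needs its own general-purpose register, and 32-bit x86 has only eight of them. `%esp` is never available, and `%ebp` (frame pointer) and `%ebx` (PIC base in older GCC) may be reserved as well. A constant referenced through an `"m"` operand is usually addressed directly (an absolute address in 32-bit non-PIC code, RIP-relative on x86-64) and then costs no general register. The names `threshold` and `add_threshold_mem` are hypothetical; on non-x86 targets the function falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical constant standing in for a threshold value. */
static const int32_t threshold = 128;

/* Adds the constant to x. The "m" constraint lets the compiler address
   `threshold` in memory directly, so only one general register (for x)
   is needed here. Passing &threshold through an "r" operand would cost
   a second general register. */
static int32_t add_threshold_mem(int32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r"(x)          /* read-write operand held in a general register */
            : "m"(threshold)   /* memory operand: absolute or RIP-relative address */
            : "cc");           /* addl modifies the flags */
    return x;
#else
    /* Portable fallback for non-x86 targets or non-GNU compilers. */
    return x + threshold;
#endif
}
```

In 32-bit PIC code the memory operand has to be reached through the PIC base register, which is one way such a statement can end up needing an extra register.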
From: Roland S. <rsc...@hi...> - 2012-06-12 23:20:00
Attachments:
greedy2framesse2_4regs.diff
On 12.06.2012 21:54, Torsten Jager wrote:
> Hi again.
>
> I knew this migration thing was a bad idea.
> No, honestly.
>
> 1.2.2 does not compile on my box. I guess this is because
> I run a 32 bit system, and asm () runs out of general
> registers.

I think you're using a slightly broken compiler version or some unusual compile options. The code should only use 5 regs, which is "typically" possible. (It is 5 and not 6 regs because GreedyTwoFrameThreshold128 should only need a constant offset, not a reg, unless you're trying something silly like 32-bit PIC code, or some other circumstance makes gcc need an additional reg.) I know that Petri tested this.

Unfortunately, it is difficult to tell how many regs gcc will manage to handle for inline asm. To avoid such problems in general, one solution might be to use small inline asm test programs in configure to simply configure out the code that needs more. (ffmpeg does something like that, though interestingly only for 6 or 7 regs; they assume 5 is always available.) At least personally, I wouldn't really care if some optimized code just isn't available in such broken setups.

> This patch fixes comilation. Still untested, though,
> since I dont know how to activate the plugin at runtime.

I think it looks OK, but I'd prefer a simpler solution. I hate passing the same arguments multiple times; that should certainly be a no-op in sane setups, but it still makes the code harder to read. So instead of splitting the code up, I just moved the M1 fetch a bit further down. It was up there for a reason, but I never really managed to measure a performance difference anyway. That should need only 4 general regs, just like your version, which should REALLY be doable. (Tested on 64-bit, not that you'd run into these register allocation problems there...)
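The configure-time check suggested in the reply could look roughly like the following sketch. It is hypothetical: the function name `asm_5regs_sum`, and the idea of defining a macro such as `HAVE_ASM_5REGS` when the probe compiles, are illustrative and not taken from the project's build system. The asm statement ties up five general-purpose registers at once (one read-write operand plus four inputs), so a compiler that cannot allocate them fails to build the probe, and the five-register code path stays disabled.

```c
/* Hypothetical configure probe: can the compiler allocate five
   general-purpose registers for a single inline asm statement? */
static int asm_5regs_sum(int a, int b, int c, int d, int e)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int sum = a;
    /* sum ("+r") plus four "r" inputs: five general registers in total. */
    __asm__("addl %1, %0\n\t"
            "addl %2, %0\n\t"
            "addl %3, %0\n\t"
            "addl %4, %0"
            : "+r"(sum)
            : "r"(b), "r"(c), "r"(d), "r"(e)
            : "cc");   /* addl modifies the flags */
    return sum;
#else
    /* Not x86 with GNU inline asm: nothing to probe, plain C result. */
    return a + b + c + d + e;
#endif
}
```

A real probe would call this with values read from `volatile` variables, so the compiler cannot merge identical or constant inputs into a shared register, and would only need to compile successfully, not run.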