I have a small piece of source:

start --->
typedef struct {
    float real;
    float imag;
} complex_t;

extern void
fft_asmb_3dnow (int k, complex_t * x, complex_t * wTB,
                const complex_t * d, const complex_t * d_3)
{
    /* NOTE: in this reduced testcase x2k, x3k, x4k and wB are never
       initialized; the full liba52 source derives them from x, wTB and k. */
    register complex_t *x2k, *x3k, *x4k, *wB;

    {
        __asm__ __volatile__ ("movq %4, %%mm0\n\t"
                              "movq %5, %%mm1\n\t"
                              "movq %%mm0, %%mm5\n\t"
                              "pfadd %%mm1, %%mm5\n\t"
                              "pxor %%mm6, %%mm0\n\t"
                              "pxor %%mm7, %%mm1\n\t"
                              "pfadd %%mm1, %%mm0\n\t"
                              "movq %%mm0, %%mm4\n\t"
                              "pswapd %%mm4, %%mm4\n\t"
                              "movq %6, %%mm0\n\t"
                              "movq %7, %%mm2\n\t"
                              "movq %%mm0, %%mm1\n\t"
                              "movq %%mm2, %%mm3\n\t"
                              "pfadd %%mm5, %%mm0\n\t"
                              "pfadd %%mm4, %%mm2\n\t"
                              "movq %%mm0, %0\n\t"
                              "pfsub %%mm5, %%mm1\n\t"
                              "movq %%mm2, %3\n\t"
                              "pfsub %%mm4, %%mm3\n\t"
                              "movq %%mm1, %1\n\t"
                              "movq %%mm3, %2"
                              : "=m" (x[0]), "=m" (x3k[0]), "=m" (x2k[0]), "=m" (x4k[0])
                              : "m" (wTB[0]), "m" (wTB[k * 2]), "m" (x[0]), "m" (x2k[0])
                              : "memory");
    };
}
end ------->

If I add the "-O0" option for gcc 3.2.3 or 3.3.1, the compiler says "NO", like this:

"error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'"

but if I add "-O3", it compiles and the code works fine. What is wrong, the compiler or my asm code?
Created attachment 4229
compressed archive containing two files

I have attached a compressed archive containing two files: one is err.log and the other is my source code.
Confirmed with 3.2.x, 3.3 and mainline.

W.
Maybe neither. *sigh* i386 is infamously register-starved, and there may simply not be enough registers unless optimizations are used.
Any chance this might be related to bug 9929? A patch for that is pending (http://gcc.gnu.org/ml/gcc-patches/2003-07/msg02582.html), so someone could try...
*** Bug 13410 has been marked as a duplicate of this bug. ***
*** Bug 14090 has been marked as a duplicate of this bug. ***
The current version of gcc (3.3.4 20040331) can't compile the sample code with

  $ gcc -O0 -c regs_test.c

but it can with

  $ gcc -O0 -fnew-ra -c regs_test.c

and

  $ gcc -Os -c regs_test.c
*** Bug 13850 has been marked as a duplicate of this bug. ***
confirmed with 3.4.2-20040806 (-O0 works, -O[123] fails).

P.S. Building qemu-0.5.5 also fails:

pentium3-pld-linux-gcc -O2 -march=pentium3 --save-temps -fomit-frame-pointer -mpreferred-stack-boundary=2 -falign-functions=0 -fno-reorder-blocks -fno-optimize-sibling-calls -I. -I/home/users/pluto/rpm/BUILD/qemu-0.5.5/target-i386 -I/home/users/pluto/rpm/BUILD/qemu-0.5.5 -D_GNU_SOURCE -c -o op.o /home/users/pluto/rpm/BUILD/qemu-0.5.5/target-i386/op.c

/home/users/pluto/rpm/BUILD/qemu-0.5.5/target-i386/ops_template_mem.h: In function `op_rolb_kernel_T0_T1_cc':
/home/users/pluto/rpm/BUILD/qemu-0.5.5/softmmu_header.h:179: error: can't find a register in class `GENERAL_REGS' while reloading `asm'

static inline void glue(glue(st, SUFFIX), MEMSUFFIX)(void *ptr, RES_TYPE v)
{
    asm volatile ("movl %0, %%edx\n"   /* line 179 */
                  "movl %0, %%eax\n"
                  "shrl %3, %%edx\n"
                  "andl %4, %%eax\n"
                  "andl %2, %%edx\n"
                  "leal %5(%%edx, %%ebp), %%edx\n"
                  "cmpl (%%edx), %%eax\n"
                  "movl %0, %%eax\n"
                  "je 1f\n"
#if DATA_SIZE == 1
                  "movzbl %b1, %%edx\n"
#elif DATA_SIZE == 2
                  "movzwl %w1, %%edx\n"
#elif DATA_SIZE == 4
                  "movl %1, %%edx\n"
#else
#error unsupported size
#endif
                  "pushl %6\n"
                  "call %7\n"
                  "popl %%eax\n"
                  "jmp 2f\n"
                  "1:\n"
                  "addl 4(%%edx), %%eax\n"
#if DATA_SIZE == 1
                  "movb %b1, (%%eax)\n"
#elif DATA_SIZE == 2
                  "movw %w1, (%%eax)\n"
#elif DATA_SIZE == 4
                  "movl %1, (%%eax)\n"
#else
#error unsupported size
#endif
                  "2:\n"
                  :
                  : "r" (ptr),
/* NOTE: 'q' would be needed as constraint, but we could not use it
   with T1 ! */
                  "r" (v),
                  "i" ((CPU_TLB_SIZE - 1) << 3),
                  "i" (TARGET_PAGE_BITS - 3),
                  "i" (TARGET_PAGE_MASK | (DATA_SIZE - 1)),
                  "m" (*(uint32_t *)offsetof(CPUState, tlb_write[CPU_MEM_INDEX][0].address)),
                  "i" (CPU_MEM_INDEX),
                  "m" (*(uint8_t *)&glue(glue(__st, SUFFIX), MMUSUFFIX))
                  : "%eax", "%ecx", "%edx", "memory", "cc");
}
*** Bug 17291 has been marked as a duplicate of this bug. ***
Related bug report: http://bugs.gentoo.org/show_bug.cgi?id=71360
Why do people write inline asm like this? It is crazy to do so. Split up the inline asm correctly. Anyone who writes inline asm like this should get what they get. For MMX inline asm, you should be using the intrinsics instead, as suggested before, or just write a real asm file.
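The intrinsics route suggested above can be sketched as follows. This is a hedged illustration, not a port of the 3DNow! routine: it assumes SSE (`<xmmintrin.h>`, always available on x86-64) instead of 3DNow!, and `butterfly2` is a hypothetical helper. It computes `a + b` and `a - b` for two packed `complex_t` values at once and leaves all register allocation to the compiler, so there is no asm constraint for it to fail on.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

typedef struct { float real; float imag; } complex_t;

/* Hypothetical helper: radix-2 butterfly on two complex values at once.
   sum[j] = a[j] + b[j] and diff[j] = a[j] - b[j] for j = 0, 1.
   Unaligned loads/stores are used, so no alignment is required. */
static void butterfly2(const complex_t *a, const complex_t *b,
                       complex_t *sum, complex_t *diff)
{
    assert(a && b && sum && diff);
    __m128 va = _mm_loadu_ps((const float *)a);  /* a0.re a0.im a1.re a1.im */
    __m128 vb = _mm_loadu_ps((const float *)b);
    _mm_storeu_ps((float *)sum, _mm_add_ps(va, vb));
    _mm_storeu_ps((float *)diff, _mm_sub_ps(va, vb));
}
```

Because the compiler sees ordinary C values here, it is free to keep them on the stack at -O0 rather than needing a particular register assignment.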
(In reply to comment #12)
> Why do people write inline asm like this?

Why not? It's valid code, and a compiler should compile valid code...

> It is crazy to do so. Split up the inline asm correctly.

Fix gcc first so it doesn't load and store more than needed between the split-up parts.

> Anyone who writes inline asm like this should get what they get.
> For MMX inline asm, you should be using the intrinsics instead, as suggested before,

Let's see why it's not using intrinsics:
* it was written before intrinsics support was common
* intrinsics fail or get miscompiled commonly; it's so bad that some of the AltiVec intrinsic code has been disabled in ffmpeg if standard gcc is detected. There have also been very serious and similar problems in mplayer with AltiVec intrinsics; sadly I can't provide more details, as I don't have a PPC
* many if not most of the mplayer developers still use gcc 2.95 because gcc 3.* is slower and needs more memory, and AFAIK 2.95 doesn't support intrinsics
* it is a lot of work to rewrite and debug it just to make it compilable with gcc -O0

> or just write a real asm file.

That's not a good idea either, as:
* it's slower due to the additional call/ret/parameter passing
* there are some symbol name mangling issues on some obscure systems (see the mplayer-dev or cvslog mailing list; it was discussed there a long time ago)
You've just constrained the compiler too much to do anything. You're right that gcc should produce fewer loads and stores sometimes, but in this case I suggest you show that this still actually hurts you with GCC 4.0; I would hope it does better. In any case, just because code is syntactically "valid" GNU C doesn't mean gcc can always compile it. With this kind of inline asm, you're bound to confuse the register allocator. The fact that it works at O3 is pure luck and not a bug. Note that you're hitting an *error*, not an ICE. It is a deliberate choice to inform you that GCC cannot compile your inline assembly. Bad luck for you.
I will note for the record that disabling local-alloc will resolve this problem. A patch for that is in the audit trail of another bug, for unrelated reasons: http://gcc.gnu.org/PR13776. It also happens to fix the particular problem in this bug report.
*** Bug 19549 has been marked as a duplicate of this bug. ***
(In reply to comment #15)
> I will note for the record that disabling local-alloc will resolve
> this problem. A patch for that is in the audit trail of another bug,
> for unrelated reasons: http://gcc.gnu.org/PR13776. It also happens
> to fix the particular problem in this bug report.

I didn't test the source proposed in this bug report, but the patch mentioned above (disabling local-alloc) DOES NOT resolve the problem with the testcode proposed in bug report http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19549, and thus it also doesn't fix the problem of compiling ffmpeg's libavcodec/i386/dsputil_mmx.c, which is the original from which the testcode was extracted and simplified.

So either it is not the same bug (as marked by Andrew) or the problem was not resolved. And IMHO this should be perfectly valid, since the operands to the asm construct are all marked as "m" (!!!), so no registers should be needed for them! They are just memory operands!! So I think this bug (or at least http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19549) should NOT be marked as resolved.
(In reply to comment #17)
> And IMHO this should be perfectly
> valid, since the operands to the asm construct are all marked as "m" (!!!),
> so no registers should be needed for them!

Huh? The memory operands are not at a compile-time constant address, so of course you need a register to hold them. Of course, you need only one register for all of them, but you explicitly disallowed gcc from discovering that by specifying -O0.
(In reply to comment #18)
> Huh? The memory operands are not at a compile-time constant address, so of course
> you need a register to hold them. Of course, you need only one register for
> all of them, but you explicitly disallowed gcc from discovering that by specifying
> -O0.

Sure, one, sorry. But the problem is that the Bug 19549 testcode doesn't compile AT ALL, i.e. not only with -O0, but also with -O1, -O2, or -O3. It simply doesn't compile under any circumstances.
OK, sorry, the Bug 19549 testcode passes with -O1 and above, but the original it was stripped from (maybe stripped too much) doesn't:

-- test2.c -------------------------------------
extern const unsigned char ff_h263_loop_filter_strength[32];
static const unsigned long long ff_pb_FC __attribute__((used)) __attribute__ ((aligned(8))) = 0xFCFCFCFCFCFCFCFCULL;

void h263_h_loop_filter_mmx(unsigned char *src, int stride, int qscale){
    const int strength= ff_h263_loop_filter_strength[qscale];
    unsigned long long temp[4] __attribute__ ((aligned(8)));
    unsigned char *btemp= (unsigned char *)temp;

    src -= 2;

    asm volatile(""
        : "+m" (temp[0]),
          "+m" (temp[1]),
          "+m" (temp[2]),
          "+m" (temp[3])
        : "g" (2*strength),
          "m"(ff_pb_FC)
    );

    asm volatile(""
        : "=m" (*(unsigned int*)(src + 0*stride)),
          "=m" (*(unsigned int*)(src + 1*stride)),
          "=m" (*(unsigned int*)(src + 2*stride)),
          "=m" (*(unsigned int*)(src + 3*stride)),
          "=m" (*(unsigned int*)(src + 4*stride)),
          "=m" (*(unsigned int*)(src + 5*stride)),
          "=m" (*(unsigned int*)(src + 6*stride)),
          "=m" (*(unsigned int*)(src + 7*stride))
    );
}
------------------------------------------------

Or do you consider this also invalid?
(In reply to comment #21)
> Or do you consider this also invalid?

It doesn't seem invalid to me. But it is basically impossible to write the register allocator such that it finds a register allocation for every situation where it's theoretically possible. So this is unlikely to get fixed in a reliable way.
(In reply to comment #22)
> It doesn't seem invalid to me. But it is basically impossible to write the
> register allocator such that it finds a register allocation for every situation
> where it's theoretically possible. So this is unlikely to get fixed in a
> reliable way.

OK, I think I have fixed the code in ffmpeg to help gcc compile it a bit (I hope the change will be accepted). So just consider the above code another testcase, in case the problem ever gets resolved.
Martin, you should realize that this problem *cannot* be solved. Yes, there will perhaps be a time when this particular test case compiles, though I think that is unlikely. But anyway, then there will be other cases that fail. The reason is dead simple: register allocation is NP-complete, so it is even *theoretically* not possible to write register allocators that always find a coloring. That means any register allocator will always fail on some very constrained asm input. And you cannot allow it to run indefinitely until a coloring is found, because then you've turned the graph coloring problem into the halting problem because you can't prove that a coloring exists and that the register allocator algorithm will terminate. So really it doesn't matter at all whether or not your specific inline asm compiles or not. When yours does, someone else's will fail.
What if you resolve all memory references into temporary variables, e.g. void *a = (src + 0*stride), and use those instead? Doesn't that lessen the stress on the register allocator?
(In reply to comment #14)
> In any case, just because code is syntactically "valid"
> GNU C doesn't mean gcc can always compile it. With this kind of inline asm,
> you're bound to confuse the register allocator. The fact that it works at O3
> is pure luck and not a bug.

Well, you are the gcc developers, so there's not much arguing about what you consider valid, but last time I checked, the docs did not mention that asm statements may fail to compile at random, and IMO, as long as this is not clearly stated in the docs, this bug report really shouldn't be marked as invalid. Say you don't want to fix it, say it would be too complicated to fix, or whatever, but it's not invalid.

> Note that you're hitting an *error*, not an ICE.

No, at least one of the bug reports marked as a duplicate of this one ends in an ICE.

(In reply to comment #24)
> Martin, you should realize that this problem *cannot* be solved. Yes,
> there will perhaps be a time when this particular test case compiles,
> though I think that is unlikely. But anyway, then there will be other
> cases that fail.

Hmm, so the problem cannot be solved, but then maybe it will be solved, but this doesn't count because there will be other unrelated bugs? I can't follow this reasoning. Or do you mean that you can never solve all bugs, and so there's no need to fix any single one?

>
> The reason is dead simple: register allocation is NP-complete, so it
> is even *theoretically* not possible to write register allocators that
> always find a coloring.
Register allocation in general is NP-complete, yes, but it seems you forget that this is about finding the optimal solution, while gcc fails to find any solution. In practice that is a matter of assigning the registers, beginning from the most constrained operands to the least, and copying a few things onto the stack if gcc can't figure out how to access them. Sure, this method might fail in 0.001% of the practical cases and need a 2nd or 3rd pass where it tries different registers; it might also happen that in some intentionally overconstrained cases it ends up searching all 5040 possible assignments of 7 registers onto 7 non-memory operands, but still it won't fail.

> That means any register allocator will always
> fail on some very constrained asm input.

Now that statement is just false, not to mention irrelevant, as none of these asm statements are unreasonably constrained.

> And you cannot allow it to
> run indefinitely until a coloring is found, because then you've turned
> the graph coloring problem into the halting problem because you can't
> prove that a coloring exists and that the register allocator algorithm
> will terminate.

This is ridiculous; the number of possible colorings is finite, you can always try them all in finite time.
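The finiteness argument above can be made concrete with a toy model. The sketch below assumes a drastically simplified picture (a hypothetical `struct asm_model`: n operands, k interchangeable registers, a pairwise interference matrix and a per-operand mask of allowed registers); it ignores addressing modes, reloads and spilling, which a real allocator must also handle. Exhaustive backtracking over at most k^n assignments always terminates, and for 7 mutually interfering operands and 7 registers it finds exactly 7! = 5040 assignments, the figure quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPS 8

/* Hypothetical toy model of one asm statement's register needs. */
struct asm_model {
    int n;                               /* number of register operands */
    int k;                               /* number of usable registers (<= 32) */
    bool interferes[MAX_OPS][MAX_OPS];   /* true: operands i, j need different registers */
    unsigned allowed[MAX_OPS];           /* bit r set: operand i may use register r */
};

/* Backtracking over operands 0..n-1: every register that satisfies the
   operand's constraint and does not clash with an earlier operand is tried.
   At most k^n complete assignments exist, so the search always terminates.
   Returns found plus the number of valid complete assignments below this
   point; the first one found is copied to out. */
static long search(const struct asm_model *m, int i, int cur[], int out[], long found)
{
    if (i == m->n) {
        if (found == 0)
            for (int j = 0; j < m->n; j++)
                out[j] = cur[j];
        return found + 1;
    }
    for (int r = 0; r < m->k; r++) {
        if (!(m->allowed[i] & (1u << r)))
            continue;
        bool clash = false;
        for (int j = 0; j < i && !clash; j++)
            clash = m->interferes[i][j] && cur[j] == r;
        if (!clash) {
            cur[i] = r;
            found = search(m, i + 1, cur, out, found);
        }
    }
    return found;
}

/* Number of valid register assignments (0 means "cannot be allocated"). */
static long count_assignments(const struct asm_model *m, int out[])
{
    int cur[MAX_OPS];
    assert(m->n >= 0 && m->n <= MAX_OPS && m->k >= 0 && m->k <= 32);
    return search(m, 0, cur, out, 0);
}
```

With 8 mutually interfering operands and only 7 registers the same search returns 0 after a finite number of steps, which is a definite "impossible" rather than a hang.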
Subject: Re: source doesn't compile with -O0 but they compile with -O3

>> The reason is dead simple: register allocation is NP-complete, so it
>> is even *theoretically* not possible to write register allocators that
>> always find a coloring.
>
> Register allocation in general is NP-complete, yes, but it seems you forget that
> this is about finding the optimal solution, while gcc fails to find any solution.
> In practice that is a matter of assigning the registers, beginning from the most
> constrained operands to the least, and copying a few things onto the stack if gcc
> can't figure out how to access them. Sure, this method might fail in 0.001% of the
> practical cases and need a 2nd or 3rd pass where it tries different registers;
> it might also happen that in some intentionally overconstrained cases it ends up
> searching all 5040 possible assignments of 7 registers onto 7 non-memory
> operands, but still it won't fail.

Just to also point out: it doesn't appear to be NP-complete for register interference graphs, because they all seem to be 1-perfect. Various papers have observed this, and I've actually compiled all of gcc, libstdc++, etc., and every package ever on my computer, and not once has a single non-1-perfect interference graph occurred (my compiler would abort if one had). On 1-perfect graphs you can solve this problem in O(time it takes to determine the max clique), and there already exists a polynomial-time algorithm for max-clique on perfect graphs.

>> That means any register allocator will always
>> fail on some very constrained asm input.
>
> Now that statement is just false, not to mention irrelevant, as none of these asm
> statements are unreasonably constrained.

You are correct, NP-completeness does not imply impossibility. There are only a finite number of possibilities.
>> And you cannot allow it to
>> run indefinitely until a coloring is found, because then you've turned
>> the graph coloring problem into the halting problem because you can't
>> prove that a coloring exists and that the register allocator algorithm
>> will terminate.
>
> This is ridiculous; the number of possible colorings is finite, you can always try
> them all in finite time.

You are right, he is wrong.
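The polynomial-time claim above concerns perfect graphs in general; a narrower, well-known case is easy to show in code. As a hedged sketch: chordal graphs are a subclass of perfect graphs, and for them greedy coloring in the order produced by maximum cardinality search (MCS) uses exactly as many colors as the largest clique. The function name `mcs_greedy_color` and the adjacency-matrix representation are assumptions for illustration, and on non-chordal graphs the same code is only a heuristic.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 16

/* Maximum cardinality search (MCS) with greedy coloring in visit order.
   On a chordal graph, the already-visited neighbours of each newly visited
   vertex form a clique, so the greedy choice never needs more colors than
   the largest clique. On other graphs this is only a heuristic.
   Returns the number of colors used; color[v] receives v's color. */
static int mcs_greedy_color(int n, bool adj[MAXV][MAXV], int color[MAXV])
{
    int weight[MAXV] = {0};       /* visited neighbours of each vertex */
    bool visited[MAXV] = {false};
    int ncolors = 0;

    assert(n >= 0 && n <= MAXV);
    for (int step = 0; step < n; step++) {
        /* Visit the unvisited vertex with the most visited neighbours. */
        int v = -1;
        for (int u = 0; u < n; u++)
            if (!visited[u] && (v < 0 || weight[u] > weight[v]))
                v = u;

        /* Give v the smallest color unused by its visited neighbours. */
        bool used[MAXV] = {false};
        for (int u = 0; u < n; u++)
            if (visited[u] && adj[v][u])
                used[color[u]] = true;
        int c = 0;
        while (used[c])
            c++;
        color[v] = c;
        if (c + 1 > ncolors)
            ncolors = c + 1;

        visited[v] = true;
        for (int u = 0; u < n; u++)
            if (!visited[u] && adj[v][u])
                weight[u]++;
    }
    return ncolors;
}
```

For two triangles sharing an edge (chordal, largest clique 3) this uses 3 colors, and for a 4-vertex path it uses 2.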
Subject: Re: source doesn't compile with -O0 but they compile with -O3

Yeah, fine battle!
*** Bug 20645 has been marked as a duplicate of this bug. ***
*** Bug 23743 has been marked as a duplicate of this bug. ***
*** Bug 25226 has been marked as a duplicate of this bug. ***
*** Bug 25221 has been marked as a duplicate of this bug. ***
*** Bug 25853 has been marked as a duplicate of this bug. ***
> The reason is dead simple: register allocation is NP-complete, so it
> is even *theoretically* not possible to write register allocators that
> always find a coloring.

Not at all. If a problem is NP-hard, you can in fact solve it! It is just quite likely that your algorithm takes exponentially many steps in the size of the problem, which, given the few registers of x86, might turn out not to be a problem.

> That means any register allocator will always
> fail on some very constrained asm input. And you cannot allow it to
> run indefinitely until a coloring is found, because then you've turned
> the graph coloring problem into the halting problem because you can't
> prove that a coloring exists and that the register allocator algorithm
> will terminate.

Not necessarily. The coloring problem is decidable (just enumerate all the colorings, a.k.a. register mappings), whereas the halting problem is not decidable (only semi-decidable, if you're interested in that).

> So really it doesn't matter at all whether or not your specific inline
> asm compiles or not. When yours does, someone else's will fail.

Nope.
(In reply to comment #34)
> Not at all. If a problem is NP-hard, you can in fact solve it! [...]

Sorry for the spam. I didn't read to the end. I have been quite angry about the whole situation...
(In reply to comment #21)
> asm volatile(""
>     : "=m" (*(unsigned int*)(src + 0*stride)),
>       "=m" (*(unsigned int*)(src + 1*stride)),
>       "=m" (*(unsigned int*)(src + 2*stride)),
>       "=m" (*(unsigned int*)(src + 3*stride)),
>       "=m" (*(unsigned int*)(src + 4*stride)),
>       "=m" (*(unsigned int*)(src + 5*stride)),
>       "=m" (*(unsigned int*)(src + 6*stride)),
>       "=m" (*(unsigned int*)(src + 7*stride))
> );

(In reply to comment #26)
> it might also happen that in some intentionally overconstrained cases it ends up
> searching all 5040 possible assignments of 7 registers onto 7 non-memory
> operands, but still it won't fail.

The example Martin gave has *8* operands. You can try every possible direct mapping of those 8 addresses to just 7 registers, but they will obviously all fail. However, with ia32 addressing modes it _can_ be done, and with only 4 registers:

reg1 = src, reg2 = stride, reg3 = src+stride*3, reg4 = src+stride*6

Then the 8 memory operands are:

(reg1), (reg1,reg2,1), (reg1,reg2,2), (reg3),
(reg1,reg2,4), (reg3,reg2,2), (reg4), (reg3,reg2,4)

When one considers all the addressing modes, there are not just 7 possible registers, but (I think) 261 possible addresses. There are not just 5040 possibilities as Michael said, but over 76 x 10^15 possible ways of assigning these addresses to 7 operands! Then each register can be loaded not just with an address but with some sub-expression too, like how I loaded reg2 with stride.

Even for ia32, which makes up for its limited number of registers with complex addressing modes, finding a register allocation that satisfies an asm statement is not something that can always be done in reasonable time. If the number of operands <= the number of available registers, it should always be able to find an allocation (_an_ allocation, not the best allocation), but gcc doesn't.
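The four-register decomposition and the size of the search space above can both be checked mechanically. This sketch models an ia32 memory operand as AT&T `(base,index,scale)` with no displacement, uses integers in place of real pointers, and takes the commenter's own estimate of 261 candidate addresses as an input rather than deriving it; `mem_operand`, `effective_address`, `load_registers` and `ordered_choices` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified ia32 memory operand in AT&T form (base,index,scale):
   address = reg[base] + reg[index] * scale; index < 0 means no index. */
struct mem_operand { int base, index, scale; };

static intptr_t effective_address(const intptr_t reg[], struct mem_operand m)
{
    assert(m.scale == 1 || m.scale == 2 || m.scale == 4 || m.scale == 8);
    return reg[m.base] + (m.index >= 0 ? reg[m.index] * m.scale : 0);
}

/* reg1..reg4 as proposed in the comment (slot 0 unused). */
static void load_registers(intptr_t reg[5], intptr_t src, intptr_t stride)
{
    reg[0] = 0;
    reg[1] = src;
    reg[2] = stride;
    reg[3] = src + 3 * stride;
    reg[4] = src + 6 * stride;
}

/* The eight operands src + 0*stride ... src + 7*stride, in order. */
static const struct mem_operand operands[8] = {
    {1, -1, 1}, {1, 2, 1}, {1, 2, 2}, {3, -1, 1},  /* (reg1) (reg1,reg2,1) (reg1,reg2,2) (reg3) */
    {1, 2, 4}, {3, 2, 2}, {4, -1, 1}, {3, 2, 4},   /* (reg1,reg2,4) (reg3,reg2,2) (reg4) (reg3,reg2,4) */
};

/* Ordered selections of k distinct items out of n: n! / (n - k)!. */
static unsigned long long ordered_choices(unsigned n, unsigned k)
{
    unsigned long long r = 1;
    assert(k <= n);
    for (unsigned i = 0; i < k; i++)
        r *= n - i;
    return r;
}
```

Picking 7 distinct addresses out of 261 in order gives 261!/254! = 76,075,811,633,203,200, consistent with the "over 76 x 10^15" figure, while 7 registers onto 7 operands gives 7! = 5040.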
(In reply to comment #36)
> When one considers all the addressing modes, there are not just 7 possible
> registers, but (I think) 261 possible addresses. There are not just 5040
> possibilities as Michael said, but over 76 x 10^15 possible ways of assigning
> these addresses to 7 operands!

"m" operands and their variations can be copied onto the stack and accessed from there, so no matter how many memory operands there are, they can always be accessed via esp on ia32; so whatever you calculated is meaningless.

Now, if there is an unwritten rule that "m" operands and their variations cannot be copied anywhere, then it would be very desirable to have an asm constraint like "m" without this restriction; this would resolve this and several other bugs. Also, it would be very nice if such a don't-copy restriction on "m", if it does exist, could be documented.
(In reply to comment #37)
> Now, if there is an unwritten rule that "m" operands and their variations cannot
> be copied anywhere, then it would be very desirable to have an asm constraint
> like "m" without this restriction; this would resolve this and several other bugs.
> Also, it would be very nice if such a don't-copy restriction on "m", if it does
> exist, could be documented.

Copying "m" operands onto the stack might not be such a great thing to wish for. Imagine if you used asm("movaps %%xmm0, %0" : "=m"(x[i])); If x[i] is only 32 bits, and gcc copied it onto the stack, then writing 16 bytes with movaps wouldn't also write to x[i+1] to x[i+3] as intended. I know there is plenty of asm code in ffmpeg that overwrites or overreads memory operands and will fail if gcc tries to move them onto the stack. There is also alignment: movaps requires an aligned address, and maybe you have chosen x and i in such a way that it will be aligned. But when gcc copies the value onto the stack, how is it supposed to know what alignment it needs?
(In reply to comment #38)
> Copying "m" operands onto the stack might not be such a great thing to wish
> for. Imagine if you used asm("movaps %%xmm0, %0" : "=m"(x[i])); If x[i] is only
> 32 bits, and gcc copied it onto the stack, then writing 16 bytes with movaps
> wouldn't also write to x[i+1] to x[i+3] as intended. I know there is plenty
> of asm code in ffmpeg that overwrites or overreads memory operands and will
> fail if gcc tries to move them onto the stack. There is also alignment:
> movaps requires an aligned address, and maybe you have chosen x and i in such a
> way that it will be aligned. But when gcc copies the value onto the stack, how
> is it supposed to know what alignment it needs?

Well, the data type used in "m"() must of course be correct, which here is a 128-bit type. Alignment can be handled like with all other types: double also gets aligned if the architecture needs it, so a uint128_t or sse128 or whatever can be as well. The example you show is a fairly obscure special case with respect to moving "m" to the stack. In the end there's a need for an "m"-like constraint which must not be movable and an "m"-like constraint which should be movable (to the stack, for example); the exact letters used are irrelevant.
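One way to keep the compiler's view of an "m" operand honest, along the lines of the reply above, is to give the operand a type that covers every byte the instruction touches. A minimal sketch, assuming x86 with SSE and GCC-style extended asm; `store4_movaps` is a hypothetical helper. Casting `&x[i]` to `__m128 *` tells the compiler the asm writes all 16 bytes, and the caller remains responsible for 16-byte alignment. This only illustrates correct operand typing; it says nothing about whether GCC may move such an operand.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: store v to x[i..i+3] with a single movaps.
   The "=m" operand is typed as __m128 (16 bytes), so the compiler knows
   x[i], x[i+1], x[i+2] and x[i+3] are all written, not just x[i]. */
static void store4_movaps(float *x, int i, __m128 v)
{
    /* movaps faults on addresses that are not 16-byte aligned. */
    assert(((uintptr_t)&x[i] & 15) == 0);
    __asm__ ("movaps %1, %0"
             : "=m" (*(__m128 *)&x[i])  /* full 16-byte memory operand */
             : "x" (v));                /* any SSE register */
}
```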
Linked from http://x264dev.multimedia.cx/?p=185; I'd forgotten all about the ridiculous flamewar in this one.

Just as a note, the actual definitions of the four variables (from liba52):

x2k = x + 2 * k;
x3k = x2k + 2 * k;
x4k = x3k + 2 * k;
wB = wTB + 2 * k;

Also, the asm inputs are silly: output 0 is the same as input 6 for no reason, and the same goes for output 2 and input 7. So change those to "+m" and change %6/%7 to %0/%2. That doesn't actually change anything, even though it should free two registers.

It works with gcc 4.5 -O0 -fno-pic -fomit-frame-pointer, but not without one of those flags. That looks like it's because it's allocating 2 more registers for the unused fake inputs for the "+m"; change it to "=m" and it works with one flag removed, but still not with both. So there's a specific bug.

And of course it all works at -O1 because it doesn't have to use registers there. So maybe it should just do that.
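For readers unfamiliar with the "+m" form suggested above, here is a minimal sketch of a read-modify-write memory operand. It is not the liba52 routine (which needs 3DNow! hardware); it assumes x86 and GCC-style extended asm, and `add_to_mem` is a hypothetical helper. A single "+m" operand replaces the pattern of an "=m" output plus a duplicate "m" input naming the same object.

```c
#include <assert.h>

/* Hypothetical helper: *p += v using one x86 instruction.
   "+m" marks *p as both read and written, replacing the older pattern of
   an "=m" (*p) output plus a duplicate "m" (*p) input. addl modifies the
   flags, hence the "cc" clobber. */
static void add_to_mem(int *p, int v)
{
    assert(p != 0);
    __asm__ ("addl %1, %0"
             : "+m" (*p)
             : "ri" (v)
             : "cc");
}
```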