In testing PR46235 I noticed some minor inefficiency in the usage of an extra register. The C code is:

int foo(int a, int x, int y) {
    if (a & (16))
        return a;
    return 1;
}

Which produces the asm:

        movl    %edi, %eax
        movl    $1, %edx
        testb   $16, %al
        cmove   %edx, %eax
        ret

The above code could have been further optimized to remove the usage of edx:

        movl    $1, %eax
        test    $16, %edi
        cmove   %edi, %eax
        ret
I probably meant "testb $16, %dil" above...

GCC 4.3.5 avoids the usage of edx, although it too probably has 1 instruction too many:

        testb   $16, %dil
        movl    $1, %eax
        cmove   %eax, %edi
        movl    %edi, %eax
        ret
Changing bug title to include regression, as 4.3.5 is able to avoid the usage of edx.
Removing the regression tag. The gcc-4.5 and trunk code are in effect equivalent. The seemingly unnecessary use of %edx doesn't affect anything and is merely an artifact of it appearing earlier in the register file -- it has the same costs as other call-clobbered regs, and the allocator will use the first one appearing in the register file when multiple regs have the same cost. Note that it is true that the code could be improved. This just isn't a regression.

There are two paths to get the desired code. The first would involve some surgery to the register allocator. The key here is the pseudo holding "a". The cost analysis has %eax and %edi as having the same cost. Allocating into %eax allows the copy to the return value to be eliminated, while allocating into %edi allows the copy from the incoming argument to be deleted. What the allocator fails to realize is that the cmov can be used to eliminate the copy to the return value. Thus it's actually better to hold "a" in %edi. Fixing the allocator to realize this (and in fact use the cmov to eliminate the trailing mov) would probably be a PITA.

The other approach is to let combine merge the cmov and the move to the return register, even though the latter is a move to a likely-spilled hardreg. We generally avoid such combinations so as to avoid lengthening the lifetime of a likely-spilled hardreg, but in this kind of situation no lengthening would occur. A quick hack as proof of concept results in:

        testb   $16, %dil       # 36    *testqi_1_maybe_si/2    [length = 4]
        movl    $1, %eax        # 34    *movsi_internal/1       [length = 5]
        cmovne  %edi, %eax      # 19    *movsicc_noc/1          [length = 3]
        ret                     # 39    return_internal         [length = 1]
I'd hoped it would be possible to define a set of conditions under which combine could combine the conditional move with the subsequent move into a hard register. Without special casing the instruction patterns, reading the constraints and the like, I doubt that's going to be possible. The other approach I see is to have copyrename avoid coalescing the temporaries with "a". That in turn prevents "a" from being live throughout the entire function and we get the desired register allocations and code.
As a quick test, would this be fixed by re-ordering the register file to move eax above edx?

If so, then another possible fix would be to effectively re-run the RA pass multiple times, each time using a different register file, then select the one that produces the "best" code and discard the other RA attempts. The register files would only differ in their sort order when register costs are equal. I am guessing that only a few such register files would be needed (in particular ones where eax is shuffled around), rather than every single possible combination of sort orders (which would be prohibitive), so this doesn't necessarily have to impact the length of compilation by much. A metric would also be needed to select the "best" version of the compiled code - possibly the number of instructions and the number of registers used?
On 02/03/11 07:16, tony.poppleton at gmail dot com wrote:
> As a quick test, would this be fixed by re-ordering the register file to move
> eax above edx?

It's like flipping a coin. Sometimes it's heads, sometimes it's tails; tailoring the compiler around the fact that for this code we got tails isn't an approach we generally take. There is no sound reason to change the ordering of the registers.

> If so, then another possible fix to this would be to effectively re-run the RA
> pass multiple times, each time using a different register file, and then select
> the one that produces the "best" code and discard the other RA attempts.

I don't think we're likely to take this approach. As I mentioned earlier, using %edx is a complete and total red herring; ignore it.

The real issue here is that there's an unnecessary reg-reg copy. The unnecessary copy can be attributed to many factors (coalescing early, combine's inability to deal with hard regs well, reload's inability to handle hard regs well in certain circumstances, IRA not realizing the ranges of "a" can profitably be split, etc). Those are the avenues I would suggest someone wanting to fix this bug pursue.

jeff
This appears to be fixed with GCC 4.8.0 and flag -O2. The asm code produced is now exactly as Jeff said in comment #3:

        .file   "test.c"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        testb   $16, %dil
        movl    $1, %eax
        cmovne  %edi, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.8.0 20130316 (Red Hat 4.8.0-0.17)"
        .section        .note.GNU-stack,"",@progbits
Manually confirmed Tony's results in c#7. Could easily be due to the switch to LRA, which occurred in gcc-4.8.