Combine cannot merge a shift insn and an arithmetic insn into one insn on the
ARM target, because there is a function call between them. An intelligent
change in the evaluation order would enable the Combine algorithm to produce
better code. In the example, the ior expression has two operands (a shift
expression and a function call).

Release:
gcc version 3.3 20030217 (prerelease)

Environment:
BUILD & HOST: Linux 2.4.20 i686 unknown
TARGET: arm-unknown-elf

How-To-Repeat:
gcc -Os -S 01.c

// 01.c
int func2(int d)
{
  return 12*d;
}

int func(int d)
{
  return 23*func2(d);
}

int main()
{
  int u;
  u = 0;
  u = (u << 8) | func(7);
  u = (u << 8) | func(8);
  return u;
}
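The evaluation-order change the report asks for can be sketched at the source level. This is only an illustration under stated assumptions, not the proposed compiler patch: the names shift_then_call, call_then_shift, and check_orders_agree are hypothetical, while func and func2 are copied from 01.c. C leaves the evaluation order of the operands of `|` unspecified, so a compiler is free to perform the call first, which leaves the shift directly next to the ior.

```c
#include <assert.h>

/* Functions copied from the report's 01.c. */
int func2(int d) { return 12 * d; }
int func(int d) { return 23 * func2(d); }

/* Body of main() from the report: the shift operand is written
   before the call operand in each ior expression. */
int shift_then_call(void)
{
    int u = 0;
    u = (u << 8) | func(7);
    u = (u << 8) | func(8);
    return u;
}

/* Hypothetical reordering: each call is evaluated first, so the
   shift and the ior are adjacent with no call between them. */
int call_then_shift(void)
{
    int u = 0;
    int t;

    t = func(7);
    u = (u << 8) | t;
    t = func(8);
    u = (u << 8) | t;
    return u;
}

/* Both orders must produce the same value, since func() does not
   touch u. */
void check_orders_agree(void)
{
    assert(shift_then_call() == call_then_shift());
}
```

With the call out of the way, a shift followed immediately by an ior is the kind of pair that a single ARM orr with a shifted operand can express; whether a given GCC version actually emits that is not something this sketch demonstrates.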
Hello,

with gcc 3.3, I get the following code:

        .file   "01.i"
        .text
        .align  2
        .global func2
        .type   func2, %function
func2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        add     r0, r0, r0, asl #1
        mov     r0, r0, asl #2
        @ lr needed for prologue
        mov     pc, lr
        .size   func2, .-func2
        .align  2
        .global func
        .type   func, %function
func:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        bl      func2
        add     r3, r0, r0, asl #1
        rsb     r0, r0, r3, asl #3
        ldmea   fp, {fp, sp, pc}
        .size   func, .-func
        .align  2
        .global main
        .type   main, %function
main:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {r4, fp, ip, lr, pc}
        mov     r0, #7
        sub     fp, ip, #4
        bl      func
        mov     r4, r0
        mov     r0, #8
        mov     r4, r4, asl r0
        bl      func
        orr     r4, r4, r0
        mov     r0, r4
        ldmea   fp, {r4, fp, sp, pc}
        .size   main, .-main
        .ident  "GCC: (GNU) 3.3 20030508 (prerelease)"

With gcc mainline (20030508) I get:

        .file   "01.i"
        .text
        .align  2
        .global func2
        .type   func2, %function
func2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        add     r0, r0, r0, asl #1
        mov     r0, r0, asl #2
        @ lr needed for prologue
        mov     pc, lr
        .size   func2, .-func2
        .align  2
        .global func
        .type   func, %function
func:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        bl      func2
        add     r3, r0, r0, asl #1
        rsb     r0, r0, r3, asl #3
        ldmea   fp, {fp, sp, pc}
        .size   func, .-func
        .align  2
        .global main
        .type   main, %function
main:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {r4, fp, ip, lr, pc}
        mov     r0, #7
        sub     fp, ip, #4
        bl      func
        mov     r4, r0, asl #8
        mov     r0, #8
        bl      func
        orr     r0, r4, r0
        ldmea   fp, {r4, fp, sp, pc}
        .size   main, .-main
        .ident  "GCC: (GNU) 3.4 20030508 (experimental)"

Can you determine whether this is an improvement, or whether the original
problem still exists?

Thanks,

Dara
See Dara's question.
Created attachment 4151
Shows the problem on mainline (20030530)
>Can you determine whether this is an improvement, or whether the original
>problem still exists?

The problem is still present on mainline (20030530), but no optimization can
be done for this example. I attached another example which shows the problem
clearly.

Anyway, we wrote down our idea about a possible solution. See:
http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01886.html

Regards,
Gabor Loki
Thanks for the feedback Gabor. Hopefully it won't be too long until some solution is adopted...
Suspending as this is fixed on the tree-ssa branch:

  int T.3;
  int T.2;

<bb 0>:
  T.2 = func (7);
  T.3 = func (8);
  return T.2 << 8 | T.3;
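For readers unfamiliar with the dump format, the tree-ssa output above can be transcribed by hand into C. This is a sketch under stated assumptions: report_main, dump_main, and check_dump_matches_report are hypothetical names, T.2 and T.3 are renamed t2 and t3 because '.' is not valid in a C identifier, and func and func2 are copied from 01.c. Since u starts at 0, the first `(u << 8) | func(7)` reduces to `func(7)`, which is why the dump contains a single shift.

```c
#include <assert.h>

/* Functions copied from the report's 01.c. */
int func2(int d) { return 12 * d; }
int func(int d) { return 23 * func2(d); }

/* Body of main() from the report, renamed so it can sit next to
   the transcription of the dump. */
int report_main(void)
{
    int u;
    u = 0;
    u = (u << 8) | func(7);
    u = (u << 8) | func(8);
    return u;
}

/* Hand transcription of the dump: both calls come first, then one
   shift feeding one ior. */
int dump_main(void)
{
    int t2 = func(7);
    int t3 = func(8);
    return t2 << 8 | t3;
}

/* The transcription must return the same value as the original. */
void check_dump_matches_report(void)
{
    assert(report_main() == dump_main());
}
```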
Fixed for 3.5.0 by the merge of the tree-ssa branch.