This is a performance/size regression between 4.6 (4.5) and 4.4. The C code: int foo(long long a) { if (a & (long long) 0x400) return 1; return 0; } Assemble code generated by 4.6 trunk: foo: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 movq %rsp, %rbp .cfi_offset 6, -16 .cfi_def_cfa_register 6 movq %rdi, -8(%rbp) movq -8(%rbp), %rax andl $1024, %eax testq %rax, %rax je .L2 movl $1, %eax jmp .L3 .L2: movl $0, %eax .L3: popq %rbp .cfi_def_cfa 7, 8 ret .cfi_endproc Assemble code generated by 4.4.0: foo: .LFB0: .cfi_startproc shrq $10, %rdi movl %edi, %eax andl $1, %eax ret .cfi_endproc After tree optimizations, both compilers produce different but essentially same forms. RTL expander and later passes then go on to do different optimizations and generate very different code.
It works for me.
Sorry, I first observed this on our target. Then I tried to reproduce on x86, but I forgot to turn on optimization flags. It does work for x86. Please delete this report. I will figure out what happen with my target.
I found I can reproduce the bug with ARM ARM trunk -Os: foo: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. mov r2, #1024 mov r3, #0 and r2, r2, r0 and r3, r3, r1 orrs r1, r2, r3 moveq r0, #0 movne r0, #1 mov pc, lr .size foo, .-foo Arm 4.40 -Os: foo: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. mov r0, r0, lsr #10 and r0, r0, #1 bx lr
(In reply to comment #3) > I found I can reproduce the bug with ARM I see this too on armv5tel-linux-gnueabi. gcc-4.4.4 -Os generates 3 instructions for the body of foo(), 4.5.1 and 4.6.0 generate 8 instructions.
The code size regression on ARM is caused by r146817, Matz' expand from SSA patch: http://gcc.gnu.org/ml/gcc-cvs/2009-04/msg01459.html Here's the diff in the assembly code generated by a cross to armv5tel-linux-gnueabi, with -Os -S, for r146816 and r146817: --- pr45316.s-r146816 2010-08-26 23:00:18.000000000 +0200 +++ pr45316.s-r146817 2010-08-26 23:03:24.000000000 +0200 @@ -17,8 +17,13 @@ @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. - mov r0, r0, lsr #10 - and r0, r0, #1 + mov r2, #1024 + mov r3, #0 + and r2, r2, r0 + and r3, r3, r1 + orrs r1, r2, r3 + moveq r0, #0 + movne r0, #1 bx lr .size foo, .-foo .ident "GCC: (GNU) 4.5.0 20090426 (experimental)" My cross was configured: ../gcc-4.5-r146817/configure --target=armv5tel-unknown-linux-gnueabi --with-arch=armv5te --with-tune=xscale --disable-plugin --disable-lto --disable-nls --disable-shared --disable-libmudflap --disable-multilib --enable-threads=posix --enable-checking=release --enable-languages=c
GCC 4.5.2 is being released, adjusting target milestone.
GCC 4.5.3 is being released, adjusting target milestone.
I am currently on vacation until 4/5/2011. I may access my mail irregularly. Cheers, Bingfeng Mei
We produce better code with the trunk now: xorl %eax, %eax testl $1024, %edi setne %al ret But that is still worse than before with the and ...
/* If this is an equality or inequality test of a single bit, we can do this by shifting the bit being tested to the low-order bit and masking the result with the constant 1. If the condition was EQ, we xor it with 1. This does not require an scc insn and is faster than an scc insn even if we have it. The code to make this transformation was moved into fold_single_bit_test, so we just call into the folder and expand its result. */ if ((code == NE || code == EQ) && TREE_CODE (arg0) == BIT_AND_EXPR && integer_zerop (arg1) && integer_pow2p (TREE_OPERAND (arg0, 1)) && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type))) This needs to be updated for the out-of-ssa expand.
I have a patch: Index: expr.c =================================================================== --- expr.c (revision 182017) +++ expr.c (working copy) @@ -10563,15 +10563,24 @@ do_store_flag (sepops ops, rtx target, e so we just call into the folder and expand its result. */ if ((code == NE || code == EQ) - && TREE_CODE (arg0) == BIT_AND_EXPR && integer_zerop (arg1) - && integer_pow2p (TREE_OPERAND (arg0, 1)) + && TREE_CODE (arg0) == SSA_NAME && integer_zerop (arg1) && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type))) { - tree type = lang_hooks.types.type_for_mode (mode, unsignedp); - return expand_expr (fold_single_bit_test (loc, - code == NE ? NE_EXPR : EQ_EXPR, - arg0, arg1, type), - target, VOIDmode, EXPAND_NORMAL); + gimple srcstmt = get_gimple_for_ssa_name (arg0); + if (srcstmt + && TREE_CODE (gimple_assign_rhs_code (srcstmt)) == BIT_AND_EXPR + && integer_pow2p (gimple_assign_rhs2 (srcstmt)) + { + enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR; + tree type = lang_hooks.types.type_for_mode (mode, unsignedp); + tree temp = fold_build2 (loc, BIT_AND_EXPR, type, + gimple_assign_rhs1 (srcstmt), + gimple_assign_rhs2 (srcstmt)); + return expand_expr (fold_single_bit_test (loc, + tcode, + arg0, arg1, type), + target, VOIDmode, EXPAND_NORMAL); + } } if (! get_subtarget (target)
(In reply to comment #11) > I have a patch: With a couple of fixes above that.
Author: pinskia Date: Wed Dec 7 19:23:10 2011 New Revision: 182084 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182084 Log: 2011-12-07 Andrew Pinski <apinski@cavium.com> PR middle-end/45416 * expr.c (do_store_flag): Rewrite code that looks for BIT_AND_EXPR for SSA-expand. 2011-12-07 Andrew Pinski <apinski@cavium.com> PR middle-end/45416 * gcc.dg/pr45416.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/pr45416.c Modified: trunk/gcc/ChangeLog trunk/gcc/expr.c trunk/gcc/testsuite/ChangeLog
Fixed on the trunk. If someone wants to backport the patch, that is ok with me.
Appears that the patch leads to code bloat on AVR at least. It is general wrong to assume that it's cheap to perform x >> n without caring for the underlying architecture.
Author: ramana Date: Mon Jan 23 17:59:51 2012 New Revision: 183446 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183446 Log: 2012-01-23 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> PR middle-end/45416 * gcc.dg/pr45416.c: Skip if Thumb1. Handle ubfx. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/pr45416.c
fixed. Not a wrong-code bug, so not backporting.