On void bar (long int *); void foo (void) { long int buf[10]; __builtin_memset (buf, 8, sizeof (buf)); bar (buf); } fwprop effectively undoes what CSE optimized. 0x0808080808080808 is not a valid constant for movdi_1_rex64 when the target is memory (movdi_1_rex64 only allows non-32-bit-sign-extended immediates in source if the destination is a register).
With 4.2 and earlier, we get: movabsq $578721382704613384, %rax movq %rsp, %rdi movq %rax, (%rsp) movq %rax, 8(%rsp) movq %rax, 16(%rsp) movq %rax, 24(%rsp) movq %rax, 32(%rsp) movq %rax, 40(%rsp) movq %rax, 48(%rsp) movq %rax, 56(%rsp) movq %rax, 64(%rsp) movq %rax, 72(%rsp) call bar with the trunk: movabsq $578721382704613384, %rax movabsq $578721382704613384, %r11 movq %rax, (%rsp) movabsq $578721382704613384, %rax movq %rsp, %rdi movq %rax, 8(%rsp) movabsq $578721382704613384, %r10 movabsq $578721382704613384, %r9 movabsq $578721382704613384, %r8 movabsq $578721382704613384, %rsi movabsq $578721382704613384, %rcx movabsq $578721382704613384, %rdx movabsq $578721382704613384, %rax movq %r11, 16(%rsp) movq %r10, 24(%rsp) movq %r9, 32(%rsp) movq %r8, 40(%rsp) movq %rsi, 48(%rsp) movq %rcx, 56(%rsp) movq %rdx, 64(%rsp) movq %rax, 72(%rsp) call bar which is far worse. The predicate on the source operand on movdi_1_rex64 allows it (general_operand), but the constraints don't (but letting forwprop to try all constraints to see what is and is not desirable would be weird). Alternatively, perhaps forwprop could try to emit_move_insn and see if it doesn't create more insns than the one it is substing into, if more than one, it should just punt, otherwise it can just subst into it.
fwprop should check costs just like combine does. Unfortunately the cost do need a little bit of tweaking even if one implements the idea.
Created attachment 14495 [details] patch that fixes the bug I'm not sure about the correctness of the i386.c hunk. The problem is that a cost of 0 is mapped by insn_rtx_cost to COSTS_N_INSNS(1) == 4.
BTW, why don't you use just rtx_cost instead of insn_rtx_cost? In each case you have an insn, so you can do single_set on it and run rtx_cost (SET_SRC (set), SET) on it directly.
Subject: Re: [4.3 Regression] Pessimization caused by fwprop > BTW, why don't you use just rtx_cost instead of insn_rtx_cost? > In each case you have an insn, so you can do single_set on it and run > rtx_cost (SET_SRC (set), SET) on it directly. You're right. I was just mimicking combine to see if the problem was simple to solve or more deeply rooted.
Created attachment 14499 [details] patch using rtx_cost This patch uses rtx_cost instead of insn_rtx_cost, which is faster and does not require touching i386.c.
Created attachment 14502 [details] gcc43-pr34012.patch Updated patch with testcase. Paolo, are you bootstrapping/regtesting this or should I? I can do that on x86_64-linux, ppc64-linux and ia64-linux overnight.
Subject: Re: [4.3 Regression] Pessimization caused by fwprop jakub at gcc dot gnu dot org wrote: > ------- Comment #7 from jakub at gcc dot gnu dot org 2007-11-07 20:18 ------- > Created an attachment (id=14502) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14502&action=view) > gcc43-pr34012.patch > > Updated patch with testcase. Paolo, are you bootstrapping/regtesting this or > should I? I can do that on x86_64-linux, ppc64-linux and ia64-linux overnight. I could have done it only on Monday and only for x86_64 (CPU time for the local x86_64 is "reserved" by another guy till then...) so yes, it's much better if you do it. Thanks! Paolo
Bootstrapped/regtested on x86_64-linux, ppc64-linux (-m64 default) and ia64-linux. No regressions. Could you please send it to gcc-patches?
Subject: Bug 34012 Author: jakub Date: Fri Nov 9 13:02:25 2007 New Revision: 130043 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=130043 Log: PR rtl-optimization/34012 * fwprop.c (try_fwprop_subst): Do not replace if the new SET_SRC has a higher cost than the old one. * gcc.target/i386/pr34012.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr34012.c Modified: trunk/gcc/ChangeLog trunk/gcc/fwprop.c trunk/gcc/testsuite/ChangeLog
Fixed.