Bug 34012 - [4.3 Regression] Pessimization caused by fwprop
Summary: [4.3 Regression] Pessimization caused by fwprop
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.3.0
: P3 major
Target Milestone: 4.3.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords:
Depends on:
Blocks:
 
Reported: 2007-11-07 10:12 UTC by Jakub Jelinek
Modified: 2007-11-09 13:05 UTC (History)
3 users (show)

See Also:
Host:
Target: x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-11-07 10:17:34


Attachments
patch that fixes the bug (833 bytes, patch)
2007-11-07 11:04 UTC, Paolo Bonzini
Details | Diff
patch using rtx_cost (837 bytes, patch)
2007-11-07 17:02 UTC, Paolo Bonzini
Details | Diff
gcc43-pr34012.patch (1.15 KB, patch)
2007-11-07 20:18 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2007-11-07 10:12:38 UTC
On
void bar (long int *);
void foo (void)
{
  long int buf[10];
  __builtin_memset (buf, 8, sizeof (buf));
  bar (buf);
}
fwprop effectively undoes what CSE optimized.  0x0808080808080808 is not a valid constant for movdi_1_rex64 when the target is memory (movdi_1_rex64 only allows non-32-bit-sign-extended immediates in source if the destination is a register).
Comment 1 Jakub Jelinek 2007-11-07 10:24:43 UTC
With 4.2 and earlier, we get:
        movabsq $578721382704613384, %rax
        movq    %rsp, %rdi
        movq    %rax, (%rsp)
        movq    %rax, 8(%rsp)
        movq    %rax, 16(%rsp)
        movq    %rax, 24(%rsp)
        movq    %rax, 32(%rsp)
        movq    %rax, 40(%rsp)
        movq    %rax, 48(%rsp)
        movq    %rax, 56(%rsp)
        movq    %rax, 64(%rsp)
        movq    %rax, 72(%rsp)
        call    bar
with the trunk:
        movabsq $578721382704613384, %rax
        movabsq $578721382704613384, %r11
        movq    %rax, (%rsp)
        movabsq $578721382704613384, %rax
        movq    %rsp, %rdi
        movq    %rax, 8(%rsp)
        movabsq $578721382704613384, %r10
        movabsq $578721382704613384, %r9
        movabsq $578721382704613384, %r8
        movabsq $578721382704613384, %rsi
        movabsq $578721382704613384, %rcx
        movabsq $578721382704613384, %rdx
        movabsq $578721382704613384, %rax
        movq    %r11, 16(%rsp)
        movq    %r10, 24(%rsp)
        movq    %r9, 32(%rsp)
        movq    %r8, 40(%rsp)
        movq    %rsi, 48(%rsp)
        movq    %rcx, 56(%rsp)
        movq    %rdx, 64(%rsp)
        movq    %rax, 72(%rsp)
        call    bar
which is far worse.  The predicate on the source operand on movdi_1_rex64 allows it (general_operand), but the constraints don't (but letting forwprop to try all constraints to see what is and is not desirable would be weird).  Alternatively, perhaps forwprop could try to emit_move_insn and see if it doesn't create more insns than the one it is substing into, if more than one, it should just punt, otherwise it can just subst into it.
Comment 2 Paolo Bonzini 2007-11-07 11:03:05 UTC
fwprop should check costs just like combine does.  Unfortunately the cost do need a little bit of tweaking even if one implements the idea.
Comment 3 Paolo Bonzini 2007-11-07 11:04:25 UTC
Created attachment 14495 [details]
patch that fixes the bug


I'm not sure about the correctness of the i386.c hunk.  The problem is that a cost of 0 is mapped by insn_rtx_cost to COSTS_N_INSNS(1) == 4.
Comment 4 Jakub Jelinek 2007-11-07 13:40:00 UTC
BTW, why don't you use just rtx_cost instead of insn_rtx_cost?
In each case you have an insn, so you can do single_set on it and run
rtx_cost (SET_SRC (set), SET) on it directly.
Comment 5 paolo.bonzini@lu.unisi.ch 2007-11-07 13:53:55 UTC
Subject: Re:  [4.3 Regression] Pessimization caused
 by fwprop

> BTW, why don't you use just rtx_cost instead of insn_rtx_cost?
> In each case you have an insn, so you can do single_set on it and run
> rtx_cost (SET_SRC (set), SET) on it directly.

You're right.  I was just mimicking combine to see if the problem was 
simple to solve or more deeply rooted.
Comment 6 Paolo Bonzini 2007-11-07 17:02:21 UTC
Created attachment 14499 [details]
patch using rtx_cost

This patch uses rtx_cost instead of insn_rtx_cost, which is faster and does not require touching i386.c.
Comment 7 Jakub Jelinek 2007-11-07 20:18:02 UTC
Created attachment 14502 [details]
gcc43-pr34012.patch

Updated patch with testcase.  Paolo, are you bootstrapping/regtesting this or should I?  I can do that on x86_64-linux, ppc64-linux and ia64-linux overnight.
Comment 8 paolo.bonzini@lu.unisi.ch 2007-11-08 06:10:21 UTC
Subject: Re:  [4.3 Regression] Pessimization caused
 by fwprop

jakub at gcc dot gnu dot org wrote:
> ------- Comment #7 from jakub at gcc dot gnu dot org  2007-11-07 20:18 -------
> Created an attachment (id=14502)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14502&action=view)
> gcc43-pr34012.patch
> 
> Updated patch with testcase.  Paolo, are you bootstrapping/regtesting this or
> should I?  I can do that on x86_64-linux, ppc64-linux and ia64-linux overnight.

I could have done it only on Monday and only for x86_64 (CPU time for 
the local x86_64 is "reserved" by another guy till then...) so yes, it's 
much better if you do it.  Thanks!

Paolo
Comment 9 Jakub Jelinek 2007-11-08 17:05:29 UTC
Bootstrapped/regtested on x86_64-linux, ppc64-linux (-m64 default) and ia64-linux.  No regressions.  Could you please send it to gcc-patches?
Comment 10 Jakub Jelinek 2007-11-09 13:03:16 UTC
Subject: Bug 34012

Author: jakub
Date: Fri Nov  9 13:02:25 2007
New Revision: 130043

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=130043
Log:
	PR rtl-optimization/34012
	* fwprop.c (try_fwprop_subst): Do not replace if the new
	SET_SRC has a higher cost than the old one.

	* gcc.target/i386/pr34012.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr34012.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fwprop.c
    trunk/gcc/testsuite/ChangeLog

Comment 11 Jakub Jelinek 2007-11-09 13:05:23 UTC
Fixed.