Bug 23303 - [4.1 Regression] 4.1 generates sall + addl instead of leal
Summary: [4.1 Regression] 4.1 generates sall + addl instead of leal
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P2 minor
Target Milestone: 4.1.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization, patch
Depends on:
Blocks: 23153 23378
 
Reported: 2005-08-09 22:05 UTC by Dan Nicolaescu
Modified: 2005-11-02 23:21 UTC
CC: 2 users

Target: i?86-*-*
Last reconfirmed: 2005-08-10 00:12:43


Description Dan Nicolaescu 2005-08-09 22:05:03 UTC
Compiling: 

typedef unsigned char Char;
typedef Char *ScrnPtr;
typedef ScrnPtr *ScrnBuf;


typedef struct {
  int foo [1000];
  int savelines;
  ScrnBuf visbuf;
  ScrnBuf allbuf;
} TScreen;

typedef struct _XtermWidgetRec {
  TScreen screen;
  int num_ptrs;
}  *XtermWidget;

extern XtermWidget term;

void
VTallocbuf(void)
{
  TScreen *screen = &term->screen;
  screen->visbuf = &screen->allbuf[term->num_ptrs * screen->savelines];
  return;
}

with -march=i686 -O2, 4.0 and 4.1 generate the following (the significant part
of the sdiff is shown):

4.0:
        movl    4012(%eax), %ecx
        movl    4000(%eax), %edx
        imull   %ecx, %edx
        movl    4008(%eax), %ecx
        leal    (%ecx,%edx,4), %edx
        movl    %edx, 4004(%eax)

4.1:
        movl    4000(%edx), %ecx
        movl    4012(%edx), %eax
        imull   %ecx, %eax
        movl    4008(%edx), %ecx
        sall    $2, %eax
        addl    %ecx, %eax
        movl    %eax, 4004(%edx)

Note that 4.1 generates sall + addl instead of a single leal instruction.
This might be one of the reasons for the 4.1 code size regression in PR23153.
Comment 1 Andrew Pinski 2005-08-10 00:12:42 UTC
This again looks like it was caused by:

2005-07-30  Jan Hubicka  <jh@suse.cz>

        * expr.c (expand_expr_real_1): Do not load mem targets into register.
        * i386.c (ix86_fixup_binary_operands): Likewise.
        (ix86_expand_unary_operator): Likewise.
        (ix86_expand_fp_absneg_operator): Likewise.
        * optabs.c (expand_vec_cond_expr): Validate dest.
Comment 2 Steven Bosscher 2005-08-14 15:48:41 UTC
Smaller test case: 
===================================================== 
typedef struct { 
  char **visbuf; 
  char **allbuf; 
} TScreen; 
 
void 
VTallocbuf(TScreen *screen, unsigned long savelines) 
{ 
  screen->visbuf = &screen->allbuf[savelines]; 
} 
===================================================== 
 
On AMD64 this gives me: 
        salq    $3, %rsi 
        addq    8(%rdi), %rsi 
        movq    %rsi, (%rdi) 
        ret 
 
instead of 
        movq    8(%rdi), %rax 
        leaq    (%rax,%rsi,8), %rsi 
        movq    %rsi, (%rdi) 
        ret 
 
Comment 3 Paolo Bonzini 2005-08-22 11:01:51 UTC
I'm not convinced that using salq in Steven's test case is really a
pessimization.  So consider instead a 32-bit target, and this testcase:

  char **
  VTallocbuf(char **allbuf, unsigned long savelines)
  {
    return &allbuf[savelines];
  }

For i686 we have

        movl    8(%esp), %eax
        movl    4(%esp), %edx
        sall    $2, %eax
        addl    %edx, %eax

instead of

        movl    8(%esp), %eax
        movl    4(%esp), %edx
        leal    (%edx,%eax,4), %eax

However, even in this case a sequence without lea could be feasible:

        movl    8(%esp), %eax
        sall    $2, %eax
        addl    4(%esp), %eax

And this is exactly the RTL we have until peephole2, where this peephole splits
the instruction:

;; Don't do logical operations with memory inputs.
(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC FLAGS_REG))])]
  "! optimize_size && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1)) 
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC FLAGS_REG))])]
  "")

I think that Jan's patch should be conditionalized: if !optimize_size &&
!TARGET_READ_MODIFY, the transformation he removed will be done anyway, and too
late in the game.

Paolo
Comment 4 Mark Mitchell 2005-10-31 04:45:31 UTC
Jan, what's your analysis on this PR?
Comment 5 Jan Hubicka 2005-10-31 20:25:27 UTC
Subject: Re:  [4.1 Regression] 4.1 generates sall + addl instead of leal

> ------- Comment #4 from mmitchel at gcc dot gnu dot org  2005-10-31 04:45 -------
> Jan, what's your analysis on this PR?
Hmm,
I am still not sure if it matters too much, but since there are actually
dupes of this problem, I think we can simply add peep2 fixing this
particular common case.

I am testing attached patch.


/* { dg-do assemble { target i?86-*-* x86_64-*-* } } */
/* { dg-require-effective-target ilp32 } */
/* { dg-options "-O2 -march=i686" } */
/* { dg-final { scan-assembler "leal" } } */
typedef struct {
  char **visbuf;
  char **allbuf;
} TScreen;

void
VTallocbuf(TScreen *screen, unsigned long savelines)
{
  screen->visbuf = &screen->allbuf[savelines];
}
2005-10-31  Jan Hubicka  <jh@suse.cz>
	PR target/23303
	* i386.md: Add peep2 for simplifying array accesses.
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 106275)
+++ config/i386/i386.md	(working copy)
@@ -19785,6 +19785,58 @@
   if (!rtx_equal_p (operands[0], operands[1]))
     emit_move_insn (operands[0], operands[1]);
 })
+
+;; After splitting up read-modify operations, array accesses with memory operands
+;; might end up in form:
+;;  sall    $2, %eax
+;;  movl    4(%esp), %edx
+;;  addl    %edx, %eax
+;; instead of pre-splitting:
+;;  sall    $2, %eax
+;;  addl    4(%esp), %eax
+;; Turn it into:
+;;  movl    4(%esp), %edx
+;;  leal    (%edx,%eax,4), %eax
+
+(define_peephole2
+  [(parallel [(set (match_operand 0 "register_operand" "")
+		   (ashift (match_operand 1 "register_operand" "")
+			   (match_operand 2 "const_int_operand" "")))
+	       (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand 3 "register_operand")
+        (match_operand 4 "x86_64_general_operand" ""))
+   (parallel [(set (match_operand 5 "register_operand" "")
+		   (plus (match_operand 6 "register_operand" "")
+			 (match_operand 7 "register_operand" "")))
+		   (clobber (reg:CC FLAGS_REG))])]
+"INTVAL (operands[2]) >= 0 && INTVAL (operands[2]) <= 3
+ /* Validate MODE for lea.  */
+ && ((!TARGET_PARTIAL_REG_STALL
+      && (GET_MODE (operands[0]) == QImode || GET_MODE (operands[0]) == HImode))
+     || GET_MODE (operands[0]) == SImode 
+     || (TARGET_64BIT && GET_MODE (operands[0]) == DImode))
+ /* We reorder load and the shift.  */
+ && !rtx_equal_p (operands[1], operands[3])
+ /* Last PLUS must consist of operand 0 and 3.  */
+ && !rtx_equal_p (operands[0], operands[3])
+ && (rtx_equal_p (operands[3], operands[6])
+     || rtx_equal_p (operands[3], operands[7]))
+ && (rtx_equal_p (operands[0], operands[6])
+     || rtx_equal_p (operands[0], operands[7]))
+ /* The intermediate operand 0 must die or be same as output.  */
+ && (rtx_equal_p (operands[0], operands[5])
+     || peep2_reg_dead_p (3, operands[0]))"
+  [(set (match_dup 3) (match_dup 4))
+   (set (match_dup 0) (match_dup 1))]
+{
+  int scale = 1 << INTVAL (operands[2]);
+  rtx index = gen_lowpart (Pmode, operands[1]);
+  rtx base = gen_lowpart (Pmode, operands[3]);
+  rtx dest = gen_lowpart (Pmode, operands[5]);
+
+  operands[1] = gen_rtx_PLUS (Pmode, base, gen_rtx_MULT (Pmode, index, GEN_INT (scale)));
+  operands[0] = dest;
+})
 
 ;; Call-value patterns last so that the wildcard operand does not
 ;; disrupt insn-recog's switch tables.
Comment 6 Andrew Pinski 2005-11-01 14:57:07 UTC
Patch posted:
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00011.html
Comment 7 Dan Nicolaescu 2005-11-01 15:15:45 UTC
(In reply to comment #5)
> Hmm,
> I am still not sure if it matters too much, but since there are actually
> dupes of this problem, I think we can simply add peep2 fixing this
> particular common case.
> 
> I am testing attached patch.

Could you please try to measure the code size impact of this patch?
(like the examples in PR23153: xterm, PR8361 or kernel)
Comment 8 Jan Hubicka 2005-11-02 23:21:27 UTC
Subject: Bug 23303

Author: hubicka
Date: Wed Nov  2 23:21:22 2005
New Revision: 106406

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106406
Log:
	PR target/23303
	* i386.md: Add peep2 for simplifying array accesses.
	* gcc.dg/i386-lea.c: New test

Added:
    trunk/gcc/testsuite/gcc.dg/i386-lea.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog

Comment 9 Jan Hubicka 2005-11-02 23:21:59 UTC
Fixed in mainline now.