23302 – [4.1 Regression] extra move generated on x86

Bug 23302 - [4.1 Regression] extra move generated on x86

Summary: [4.1 Regression] extra move generated on x86

Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.1.0

Importance:	P4 minor
Target Milestone:	4.1.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	23153 23378
	Show dependency tree / graph

Reported:	2005-08-09 22:04 UTC by Dan Nicolaescu
Modified:	2005-10-31 18:15 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:	i686--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2005-08-10 00:05:55

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dan Nicolaescu 2005-08-09 22:04:55 UTC

Compiling:

typedef char Boolean;
typedef unsigned char Char;
typedef Char *ScrnPtr;
typedef ScrnPtr *ScrnBuf;

typedef struct _WidgetRec *Widget;


typedef struct {
  int foo [1000];
  int max_col;
  int max_row;
  Widget scrollWidget;
  int savelines;
  ScrnBuf visbuf;
  ScrnBuf allbuf;
  Char *sbuf_address;
} TScreen;

typedef struct _XtermWidgetRec {
  TScreen screen;
  int num_ptrs;
}  *XtermWidget;


extern ScrnBuf Allocate (int nrow, int ncol, Char **addr);

extern XtermWidget term;

void
VTallocbuf(void)
{
  TScreen *screen = &term->screen;
  int nrows = screen->max_row + 1;

  if (screen->scrollWidget)
    nrows += screen->savelines;
  screen->allbuf = Allocate(nrows, screen->max_col + 1,
			    &screen->sbuf_address);
  if (screen->scrollWidget)
    screen->visbuf = &screen->allbuf[term->num_ptrs * screen->savelines];
  else
    screen->visbuf = screen->allbuf;
  return;
}

with 4.0 and 4.1 -march=i686 -O2 generates: (the significant part of the sdiff 
is shown) 

        movl    term, %edx                                    |         movl   
term, %eax
        movl    4012(%ebx), %eax                              |         movl   
4012(%ebx), %ecx
        imull   4028(%edx), %eax                              |         movl   
4028(%eax), %eax
        leal    (%ecx,%eax,4), %eax                           |         imull  
%ecx, %eax
                                                              >         leal   
(%edx,%eax,4), %eax


note that 4.1 generates an extra movl instruction. 
This is one of the reasons for 4.1 the code size regression in PR23153.

Comment 1 Andrew Pinski 2005-08-10 00:05:53 UTC

Confirmed, the rtl dumps at .combinea are almost the same except for:
-(insn 45 44 46 3 (set (reg:SI 70 [ <variable>.savelines ])
-        (mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ])
-                (const_int 4012 [0xfac])) [5 <variable>.savelines+0 S4 A32])) 35 {*movsi_1} (nil)
-    (nil))
+(insn 39 38 40 3 (set (reg:SI 65 [ <variable>.num_ptrs ])
+        (mem/s:SI (plus:SI (reg/f:SI 63 [ term ])
+                (const_int 4028 [0xfbc])) [4 <variable>.num_ptrs+0 S4 A32])) 34 {*movsi_1} (insn_list:
REG_DEP_TRUE 38 (nil))
+    (expr_list:REG_DEAD (reg/f:SI 63 [ term ])
+        (nil)))
 
-(insn 46 45 47 3 (parallel [
-            (set (reg:SI 71)
-                (mult:SI (mem/s:SI (plus:SI (reg/f:SI 68 [ term ])
-                            (const_int 4028 [0xfbc])) [5 <variable>.num_ptrs+0 S4 A32])
-                    (reg:SI 70 [ <variable>.savelines ])))
+(insn 40 39 41 3 (parallel [
+            (set (reg:SI 64)
+                (mult:SI (reg:SI 65 [ <variable>.num_ptrs ])
+                    (mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ])
+                            (const_int 4012 [0xfac])) [4 <variable>.savelines+0 S4 A32])))
             (clobber (reg:CC 17 flags))
-        ]) 173 {*mulsi3_1} (insn_list:REG_DEP_TRUE 43 (insn_list:REG_DEP_TRUE 45 (nil)))
-    (expr_list:REG_DEAD (reg/f:SI 68 [ term ])
+        ]) 182 {*mulsi3_1} (insn_list:REG_DEP_TRUE 39 (nil))
+    (expr_list:REG_DEAD (reg:SI 65 [ <variable>.num_ptrs ])
         (expr_list:REG_UNUSED (reg:CC 17 flags)
-            (expr_list:REG_DEAD (reg:SI 70 [ <variable>.savelines ])
-                (nil)))))
+            (nil))))


Which is because we expand the multiply (from tree to rtl) as:
(insn 38 36 39 (set (reg/f:SI 63)
        (mem/f/i:SI (symbol_ref:SI ("term") [flags 0x40] <var_decl 0xb7cf60b0 term>) [9 term+0 S4 A32])) 
-1 (nil)
    (nil))

(insn 39 38 40 (set (reg:SI 65)
        (mem/s:SI (plus:SI (reg/f:SI 63)
                (const_int 4028 [0xfbc])) [4 <variable>.num_ptrs+0 S4 A32])) -1 (nil)
    (nil))

(insn 40 39 41 (parallel [
            (set (reg:SI 64)
                (mult:SI (reg:SI 65)
                    (mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ])
                            (const_int 4012 [0xfac])) [4 <variable>.savelines+0 S4 A32])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

before we actually expanded the load of the screen->savelines seperately:
(insn 43 41 44 (set (reg/f:SI 68)
        (mem/f/i:SI (symbol_ref:SI ("term") [flags 0x40] <var_decl 0xb7cc8a8c term>) [9 term+0 S4 A32])) 
-1 (nil)
    (nil))
                
(insn 44 43 45 (set (reg:SI 69)
        (mem/s:SI (plus:SI (reg/f:SI 68)
                (const_int 4028 [0xfbc])) [5 <variable>.num_ptrs+0 S4 A32])) -1 (nil)
    (nil))
            
(insn 45 44 46 (set (reg:SI 70)
        (mem/s:SI (plus:SI (reg/v/f:SI 59 [ screen ])
                (const_int 4012 [0xfac])) [5 <variable>.savelines+0 S4 A32])) -1 (nil)
    (nil))
                
(insn 46 45 47 (parallel [
            (set (reg:SI 71)
                (mult:SI (reg:SI 69)
                    (reg:SI 70)))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil)) 


This is either a target problem for the expansion or a middle-end problem while expand, I am going to 
say a target issue.

I think this was caused by:
2005-07-30  Jan Hubicka  <jh@suse.cz>

        * expr.c (expand_expr_real_1): Do not load mem targets into register.
        * i386.c (ix86_fixup_binary_operands): Likewise.
        (ix86_expand_unary_operator): Likewise.
        (ix86_expand_fp_absneg_operator): Likewise.
        * optabs.c (expand_vec_cond_expr): Validate dest.

Comment 2 Jan Hubicka 2005-09-28 16:51:54 UTC

The actual problem here is that from combine's point of view the two
alternatives (lea preceeded by loads, or add with memory operand followed by
shift) looks equivalent and previously the shorter sequence was purely choosed
by luck because combine followed the right edge first. It is not quite possible
to solve this by combine's splitting mechanizm as the number of instruction
don't change before the read-modify instructions are broken up.

While it might be probably possible to design peephole or combiner insn patter
I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was
optimizing this by pure luck and the patch seems to have overall positive effect
on code size...

Honza

Comment 3 Jan Hubicka 2005-09-28 16:52:17 UTC

The actual problem here is that from combine's point of view the two
alternatives (lea preceeded by loads, or add with memory operand followed by
shift) looks equivalent and previously the shorter sequence was purely choosed
by luck because combine followed the right edge first. It is not quite possible
to solve this by combine's splitting mechanizm as the number of instruction
don't change before the read-modify instructions are broken up.

While it might be probably possible to design peephole or combiner insn patter
I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was
optimizing this by pure luck and the patch seems to have overall positive effect
on code size...

Honza

Comment 4 Dan Nicolaescu 2005-09-28 17:29:58 UTC

(In reply to comment #2)
> While it might be probably possible to design peephole or combiner insn patter
> I am tempted to close this and PR 23303 as WONTFIX as it seems to me we was
> optimizing this by pure luck and the patch seems to have overall positive effect
> on code size...

IMHO closing these bugs as WONTFIX is not the right thing to do. This is clearly
a missed optimization opportunity. The fact that it worked by chance before your
(overall good) patch does not make fixing this issue less desirable.

Comment 5 Mark Mitchell 2005-10-31 04:43:17 UTC

I think we should look for some solution to this problem, without reverting the previous patch.  If this problem is amenable to a peephole, let's solve it that way.

That said, I'm going to downgrade this to P4; if we can't fix it for 4.1, we'll look again for 4.2.

Comment 6 Jan Hubicka 2005-10-31 18:15:23 UTC

Actually the cited 4.0 sequence do not obey the const int x86_read_modify = ~(m_PENT | m_PPRO);
setting -march=athlon or using -Os makes the move go away.  Additionally I get following sequence out of 4.0 in SUSE distro:
        movl    term, %eax
        movl    4012(%ebx), %ecx
        movl    4028(%eax), %eax
        imull   %ecx, %eax
that also contains the extra memory move.
So I am going to close this as invalid.
Honza