Bug 23451 - Redundant reloading from stack frame on i386
Summary: Redundant reloading from stack frame on i386
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.0.2
: P2 enhancement
Target Milestone: 4.3.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks:
 
Reported: 2005-08-18 06:39 UTC by Debian GCC Maintainers
Modified: 2009-04-22 23:06 UTC (History)
2 users (show)

See Also:
Host:
Target: i386-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-01-10 23:44:58


Attachments
gcc 3.2 vs. gcc 3.3 .s output, march=i486 (928 bytes, text/plain)
2006-01-10 23:00 UTC, Steven Bosscher
Details
gcc 3.2 vs. gcc 3.3 .s output, march=i586 (959 bytes, text/plain)
2006-01-10 23:00 UTC, Steven Bosscher
Details
gcc 3.2 vs. gcc 3.3 .s output, march=i686 (917 bytes, text/plain)
2006-01-10 23:01 UTC, Steven Bosscher
Details
gcc 3.2 vs. gcc 4.0 .s output, march=i686 (927 bytes, text/plain)
2006-01-10 23:04 UTC, Steven Bosscher
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Debian GCC Maintainers 2005-08-18 06:39:40 UTC
[forwarded from http://bugs.debian.org/301746]

  Matthias

Rechecked with gcc-4.0, bug submitter writes:

This is a regression from gcc-3.2 in that gcc-3.2 does not have this
problem bug gcc-3.3 does.

The attached program generates the following code on i386 with
gcc -S -O2:

...

.L2:
	movl	-16(%ebp), %eax
	xorl	%esi, %esi
	testl	%eax, %eax
	je	.L8

As you can see %eax now contains the value in -16(%ebp).

	subl	$12, %esp
	movl	-16(%ebp), %eax

gcc-3.4 decides to reload for no reason at all.

	pushl	%eax
	call	strlen
	movl	%eax, %esi
	addl	$16, %esp
.L5:
	pushl	%eax
	pushl	%edi
	pushl	%ebx
	leal	2(%edi,%esi), %eax
	pushl	%eax
	call	malloc
	movl	%eax, (%esp)
	movl	%eax, %ebx
	call	mempcpy
	movl	-16(%ebp), %edi

Again the value of -16(%ebp) is in %edi.

	addl	$16, %esp
	testl	%edi, %edi
	je	.L6
	movb	$61, (%eax)
	pushl	%ecx
	pushl	%esi
	movl	-16(%ebp), %edx

gcc-3.4 decides to reload it yet again.

	pushl	%edx
	incl	%eax
	pushl	%eax
	call	mempcpy
	addl	$16, %esp


-----------------------------------------------

typedef unsigned int size_t;
extern size_t strlen (__const char *__s) __attribute__ ((__pure__));
extern char *strchrnul (__const char *__s, int __c) __attribute__ ((__pure__));
extern void *mempcpy (void *__restrict __dest,
        __const void *__restrict __src, size_t __n) ;
extern void *malloc(size_t);

extern void setvareq(char *s, int flags);
extern char *endofname(const char *);
extern void sh_error(const char *, ...) __attribute__ ((__noreturn__));

void
setvar(const char *name, const char *val, int flags)
{
 char *p, *q;
 size_t namelen;
 char *nameeq;
 size_t vallen;

 q = endofname(name);
 p = strchrnul(q, '=');
 namelen = p - name;
 if (!namelen || p != q)
  sh_error("%.*s: bad variable name", namelen, name);
 vallen = 0;
 if (val == ((void *)0)) {
  flags |= 2;
 } else {
  vallen = strlen(val);
 }
 p = mempcpy(nameeq = malloc(namelen + vallen + 2), name, namelen);
 if (val) {
  *p++ = '=';
  p = mempcpy(p, val, vallen);
 }
 *p = '\0';
 setvareq(nameeq, flags | 1);
}
Comment 1 Andrew Pinski 2005-08-18 12:01:16 UTC
I think this is just a ra issue which have no hope of being fixed in a release branch.
Comment 2 Mark Mitchell 2005-10-31 05:05:49 UTC
We don't need speculation; we need facts.  I'll leave this at P2, in the hopes that someone will analyze this properly.
Comment 3 Steven Bosscher 2006-01-10 20:36:18 UTC
GCC 4.2 (trunk) produces this kind of redundant loads:
        ...
        movl    -20(%ebp), %eax
        testl   %eax, %eax
        je      .L10
        movl    -20(%ebp), %eax
        movl    %eax, (%esp)
        call    strlen
        ...

And so does GCC 3.3-hammer_branch:
        ...
        movl    -16(%ebp), %ecx
        xorl    %ebx, %ebx
        testl   %ecx, %ecx
        je      .L8
        movl    -16(%ebp), %eax
        movl    %eax, (%esp)
        call    strlen
        ...
Comment 4 Steven Bosscher 2006-01-10 20:57:38 UTC
On the trunk, we have the following situation in the .csa RTL dump (on AMD64 -m32 -march=i686):

;; Start of basic block 5, registers live:
 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
(code_label:HI 38 37 39 5 2 "" [1 uses])

(note:HI 39 38 41 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn:HI 41 39 42 5 (set (reg:CCZ 17 flags)
        (compare:CCZ (mem/f/c:SI (plus:SI (reg/f:SI 6 bp)
                    (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8])
            (const_int 0 [0x0]))) 3 {*cmpsi_ccno_1} (nil)
    (nil))

(jump_insn:HI 42 41 44 5 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 63)
            (pc))) 511 {*jcc_1} (insn_list:REG_DEP_TRUE 41 (nil))
    (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (expr_list:REG_BR_PROB (const_int 8100 [0x1fa4])
            (nil))))
;; End of basic block 5, registers live:
 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]

(...)

;; Start of basic block 7, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
(code_label:HI 63 119 64 7 5 "" [1 uses])

(note:HI 64 63 125 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 125 64 66 7 (set (reg:SI 0 ax)
        (mem/f/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8])) 40 {*movsi_1} (nil)
    (nil))

(insn:HI 66 125 67 7 (set (mem:SI (reg/f:SI 7 sp) [0 S4 A32])
        (reg:SI 0 ax)) 40 {*movsi_1} (nil)
    (expr_list:REG_DEAD (reg:SI 0 ax)
        (nil)))

(call_insn/u:HI 67 66 68 7 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:SI ("strlen") [flags 0x41]
                       <function_decl 0x2aaaaae6fb00 strlen>) [0 S1 A8])
            (const_int 4 [0x4]))) 731 {*call_value_0} (nil)
    (expr_list:REG_EH_REGION (const_int 0 [0x0])
        (nil))
    (expr_list:REG_DEP_TRUE (use (mem:BLK (scratch) [0 A8]))
        (nil)))
(...)


Then we peephole2 the MEM for the compare to a set, see the .peephole2 dump:

;; Start of basic block 5, registers live:
  4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
(code_label:HI 38 37 39 5 2 "" [1 uses])

(note:HI 39 38 142 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn 142 39 143 5 (set (reg:SI 0 ax)
        (mem/f/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8])) -1 (nil)
    (nil))

(insn 143 142 42 5 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 0 ax)
            (const_int 0 [0x0]))) -1 (nil)
    (expr_list:REG_DEAD (reg:SI 0 ax)
        (nil)))

(jump_insn:HI 42 143 44 5 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 63)
            (pc))) 511 {*jcc_1} (insn_list:REG_DEP_TRUE 41 (nil))
    (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (expr_list:REG_BR_PROB (const_int 8100 [0x1fa4])
            (nil))))
;; End of basic block 5, registers live:
 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]

(...)

;; Start of basic block 7, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
(code_label:HI 63 119 64 7 5 "" [1 uses])

(note:HI 64 63 125 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 125 64 66 7 (set (reg:SI 0 ax)
        (mem/f/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8])) 40 {*movsi_1} (nil)
    (nil))

(insn:HI 66 125 67 7 (set (mem:SI (reg/f:SI 7 sp) [0 S4 A32])
        (reg:SI 0 ax)) 40 {*movsi_1} (nil)
    (expr_list:REG_DEAD (reg:SI 0 ax)
        (nil)))

(call_insn/u:HI 67 66 68 7 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:SI ("strlen") [flags 0x41]
                       <function_decl 0x2aaaaae6fb00 strlen>) [0 S1 A8])
            (const_int 4 [0x4]))) 731 {*call_value_0} (nil)
    (expr_list:REG_EH_REGION (const_int 0 [0x0])
        (nil))
    (expr_list:REG_DEP_TRUE (use (mem:BLK (scratch) [0 A8]))
        (nil)))


Essentially the same thing happens in GCC 3.3: Up to peephole2, the first MEM is burried in the compare, and split out by some peephole. 

I don't think this is related to register allocation, really.  Someone should try to see what the flow2 dump looks like for GCC 3.2, which is apparently the last release that did not have this problem.  In any case, it will be difficult to find out when this problem was introduced.  But it doesn't look like a really severe problem to me anyway, really...
Comment 5 Steven Bosscher 2006-01-10 21:27:50 UTC
FWIW, the peephole that we trigger is this one, which has been around since forever (since rth's ia32 backend rewrite from the previous century...):

;; Don't compare memory with zero, load and use a test instead.
(define_peephole2
  [(set (match_operand 0 "flags_reg_operand" "")
        (match_operator 1 "compare_operator"
          [(match_operand:SI 2 "memory_operand" "")
           (const_int 0)]))
   (match_scratch:SI 3 "r")]
  "ix86_match_ccmode (insn, CCNOmode) && ! optimize_size"
  [(set (match_dup 3) (match_dup 2))
   (set (match_dup 0) (match_op_dup 1 [(match_dup 3) (const_int 0)]))]
  "")


But it feels like someone made a bad joke, or something.  These are some pieces of what GCC 3.2 makes of it (with -march=i[456]86):

        movl    -16(%ebp), %eax
        xorl    %edi, %edi
        testl   %eax, %eax
        je      .L7
        subl    $12, %esp
        movl    -16(%ebp), %eax
        pushl   %eax
        call    strlen

and

        movl    -16(%ebp), %esi
        addl    $16, %esp
        testl   %esi, %esi
        je      .L6
        movb    $61, (%eax)
        pushl   %ecx
        pushl   %edi
        movl    -16(%ebp), %edx

This is on AMD64 -m32 with "GNU C version 3.2.3 (x86_64-unknown-linux-gnu)",        compiled by gcc 3.3-hammer-branch, gcc 3.4 and later can't compile gcc 3.2.
Comment 6 Steven Bosscher 2006-01-10 21:32:57 UTC
Since GCC 3.2 also has this problem, contrary to what the reporter claims, I am not sure if we should keep this marked as a regression.  Obviously it is a missed optimization, so the bug report is valid in that sense, and we should keep it open at least.

Could the reporter check whether GCC 3.2 really does not have this problem, or if perhaps this is a regresion from even older compilers?  This problem may have been introduced when the new (*cough* 7 years old) ix86 backend was contributed.  In that case, this may still be a regression from GCC 2.95.  But I am not willing to go that far back in history ;-)
Comment 7 herbert 2006-01-10 22:44:31 UTC
Subject: Re:  [3.4/4.0/4.1/4.2 regression] Redundant reloading from stack frame

On Tue, Jan 10, 2006 at 09:32:58PM -0000, steven at gcc dot gnu dot org wrote:
> 
> ------- Comment #6 from steven at gcc dot gnu dot org  2006-01-10 21:32 -------
> Since GCC 3.2 also has this problem, contrary to what the reporter claims, I am
> not sure if we should keep this marked as a regression.  Obviously it is a
> missed optimization, so the bug report is valid in that sense, and we should
> keep it open at least.

gcc-3.2 keeps the value in %esi and therefore avoids the problem:

        .file   "a.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%.*s: bad variable name"
        .text
        .p2align 2,,3
.globl setvar
        .type   setvar,@function
setvar:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        subl    $24, %esp
        movl    8(%ebp), %ebx
        movl    16(%ebp), %eax
        movl    12(%ebp), %esi

Only one load.

        movl    %eax, -16(%ebp)
        pushl   %ebx
        call    endofname
        subl    $8, %esp
        pushl   $61
        pushl   %eax
        movl    %eax, %edi
        call    strchrnul
        movl    %eax, %edx
        addl    $32, %esp
        subl    %ebx, %edx
        je      .L3
        cmpl    %edi, %eax
        jne     .L3
        xorl    %edi, %edi
        testl   %esi, %esi
        je      .L7
        movl    %esi, %edi
        cld
        movl    $-1, %ecx
        xorl    %eax, %eax
        repnz
        scasb
        notl    %ecx
        leal    -1(%ecx), %edi
.L5:
        pushl   %ecx
        pushl   %edx
        pushl   %ebx
        leal    2(%edi,%edx), %eax
        pushl   %eax
        call    malloc
        movl    %eax, (%esp)
        movl    %eax, %ebx
        call    mempcpy
        addl    $16, %esp
        testl   %esi, %esi
        je      .L6
        movb    $61, (%eax)
        pushl   %edx
        pushl   %edi
        pushl   %esi
        incl    %eax
        pushl   %eax
        call    mempcpy
        addl    $16, %esp
.L6:
        movb    $0, (%eax)
        orl     $1, -16(%ebp)
        movl    -16(%ebp), %eax
        movl    %eax, 12(%ebp)
        movl    %ebx, 8(%ebp)
        leal    -12(%ebp), %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        leave
        jmp     setvareq
        .p2align 2,,3
.L7:
        orl     $2, -16(%ebp)
        jmp     .L5
.L3:
        pushl   %esi
        pushl   %ebx
        pushl   %edx
        pushl   $.LC0
        call    sh_error
.Lfe1:
        .size   setvar,.Lfe1-setvar
        .ident  "GCC: (GNU) 3.2.3 (Debian)"

Thanks,
Comment 8 Steven Bosscher 2006-01-10 22:58:03 UTC
Unfortunately you're not showing your full command line, so I can only guess what platform your host is and for what target you are compiling.  I will attach diffs between GCC 3.2 and GCC 3.3-hammer for i[456]86.  To me the changes look insignificant, but maybe you can point out what you're unhappy with.

Comment 9 Steven Bosscher 2006-01-10 23:00:13 UTC
Created attachment 10612 [details]
gcc 3.2 vs. gcc 3.3 .s output, march=i486

All .s files created on AMD64, compiler options -m32 -S -O2 -march=i[456]86.
Comment 10 Steven Bosscher 2006-01-10 23:00:43 UTC
Created attachment 10613 [details]
gcc 3.2 vs. gcc 3.3 .s output, march=i586
Comment 11 Steven Bosscher 2006-01-10 23:01:05 UTC
Created attachment 10614 [details]
gcc 3.2 vs. gcc 3.3 .s output, march=i686
Comment 12 Steven Bosscher 2006-01-10 23:02:08 UTC
For the record, it is a known problem that x86 32 bits hosts and x86_64 hosts sometimes produce different code, even with the same -march options.  We may be seeing one such case here, eventhough that is quite unlikely.
Comment 13 Steven Bosscher 2006-01-10 23:04:49 UTC
Created attachment 10615 [details]
gcc 3.2 vs. gcc 4.0 .s output, march=i686

For the sake of completeness, also a diff between GCC 3.2 and GCC 4.0 for i686, still at -O2.
Comment 14 herbert 2006-01-10 23:25:39 UTC
Subject: Re:  [3.4/4.0/4.1/4.2 regression] Redundant reloading from stack frame

On Tue, Jan 10, 2006 at 10:58:03PM -0000, steven at gcc dot gnu dot org wrote:
> 
> 
> ------- Comment #8 from steven at gcc dot gnu dot org  2006-01-10 22:58 -------
> Unfortunately you're not showing your full command line, so I can only guess
> what platform your host is and for what target you are compiling.  I will
> attach diffs between GCC 3.2 and GCC 3.3-hammer for i[456]86.  To me the
> changes look insignificant, but maybe you can point out what you're unhappy
> with.

Good point.  I was using

gcc-3.2 -S -O2

which for my build defaults to --march=i386.  Indeed, using --march=i486
produces the same output as gcc-3.3/gcc-3.4.  In fact, using --march=i386
on gcc-3.3/gcc-3.4 also removes the duplicate reloading.

So it seems that it's really a difference between --march=i386 versus
--march=i486.

Cheers,
Comment 15 Steven Bosscher 2006-01-10 23:36:50 UTC
Then I am quite sure that the difference comes from using "repnz; scasb" in GCC 3.2 vs. calling strlen in GCC 3.3 on i486.  For GCC 3.2, the code for i386 and i486 are pretty much equivalent (the only difference is that on i386 we use leave), but with GCC 3.3 the code for i486 is very different from the code for i386.  Looks like some tuning parameters got changed.

So, closing as INVALID, i.e. not-a-bug. It's an interesting oddity though.  I've learned something again today :-)


Comment 16 herbert 2006-01-10 23:41:06 UTC
Subject: Re:  [3.4/4.0/4.1/4.2 regression] Redundant reloading from stack frame

On Tue, Jan 10, 2006 at 11:36:51PM -0000, steven at gcc dot gnu dot org wrote:
> 
> So, closing as INVALID, i.e. not-a-bug. It's an interesting oddity though. 
> I've learned something again today :-)

I still think this is an optimisation issue though since it should know
not to reload something that's already in a register.

Cheers,
Comment 17 Steven Bosscher 2006-01-10 23:44:21 UTC
Fair enough.  I think it's highly unlikely that anyone would care enough about i386 to worry about fixing this, but you never know.
Comment 18 herbert 2006-01-10 23:45:35 UTC
Subject: Re:  Redundant reloading from stack frame on i386

On Tue, Jan 10, 2006 at 11:44:21PM -0000, steven at gcc dot gnu dot org wrote:
> 
> 
> ------- Comment #17 from steven at gcc dot gnu dot org  2006-01-10 23:44 -------
> Fair enough.  I think it's highly unlikely that anyone would care enough about
> i386 to worry about fixing this, but you never know.

Actually the bug occurs when -march is i[456]86.  It goes away
with -mi386 :)
 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|RESOLVED                    |REOPENED
>  GCC target triplet|i486-*-*                    |i386-*-*

So I think this needs to be reversed.

Thanks,
Comment 19 Andrew Pinski 2009-04-22 23:06:17 UTC
Fixed in 4.3.0 by the new register allocator (IRA).