[forwarded from http://bugs.debian.org/301746] Matthias Rechecked with gcc-4.0, bug submitter writes: This is a regression from gcc-3.2 in that gcc-3.2 does not have this problem bug gcc-3.3 does. The attached program generates the following code on i386 with gcc -S -O2: ... .L2: movl -16(%ebp), %eax xorl %esi, %esi testl %eax, %eax je .L8 As you can see %eax now contains the value in -16(%ebp). subl $12, %esp movl -16(%ebp), %eax gcc-3.4 decides to reload for no reason at all. pushl %eax call strlen movl %eax, %esi addl $16, %esp .L5: pushl %eax pushl %edi pushl %ebx leal 2(%edi,%esi), %eax pushl %eax call malloc movl %eax, (%esp) movl %eax, %ebx call mempcpy movl -16(%ebp), %edi Again the value of -16(%ebp) is in %edi. addl $16, %esp testl %edi, %edi je .L6 movb $61, (%eax) pushl %ecx pushl %esi movl -16(%ebp), %edx gcc-3.4 decides to reload it yet again. pushl %edx incl %eax pushl %eax call mempcpy addl $16, %esp ----------------------------------------------- typedef unsigned int size_t; extern size_t strlen (__const char *__s) __attribute__ ((__pure__)); extern char *strchrnul (__const char *__s, int __c) __attribute__ ((__pure__)); extern void *mempcpy (void *__restrict __dest, __const void *__restrict __src, size_t __n) ; extern void *malloc(size_t); extern void setvareq(char *s, int flags); extern char *endofname(const char *); extern void sh_error(const char *, ...) __attribute__ ((__noreturn__)); void setvar(const char *name, const char *val, int flags) { char *p, *q; size_t namelen; char *nameeq; size_t vallen; q = endofname(name); p = strchrnul(q, '='); namelen = p - name; if (!namelen || p != q) sh_error("%.*s: bad variable name", namelen, name); vallen = 0; if (val == ((void *)0)) { flags |= 2; } else { vallen = strlen(val); } p = mempcpy(nameeq = malloc(namelen + vallen + 2), name, namelen); if (val) { *p++ = '='; p = mempcpy(p, val, vallen); } *p = '\0'; setvareq(nameeq, flags | 1); }
I think this is just a ra issue which have no hope of being fixed in a release branch.
We don't need speculation; we need facts. I'll leave this at P2, in the hopes that someone will analyze this properly.
GCC 4.2 (trunk) produces this kind of redundant loads: ... movl -20(%ebp), %eax testl %eax, %eax je .L10 movl -20(%ebp), %eax movl %eax, (%esp) call strlen ... And so does GCC 3.3-hammer_branch: ... movl -16(%ebp), %ecx xorl %ebx, %ebx testl %ecx, %ecx je .L8 movl -16(%ebp), %eax movl %eax, (%esp) call strlen ...
On the trunk, we have the following situation in the .csa RTL dump (on AMD64 -m32 -march=i686): ;; Start of basic block 5, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame] (code_label:HI 38 37 39 5 2 "" [1 uses]) (note:HI 39 38 41 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn:HI 41 39 42 5 (set (reg:CCZ 17 flags) (compare:CCZ (mem/f/c:SI (plus:SI (reg/f:SI 6 bp) (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8]) (const_int 0 [0x0]))) 3 {*cmpsi_ccno_1} (nil) (nil)) (jump_insn:HI 42 41 44 5 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 63) (pc))) 511 {*jcc_1} (insn_list:REG_DEP_TRUE 41 (nil)) (expr_list:REG_DEAD (reg:CCZ 17 flags) (expr_list:REG_BR_PROB (const_int 8100 [0x1fa4]) (nil)))) ;; End of basic block 5, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame] (...) ;; Start of basic block 7, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame] (code_label:HI 63 119 64 7 5 "" [1 uses]) (note:HI 64 63 125 7 [bb 7] NOTE_INSN_BASIC_BLOCK) (insn 125 64 66 7 (set (reg:SI 0 ax) (mem/f/c:SI (plus:SI (reg/f:SI 6 bp) (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8])) 40 {*movsi_1} (nil) (nil)) (insn:HI 66 125 67 7 (set (mem:SI (reg/f:SI 7 sp) [0 S4 A32]) (reg:SI 0 ax)) 40 {*movsi_1} (nil) (expr_list:REG_DEAD (reg:SI 0 ax) (nil))) (call_insn/u:HI 67 66 68 7 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:SI ("strlen") [flags 0x41] <function_decl 0x2aaaaae6fb00 strlen>) [0 S1 A8]) (const_int 4 [0x4]))) 731 {*call_value_0} (nil) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil)) (expr_list:REG_DEP_TRUE (use (mem:BLK (scratch) [0 A8])) (nil))) (...) Then we peephole2 the MEM for the compare to a set, see the .peephole2 dump: ;; Start of basic block 5, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame] (code_label:HI 38 37 39 5 2 "" [1 uses]) (note:HI 39 38 142 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 142 39 143 5 (set (reg:SI 0 ax) (mem/f/c:SI (plus:SI (reg/f:SI 6 bp) (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8])) -1 (nil) (nil)) (insn 143 142 42 5 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 0 ax) (const_int 0 [0x0]))) -1 (nil) (expr_list:REG_DEAD (reg:SI 0 ax) (nil))) (jump_insn:HI 42 143 44 5 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 63) (pc))) 511 {*jcc_1} (insn_list:REG_DEP_TRUE 41 (nil)) (expr_list:REG_DEAD (reg:CCZ 17 flags) (expr_list:REG_BR_PROB (const_int 8100 [0x1fa4]) (nil)))) ;; End of basic block 5, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame] (...) ;; Start of basic block 7, registers live: 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame] (code_label:HI 63 119 64 7 5 "" [1 uses]) (note:HI 64 63 125 7 [bb 7] NOTE_INSN_BASIC_BLOCK) (insn 125 64 66 7 (set (reg:SI 0 ax) (mem/f/c:SI (plus:SI (reg/f:SI 6 bp) (const_int -20 [0xffffffffffffffec])) [7 val+0 S4 A8])) 40 {*movsi_1} (nil) (nil)) (insn:HI 66 125 67 7 (set (mem:SI (reg/f:SI 7 sp) [0 S4 A32]) (reg:SI 0 ax)) 40 {*movsi_1} (nil) (expr_list:REG_DEAD (reg:SI 0 ax) (nil))) (call_insn/u:HI 67 66 68 7 (set (reg:SI 0 ax) (call (mem:QI (symbol_ref:SI ("strlen") [flags 0x41] <function_decl 0x2aaaaae6fb00 strlen>) [0 S1 A8]) (const_int 4 [0x4]))) 731 {*call_value_0} (nil) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil)) (expr_list:REG_DEP_TRUE (use (mem:BLK (scratch) [0 A8])) (nil))) Essentially the same thing happens in GCC 3.3: Up to peephole2, the first MEM is burried in the compare, and split out by some peephole. I don't think this is related to register allocation, really. Someone should try to see what the flow2 dump looks like for GCC 3.2, which is apparently the last release that did not have this problem. In any case, it will be difficult to find out when this problem was introduced. But it doesn't look like a really severe problem to me anyway, really...
FWIW, the peephole that we trigger is this one, which has been around since forever (since rth's ia32 backend rewrite from the previous century...): ;; Don't compare memory with zero, load and use a test instead. (define_peephole2 [(set (match_operand 0 "flags_reg_operand" "") (match_operator 1 "compare_operator" [(match_operand:SI 2 "memory_operand" "") (const_int 0)])) (match_scratch:SI 3 "r")] "ix86_match_ccmode (insn, CCNOmode) && ! optimize_size" [(set (match_dup 3) (match_dup 2)) (set (match_dup 0) (match_op_dup 1 [(match_dup 3) (const_int 0)]))] "") But it feels like someone made a bad joke, or something. These are some pieces of what GCC 3.2 makes of it (with -march=i[456]86): movl -16(%ebp), %eax xorl %edi, %edi testl %eax, %eax je .L7 subl $12, %esp movl -16(%ebp), %eax pushl %eax call strlen and movl -16(%ebp), %esi addl $16, %esp testl %esi, %esi je .L6 movb $61, (%eax) pushl %ecx pushl %edi movl -16(%ebp), %edx This is on AMD64 -m32 with "GNU C version 3.2.3 (x86_64-unknown-linux-gnu)", compiled by gcc 3.3-hammer-branch, gcc 3.4 and later can't compile gcc 3.2.
Since GCC 3.2 also has this problem, contrary to what the reporter claims, I am not sure if we should keep this marked as a regression. Obviously it is a missed optimization, so the bug report is valid in that sense, and we should keep it open at least. Could the reporter check whether GCC 3.2 really does not have this problem, or if perhaps this is a regresion from even older compilers? This problem may have been introduced when the new (*cough* 7 years old) ix86 backend was contributed. In that case, this may still be a regression from GCC 2.95. But I am not willing to go that far back in history ;-)
Subject: Re: [3.4/4.0/4.1/4.2 regression] Redundant reloading from stack frame On Tue, Jan 10, 2006 at 09:32:58PM -0000, steven at gcc dot gnu dot org wrote: > > ------- Comment #6 from steven at gcc dot gnu dot org 2006-01-10 21:32 ------- > Since GCC 3.2 also has this problem, contrary to what the reporter claims, I am > not sure if we should keep this marked as a regression. Obviously it is a > missed optimization, so the bug report is valid in that sense, and we should > keep it open at least. gcc-3.2 keeps the value in %esi and therefore avoids the problem: .file "a.c" .section .rodata.str1.1,"aMS",@progbits,1 .LC0: .string "%.*s: bad variable name" .text .p2align 2,,3 .globl setvar .type setvar,@function setvar: pushl %ebp movl %esp, %ebp pushl %edi pushl %esi pushl %ebx subl $24, %esp movl 8(%ebp), %ebx movl 16(%ebp), %eax movl 12(%ebp), %esi Only one load. movl %eax, -16(%ebp) pushl %ebx call endofname subl $8, %esp pushl $61 pushl %eax movl %eax, %edi call strchrnul movl %eax, %edx addl $32, %esp subl %ebx, %edx je .L3 cmpl %edi, %eax jne .L3 xorl %edi, %edi testl %esi, %esi je .L7 movl %esi, %edi cld movl $-1, %ecx xorl %eax, %eax repnz scasb notl %ecx leal -1(%ecx), %edi .L5: pushl %ecx pushl %edx pushl %ebx leal 2(%edi,%edx), %eax pushl %eax call malloc movl %eax, (%esp) movl %eax, %ebx call mempcpy addl $16, %esp testl %esi, %esi je .L6 movb $61, (%eax) pushl %edx pushl %edi pushl %esi incl %eax pushl %eax call mempcpy addl $16, %esp .L6: movb $0, (%eax) orl $1, -16(%ebp) movl -16(%ebp), %eax movl %eax, 12(%ebp) movl %ebx, 8(%ebp) leal -12(%ebp), %esp popl %ebx popl %esi popl %edi leave jmp setvareq .p2align 2,,3 .L7: orl $2, -16(%ebp) jmp .L5 .L3: pushl %esi pushl %ebx pushl %edx pushl $.LC0 call sh_error .Lfe1: .size setvar,.Lfe1-setvar .ident "GCC: (GNU) 3.2.3 (Debian)" Thanks,
Unfortunately you're not showing your full command line, so I can only guess what platform your host is and for what target you are compiling. I will attach diffs between GCC 3.2 and GCC 3.3-hammer for i[456]86. To me the changes look insignificant, but maybe you can point out what you're unhappy with.
Created attachment 10612 [details] gcc 3.2 vs. gcc 3.3 .s output, march=i486 All .s files created on AMD64, compiler options -m32 -S -O2 -march=i[456]86.
Created attachment 10613 [details] gcc 3.2 vs. gcc 3.3 .s output, march=i586
Created attachment 10614 [details] gcc 3.2 vs. gcc 3.3 .s output, march=i686
For the record, it is a known problem that x86 32 bits hosts and x86_64 hosts sometimes produce different code, even with the same -march options. We may be seeing one such case here, eventhough that is quite unlikely.
Created attachment 10615 [details] gcc 3.2 vs. gcc 4.0 .s output, march=i686 For the sake of completeness, also a diff between GCC 3.2 and GCC 4.0 for i686, still at -O2.
Subject: Re: [3.4/4.0/4.1/4.2 regression] Redundant reloading from stack frame On Tue, Jan 10, 2006 at 10:58:03PM -0000, steven at gcc dot gnu dot org wrote: > > > ------- Comment #8 from steven at gcc dot gnu dot org 2006-01-10 22:58 ------- > Unfortunately you're not showing your full command line, so I can only guess > what platform your host is and for what target you are compiling. I will > attach diffs between GCC 3.2 and GCC 3.3-hammer for i[456]86. To me the > changes look insignificant, but maybe you can point out what you're unhappy > with. Good point. I was using gcc-3.2 -S -O2 which for my build defaults to --march=i386. Indeed, using --march=i486 produces the same output as gcc-3.3/gcc-3.4. In fact, using --march=i386 on gcc-3.3/gcc-3.4 also removes the duplicate reloading. So it seems that it's really a difference between --march=i386 versus --march=i486. Cheers,
Then I am quite sure that the difference comes from using "repnz; scasb" in GCC 3.2 vs. calling strlen in GCC 3.3 on i486. For GCC 3.2, the code for i386 and i486 are pretty much equivalent (the only difference is that on i386 we use leave), but with GCC 3.3 the code for i486 is very different from the code for i386. Looks like some tuning parameters got changed. So, closing as INVALID, i.e. not-a-bug. It's an interesting oddity though. I've learned something again today :-)
Subject: Re: [3.4/4.0/4.1/4.2 regression] Redundant reloading from stack frame On Tue, Jan 10, 2006 at 11:36:51PM -0000, steven at gcc dot gnu dot org wrote: > > So, closing as INVALID, i.e. not-a-bug. It's an interesting oddity though. > I've learned something again today :-) I still think this is an optimisation issue though since it should know not to reload something that's already in a register. Cheers,
Fair enough. I think it's highly unlikely that anyone would care enough about i386 to worry about fixing this, but you never know.
Subject: Re: Redundant reloading from stack frame on i386 On Tue, Jan 10, 2006 at 11:44:21PM -0000, steven at gcc dot gnu dot org wrote: > > > ------- Comment #17 from steven at gcc dot gnu dot org 2006-01-10 23:44 ------- > Fair enough. I think it's highly unlikely that anyone would care enough about > i386 to worry about fixing this, but you never know. Actually the bug occurs when -march is i[456]86. It goes away with -mi386 :) > What |Removed |Added > ---------------------------------------------------------------------------- > Status|RESOLVED |REOPENED > GCC target triplet|i486-*-* |i386-*-* So I think this needs to be reversed. Thanks,
Fixed in 4.3.0 by the new register allocator (IRA).