This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/30213] New: Wrong code with optimized memset() (possible bug in RTL bbro optimizer)
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 14 Dec 2006 19:55:35 -0000
- Subject: [Bug rtl-optimization/30213] New: Wrong code with optimized memset() (possible bug in RTL bbro optimizer)
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
The code in the attached testcase is taken from povray-3.6.1 and produces a
nasty regression, exposed by the new optimized string functions. Please note
that the expanded RTL of the pov_calloc() function is OK, but the subsequent
RTL optimization pass (bbro) places the BBs in the wrong order.
It is evident that %ebx is cleared in BB4 and dies in BB5. This dump is from
_.148r.rnreg:
--cut here--
;; Start of basic block 4, registers live: 0 [ax] 1 [dx] 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
;; Pred edge 3 [40.0%] (fallthru)
(note:HI 72 29 119 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 119 72 31 4 (parallel [
(set (reg:SI 3 bx [68])
(const_int 0 [0x0]))
(clobber (reg:CC 17 flags))
]) 38 {*movsi_xor} (nil)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(note:HI 31 119 89 4 NOTE_INSN_DELETED)
(insn 89 31 33 4 (set (reg:CCZ 17 flags)
(compare:CCZ (and:SI (reg:SI 0 ax [orig:59 block ] [59])
(const_int 1 [0x1]))
(const_int 0 [0x0]))) 286 {testsi_1} (nil)
(expr_list:REG_DEAD (reg:SI 0 ax [orig:59 block ] [59])
(nil)))
(jump_insn:HI 33 89 73 4 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0x0]))
(label_ref 36)
(pc))) 530 {*jcc_1} (insn_list:REG_DEP_TRUE 32 (nil))
(expr_list:REG_DEAD (reg:CCZ 17 flags)
(expr_list:REG_BR_PROB (const_int 9000 [0x2328])
(nil))))
;; End of basic block 4, registers live: 1 [dx] 3 [bx] 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
;; Succ edge 6 [90.0%]
;; Succ edge 5 [10.0%] (fallthru)
;; Start of basic block 5, registers live: 1 [dx] 3 [bx] 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
;; Pred edge 4 [10.0%] (fallthru)
(note:HI 73 33 95 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 95 73 34 5 (set (reg:QI 0 ax)
(reg:QI 3 bx)) 55 {*movqi_1} (nil)
(nil))
(insn:HI 34 95 35 5 (parallel [
(set (mem:QI (reg/f:SI 5 di [orig:67 block ] [67]) [0 S1 A8])
(reg:QI 0 ax))
(set (reg/f:SI 5 di [orig:67 block ] [67])
(plus:SI (reg/f:SI 5 di [orig:67 block ] [67])
(const_int 1 [0x1])))
]) 720 {*strsetqi_1} (nil)
(expr_list:REG_DEAD (reg:QI 0 ax)
(nil)))
--cut here--
However, _.149r.bbro renumbers BB4 and BB5 as BB12 and BB17 respectively, and
BB12 is reached only _conditionally_ from BB3.
This produces wrong code for pov_calloc():
--cut here--
movl %eax, %esi #, block
testl %eax, %eax # block
je .L4 #, <<< check for NULL
movl %ebx, %edx # actsize, actsize
movl %eax, %edi # block, block
cmpl $3, %ebx #, actsize <<< memset check for "< 4"
ja .L13 #, <<< taken for >= 4; falls through only for < 4
testb $2, %dl #, actsize
jne .L14 #, <<< here we go with wrong %ebx
.L9:
andb $1, %dl #, actsize
jne .L15 #, <<< here too.
.L4:
movl %esi, %eax # block, <result>
addl $16, %esp #,
popl %ebx #
popl %esi #
popl %edi #
popl %ebp #
ret
.L15:
movl %ebx, %eax #, <<< wrong %ebx moved to %eax
stosb <<< FUBAR 2.
movl %esi, %eax # block, <result>
addl $16, %esp #,
popl %ebx #
popl %esi #
popl %edi #
popl %ebp #
ret
.L14:
movl %ebx, %eax #, <<< wrong %ebx moved to %eax
stosw <<< FUBAR 1.
andb $1, %dl #, actsize
je .L4 #,
jmp .L15 #
.L13:
xorl %ebx, %ebx # tmp68 <<< %ebx is cleared here!!
testb $1, %al #, block
jne .L16 #,
.L7:
testl $2, %edi #, block
.p2align 4,,5
jne .L17 #,
.L8:
movl %edx, %ecx # actsize, tmp71
shrl $2, %ecx #, tmp71
movl %ebx, %eax # tmp68,
rep
stosl
testb $2, %dl #, actsize
je .L9 #,
jmp .L14 #
.L16:
movl %ebx, %eax #, <<< this part is OK, but only reached for size >= 4
stosb
subl $1, %edx #, actsize
jmp .L7 #
.L17:
movl %ebx, %eax #, <<< this part is OK, but only reached for size >= 4
stosw
subl $2, %edx #, actsize
jmp .L8 #
--cut here--
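The control flow of the listing above can be modelled in plain C to make the
problem easier to follow. This is only an illustrative sketch, not compiler
output: the function name model_inline_memset0(), the flag
clear_before_branch, and the helper count_nonzero() are hypothetical, and the
variables are merely named after the registers they stand for. With the flag
set to 0 the zero value is materialised only on the "size >= 4" path, as in
the listing; with the flag set to 1 it is materialised before the branch,
which is the ordering one would expect.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model of the assembly listing (hypothetical names).
   clear_before_branch == 0: %ebx is cleared only in .L13 (miscompiled order).
   clear_before_branch == 1: %ebx is cleared before the size check. */
static void model_inline_memset0(unsigned char *block, uint32_t actsize,
                                 int clear_before_branch)
{
    uint32_t ebx = actsize;        /* %ebx still holds actsize on entry */
    uint32_t edx = actsize;        /* movl %ebx, %edx */
    unsigned char *edi = block;    /* movl %eax, %edi */
    uint16_t w;
    uint32_t l;

    if (clear_before_branch)
        ebx = 0;

    if (actsize > 3) {             /* cmpl $3 ; ja .L13 */
        ebx = 0;                   /* .L13: xorl %ebx, %ebx */
        if ((uintptr_t)block & 1) {            /* .L16: align to 2 */
            *edi++ = (unsigned char)ebx;       /* stosb */
            edx -= 1;
        }
        if ((uintptr_t)edi & 2) {              /* .L17: align to 4 */
            w = (uint16_t)ebx;
            memcpy(edi, &w, 2);                /* stosw */
            edi += 2;
            edx -= 2;
        }
        for (uint32_t ecx = edx >> 2; ecx != 0; --ecx) {  /* .L8: rep stosl */
            l = ebx;
            memcpy(edi, &l, 4);
            edi += 4;
        }
    }
    if (edx & 2) {                 /* .L14: tail stosw, reads %ebx */
        w = (uint16_t)ebx;
        memcpy(edi, &w, 2);
        edi += 2;
    }
    if (edx & 1)                   /* .L15: tail stosb, reads %ebx */
        *edi = (unsigned char)ebx;
}

/* Count the bytes in p[0..n) that are not zero. */
static size_t count_nonzero(const unsigned char *p, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        if (p[i] != 0)
            count++;
    return count;
}
```

In this model, sizes below 4 skip the only place where ebx is cleared when
clear_before_branch is 0, so the tail stores write the low bits of actsize
instead of zero. Sizes of 4 and above always pass through the clearing step
first.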
The miscompilation can be confirmed by running the testcase with different sizes:
> gcc -O2 -m32 -march=pentium4 -minline-all-stringops -DSIZE=1 mem.c
> ./a.out
Aborted
> gcc -O2 -m32 -march=pentium4 -minline-all-stringops -DSIZE=2 mem.c
> ./a.out
Aborted
> gcc -O2 -m32 -march=pentium4 -minline-all-stringops -DSIZE=4 mem.c
> ./a.out
> echo $?
0
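The attached mem.c is not reproduced in this message. As a rough idea of the
kind of check involved, a testcase along the following lines would abort
whenever the zeroing leaves a nonzero byte behind. This is a hypothetical
sketch, not the actual attachment: pov_calloc_sketch(), check_zeroed(), the
NOINLINE macro and the default value of SIZE are all assumptions made for
illustration.

```c
#include <stdlib.h>
#include <string.h>

#ifndef SIZE
#define SIZE 1                     /* assumed default; the report passes -DSIZE=n */
#endif

/* Assumption: keep the allocator out of line so the size stays a runtime
   value inside it. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical stand-in for povray's pov_calloc(): allocate, then zero the
   block with memset(). */
static NOINLINE void *pov_calloc_sketch(size_t nitems, size_t size)
{
    size_t actsize = nitems * size;
    void *block = malloc(actsize);

    if (block != NULL)
        memset(block, 0, actsize);
    return block;
}

/* Allocate n bytes through the sketch above and abort() if any byte is not
   zero. Returns 0 on success, 1 if the allocation itself failed. */
static int check_zeroed(size_t n)
{
    unsigned char *p = pov_calloc_sketch(1, n);
    size_t i;

    if (p == NULL)
        return 1;
    for (i = 0; i < n; i++)
        if (p[i] != 0)
            abort();
    free(p);
    return 0;
}
```

A driver whose main() simply returns check_zeroed(SIZE) could be built with
the command lines shown above. Whether this particular sketch triggers the
bug on an affected compiler has not been verified.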
--
Summary: Wrong code with optimized memset() (possible bug in RTL
bbro optimizer)
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: major
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ubizjak at gmail dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30213