Bug 53705

Summary: wrong code with custom flags - stores to memory are lost
Product: gcc Reporter: Zdenek Sojka <zsojka>
Component: rtl-optimizationAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: normal CC: amylaar, gabravier, matz
Priority: P3 Keywords: alias, wrong-code
Version: 4.8.0   
Target Milestone: ---   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255
Host: x86_64-pc-linux-gnu Target: x86_64-pc-linux-gnu
Build: Known to work:
Known to fail: 4.3.6, 4.4.7, 4.5.4, 4.6.4, 4.7.2, 4.8.0 Last reconfirmed: 2012-06-17 00:00:00
Bug Depends on:    
Bug Blocks: 58036    
Attachments: reduced testcase (from testsuite/gcc.c-torture/execute/loop-2e.c)

Description Zdenek Sojka 2012-06-17 12:20:23 UTC
Created attachment 27640 [details]
reduced testcase (from testsuite/gcc.c-torture/execute/loop-2e.c)

GCC 4.3-4.8 fails, while with GCC 4.2, the code is reduced just to return 0. I am not sure if this should be marked as a regression.

Compiler output:
$ gcc -O2 -fno-omit-frame-pointer -fpeel-loops -fsched2-use-superblocks -fno-tree-loop-optimize -fno-web --param=max-completely-peel-times=256 testcase.c
$ ./a.out
Aborted

Looking at the assembly:
...
	sub	rsp, 320	#,
	mov	rdx, QWORD PTR p[rip]	# p.0, p
	lea	rcx, [rbp-320]	# q,
...
	mov	QWORD PTR [rcx+240], rdi	# *q_19, tmp73
	lea	rax, [rdx+144]	# tmp73,
	lea	rsi, [rdx+148]	# tmp73,
	lea	rdi, [rdx+152]	# tmp73,
	lea	rdx, [rdx+156]	# tmp73,
	cmp	QWORD PTR [rbp-8], rdx	# q, tmp73
	mov	QWORD PTR [rcx+280], rax	# *q_19, tmp73
	mov	QWORD PTR [rcx+248], r8	# *q_19, tmp73
	mov	QWORD PTR [rcx+256], r9	# *q_19, tmp73
	mov	QWORD PTR [rcx+264], r10	# *q_19, tmp73
	mov	QWORD PTR [rcx+272], r11	# *q_19, tmp73
	mov	QWORD PTR [rcx+288], rsi	# *q_19, tmp73
	mov	QWORD PTR [rcx+296], rdi	# *q_19, tmp73
	mov	QWORD PTR [rcx+304], rdx	# *q_19, tmp73
	jne	.L46	#,
...
It seems the dependency between storing to *q++ (esp. q[39]) in foo() and verifying it in main() is lost. Furthermore, it seems q[39] ([rcx+312]) is not stored to at all.
Comment 1 Michael Matz 2012-06-17 15:21:10 UTC
Problem is somewhere in cselib.  We test true_dependence of these two insns
(last write to q[] and the read of the last element):

(insn 13 299 22 2 (set (mem/f:DI (plus:DI (reg/v/f:DI 2 cx [orig:63 q ] [63])
                (const_int 144 [0x90])) [2 *q_19+0 S8 A64])
        (reg:DI 1 dx [73])) pr53705.c:9 62 {*movdi_internal_rex64}
     (expr_list:REG_DEAD (reg/v/f:DI 2 cx [orig:63 q ] [63])
        (nil)))

(insn 21 350 257 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (mem/f/c:DI (plus:DI (reg/f:DI 6 bp)
                    (const_int -8 [0xfffffffffffffff8])) [2 q+152 S8 A64])
            (reg:DI 1 dx [73]))) pr53705.c:19 7 {*cmpdi_1}
     (expr_list:REG_DEAD (reg:DI 1 dx [73])
        (nil)))

For that we check write_dependence of:
  (mem/f:DI (plus:DI (reg/v/f:DI 2 cx [orig:63 q ] [63])
                     (const_int 144 [0x90])) [2 *q_19+0 S8 A64])
vs
  (mem/f/c:DI (plus:DI (reg/f:DI 6 bp)
                       (const_int -8 [0xfffffffffffffff8])) [2 q+152 S8 A64])

(note that at this point cx == bp-152, hence this accesses the same memory.

true_dependence uses base_alias_check to test these two mems and wants
to disambiguate the bases.  cselib is used for that and in valueized form
the two addresses look like so:
  (plus:DI (value:DI 8:12039 @0x1b9ef68/0x1b55410) (const_int 144 [0x90]))
  (plus:DI (value:DI 2:4059 @0x1b9eed8/0x1b552f0) (const_int -8))

From cselib we have these details of the two involved VALUEs:

(value:DI 8:12039 @0x1b9ef68/0x1b55410)
 locs:
  from insn 43 (reg/v/f:DI 2 cx [orig:63 q ] [63])
  from insn 43 (plus:DI (value:DI 5:7965 @0x1b9ef20/0x1b55380)
            (const_int 8 [0x8]))

(value:DI 2:4059 @0x1b9eed8/0x1b552f0)
 locs:
  from insn 352 (reg/f:DI 6 bp)
  from insn 351 (plus:DI (value:DI 1:1 @0x1b9eec0/0x1b552c0)
            (const_int -8 [0xfffffffffffffff8]))

That is, value 2 is bp-based (or value 1) based, and value 8 is value 5 based,
which itself is:

(value:DI 5:7965 @0x1b9ef20/0x1b55380)
 locs:
  from insn 353 (reg/f:DI 7 sp)
  from insn 353 (plus:DI (value:DI 2:4059 @0x1b9eed8/0x1b552f0)
            (const_int -160 [0xffffffffffffff60]))

I.e. value 5 (and hence 8) is value 2 based (like the other mem), or sp based.

Now, find_base_term() for [bp-8] will return "(address:DI -4)", which comes from
the REG_BASE_VALUE of the (reg/f:DI 6 bp).

find_base_term ([cx+144]) otoh will go via value 8 to value 5, from there
to REG_BASE_VALUE([sp]), which returns "(address:DI -1)".  If find_base_term
would skip the first loc ("sp") and try to look into the second loc (val 2)
it would also return (address:DI -4).

Now, two ADDRESS rtxes that aren't pointer-equal aren't equivalent, and hence
the disambiguator thinks that the two mems cannot point into the same memory.

Obviously the problem is some confusion in setting up REG_BASE_VALUE for
sp and bp.  When we have a frame pointer then both should have the same base,
not different ones.
Comment 2 Michael Matz 2012-06-17 15:36:19 UTC
Or alternatively cselib doesn't respect one invariant in constructing the
locations of its VALUEs.  As seen above it constructs two values for the same
memory area, one referring to stack pointer, the other to (hard) frame pointer.

But alias.c explains:

     2. stack_pointer_rtx, frame_pointer_rtx, hard_frame_pointer_rtx
        (if distinct from frame_pointer_rtx) and arg_pointer_rtx.
        Each of these rtxes has a separate ADDRESS associated with it,
        each with a negative id.

        GCC is (and is required to be) precise in which register it
        chooses to access a particular region of stack.  We can therefore
        assume that accesses based on one of these rtxes do not alias
        accesses based on another of these rtxes.

Note the last paragraph.  The RTL instructions themself respect this invariant
(there are no accesses via [sp], only via [bp] or derived values).  But the
cselib values don't.  I'd say value 5 (the one referring to sp and value 2)
is the broken one.  It should only refer to value 2.
Comment 3 Jorn Wolfgang Rennecke 2013-07-30 13:37:20 UTC
(In reply to Zdenek Sojka from comment #0)
> Created attachment 27640 [details]
> reduced testcase (from testsuite/gcc.c-torture/execute/loop-2e.c)
> 
> GCC 4.3-4.8 fails, while with GCC 4.2, the code is reduced just to return 0.
> I am not sure if this should be marked as a regression.
> 
> Compiler output:
> $ gcc -O2 -fno-omit-frame-pointer -fpeel-loops -fsched2-use-superblocks
> -fno-tree-loop-optimize -fno-web --param=max-completely-peel-times=256
> testcase.c
> $ ./a.out
> Aborted
> 
> Looking at the assembly:
> ...
> 	sub	rsp, 320	#,
> 	mov	rdx, QWORD PTR p[rip]	# p.0, p
> 	lea	rcx, [rbp-320]	# q,
> ...
> 	mov	QWORD PTR [rcx+240], rdi	# *q_19, tmp73
> 	lea	rax, [rdx+144]	# tmp73,
> 	lea	rsi, [rdx+148]	# tmp73,
> 	lea	rdi, [rdx+152]	# tmp73,
> 	lea	rdx, [rdx+156]	# tmp73,
> 	cmp	QWORD PTR [rbp-8], rdx	# q, tmp73
> 	mov	QWORD PTR [rcx+280], rax	# *q_19, tmp73
> 	mov	QWORD PTR [rcx+248], r8	# *q_19, tmp73
> 	mov	QWORD PTR [rcx+256], r9	# *q_19, tmp73
> 	mov	QWORD PTR [rcx+264], r10	# *q_19, tmp73
> 	mov	QWORD PTR [rcx+272], r11	# *q_19, tmp73
> 	mov	QWORD PTR [rcx+288], rsi	# *q_19, tmp73
> 	mov	QWORD PTR [rcx+296], rdi	# *q_19, tmp73
> 	mov	QWORD PTR [rcx+304], rdx	# *q_19, tmp73
> 	jne	.L46	#,
> ...
> It seems the dependency between storing to *q++ (esp. q[39]) in foo() and
> verifying it in main() is lost. Furthermore, it seems q[39] ([rcx+312]) is
> not stored to at all.

I can't reproduce the unrolling with an i686-pc-linux-gnu X x86_64-pc-linux-gnu
compiler, neither for 4.8.2 20130729 (prerelease) , nor for 4.9.0 20130729 (experimental) sources.

Is that an issuse with the cross-compiler, or with the compiler version?