Summary: | wrong code with custom flags - stores to memory are lost | ||
---|---|---|---|
Product: | gcc | Reporter: | Zdenek Sojka <zsojka> |
Component: | rtl-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | ||
Severity: | normal | CC: | amylaar, gabravier, matz |
Priority: | P3 | Keywords: | alias, wrong-code |
Version: | 4.8.0 | ||
Target Milestone: | --- | ||
See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 | ||
Host: | x86_64-pc-linux-gnu | Target: | x86_64-pc-linux-gnu |
Build: | Known to work: | ||
Known to fail: | 4.3.6, 4.4.7, 4.5.4, 4.6.4, 4.7.2, 4.8.0 | Last reconfirmed: | 2012-06-17 00:00:00 |
Bug Depends on: | |||
Bug Blocks: | 58036 | ||
Attachments: | reduced testcase (from testsuite/gcc.c-torture/execute/loop-2e.c) |
Problem is somewhere in cselib. We test true_dependence of these two insns (last write to q[] and the read of the last element): (insn 13 299 22 2 (set (mem/f:DI (plus:DI (reg/v/f:DI 2 cx [orig:63 q ] [63]) (const_int 144 [0x90])) [2 *q_19+0 S8 A64]) (reg:DI 1 dx [73])) pr53705.c:9 62 {*movdi_internal_rex64} (expr_list:REG_DEAD (reg/v/f:DI 2 cx [orig:63 q ] [63]) (nil))) (insn 21 350 257 2 (set (reg:CCZ 17 flags) (compare:CCZ (mem/f/c:DI (plus:DI (reg/f:DI 6 bp) (const_int -8 [0xfffffffffffffff8])) [2 q+152 S8 A64]) (reg:DI 1 dx [73]))) pr53705.c:19 7 {*cmpdi_1} (expr_list:REG_DEAD (reg:DI 1 dx [73]) (nil))) For that we check write_dependence of: (mem/f:DI (plus:DI (reg/v/f:DI 2 cx [orig:63 q ] [63]) (const_int 144 [0x90])) [2 *q_19+0 S8 A64]) vs (mem/f/c:DI (plus:DI (reg/f:DI 6 bp) (const_int -8 [0xfffffffffffffff8])) [2 q+152 S8 A64]) (note that at this point cx == bp-152, hence this accesses the same memory. true_dependence uses base_alias_check to test these two mems and wants to disambiguate the bases. cselib is used for that and in valueized form the two addresses look like so: (plus:DI (value:DI 8:12039 @0x1b9ef68/0x1b55410) (const_int 144 [0x90])) (plus:DI (value:DI 2:4059 @0x1b9eed8/0x1b552f0) (const_int -8)) From cselib we have these details of the two involved VALUEs: (value:DI 8:12039 @0x1b9ef68/0x1b55410) locs: from insn 43 (reg/v/f:DI 2 cx [orig:63 q ] [63]) from insn 43 (plus:DI (value:DI 5:7965 @0x1b9ef20/0x1b55380) (const_int 8 [0x8])) (value:DI 2:4059 @0x1b9eed8/0x1b552f0) locs: from insn 352 (reg/f:DI 6 bp) from insn 351 (plus:DI (value:DI 1:1 @0x1b9eec0/0x1b552c0) (const_int -8 [0xfffffffffffffff8])) That is, value 2 is bp-based (or value 1) based, and value 8 is value 5 based, which itself is: (value:DI 5:7965 @0x1b9ef20/0x1b55380) locs: from insn 353 (reg/f:DI 7 sp) from insn 353 (plus:DI (value:DI 2:4059 @0x1b9eed8/0x1b552f0) (const_int -160 [0xffffffffffffff60])) I.e. value 5 (and hence 8) is value 2 based (like the other mem), or sp based. Now, find_base_term() for [bp-8] will return "(address:DI -4)", which comes from the REG_BASE_VALUE of the (reg/f:DI 6 bp). find_base_term ([cx+144]) otoh will go via value 8 to value 5, from there to REG_BASE_VALUE([sp]), which returns "(address:DI -1)". If find_base_term would skip the first loc ("sp") and try to look into the second loc (val 2) it would also return (address:DI -4). Now, two ADDRESS rtxes that aren't pointer-equal aren't equivalent, and hence the disambiguator thinks that the two mems cannot point into the same memory. Obviously the problem is some confusion in setting up REG_BASE_VALUE for sp and bp. When we have a frame pointer then both should have the same base, not different ones. Or alternatively cselib doesn't respect one invariant in constructing the locations of its VALUEs. As seen above it constructs two values for the same memory area, one referring to stack pointer, the other to (hard) frame pointer. But alias.c explains: 2. stack_pointer_rtx, frame_pointer_rtx, hard_frame_pointer_rtx (if distinct from frame_pointer_rtx) and arg_pointer_rtx. Each of these rtxes has a separate ADDRESS associated with it, each with a negative id. GCC is (and is required to be) precise in which register it chooses to access a particular region of stack. We can therefore assume that accesses based on one of these rtxes do not alias accesses based on another of these rtxes. Note the last paragraph. The RTL instructions themself respect this invariant (there are no accesses via [sp], only via [bp] or derived values). But the cselib values don't. I'd say value 5 (the one referring to sp and value 2) is the broken one. It should only refer to value 2. (In reply to Zdenek Sojka from comment #0) > Created attachment 27640 [details] > reduced testcase (from testsuite/gcc.c-torture/execute/loop-2e.c) > > GCC 4.3-4.8 fails, while with GCC 4.2, the code is reduced just to return 0. > I am not sure if this should be marked as a regression. > > Compiler output: > $ gcc -O2 -fno-omit-frame-pointer -fpeel-loops -fsched2-use-superblocks > -fno-tree-loop-optimize -fno-web --param=max-completely-peel-times=256 > testcase.c > $ ./a.out > Aborted > > Looking at the assembly: > ... > sub rsp, 320 #, > mov rdx, QWORD PTR p[rip] # p.0, p > lea rcx, [rbp-320] # q, > ... > mov QWORD PTR [rcx+240], rdi # *q_19, tmp73 > lea rax, [rdx+144] # tmp73, > lea rsi, [rdx+148] # tmp73, > lea rdi, [rdx+152] # tmp73, > lea rdx, [rdx+156] # tmp73, > cmp QWORD PTR [rbp-8], rdx # q, tmp73 > mov QWORD PTR [rcx+280], rax # *q_19, tmp73 > mov QWORD PTR [rcx+248], r8 # *q_19, tmp73 > mov QWORD PTR [rcx+256], r9 # *q_19, tmp73 > mov QWORD PTR [rcx+264], r10 # *q_19, tmp73 > mov QWORD PTR [rcx+272], r11 # *q_19, tmp73 > mov QWORD PTR [rcx+288], rsi # *q_19, tmp73 > mov QWORD PTR [rcx+296], rdi # *q_19, tmp73 > mov QWORD PTR [rcx+304], rdx # *q_19, tmp73 > jne .L46 #, > ... > It seems the dependency between storing to *q++ (esp. q[39]) in foo() and > verifying it in main() is lost. Furthermore, it seems q[39] ([rcx+312]) is > not stored to at all. I can't reproduce the unrolling with an i686-pc-linux-gnu X x86_64-pc-linux-gnu compiler, neither for 4.8.2 20130729 (prerelease) , nor for 4.9.0 20130729 (experimental) sources. Is that an issuse with the cross-compiler, or with the compiler version? |
Created attachment 27640 [details] reduced testcase (from testsuite/gcc.c-torture/execute/loop-2e.c) GCC 4.3-4.8 fails, while with GCC 4.2, the code is reduced just to return 0. I am not sure if this should be marked as a regression. Compiler output: $ gcc -O2 -fno-omit-frame-pointer -fpeel-loops -fsched2-use-superblocks -fno-tree-loop-optimize -fno-web --param=max-completely-peel-times=256 testcase.c $ ./a.out Aborted Looking at the assembly: ... sub rsp, 320 #, mov rdx, QWORD PTR p[rip] # p.0, p lea rcx, [rbp-320] # q, ... mov QWORD PTR [rcx+240], rdi # *q_19, tmp73 lea rax, [rdx+144] # tmp73, lea rsi, [rdx+148] # tmp73, lea rdi, [rdx+152] # tmp73, lea rdx, [rdx+156] # tmp73, cmp QWORD PTR [rbp-8], rdx # q, tmp73 mov QWORD PTR [rcx+280], rax # *q_19, tmp73 mov QWORD PTR [rcx+248], r8 # *q_19, tmp73 mov QWORD PTR [rcx+256], r9 # *q_19, tmp73 mov QWORD PTR [rcx+264], r10 # *q_19, tmp73 mov QWORD PTR [rcx+272], r11 # *q_19, tmp73 mov QWORD PTR [rcx+288], rsi # *q_19, tmp73 mov QWORD PTR [rcx+296], rdi # *q_19, tmp73 mov QWORD PTR [rcx+304], rdx # *q_19, tmp73 jne .L46 #, ... It seems the dependency between storing to *q++ (esp. q[39]) in foo() and verifying it in main() is lost. Furthermore, it seems q[39] ([rcx+312]) is not stored to at all.