Summary: | [4.0 Regression] volatile causes mis-compiling | ||
---|---|---|---|
Product: | gcc | Reporter: | Andi Kleen <andi-gcc> |
Component: | tree-optimization | Assignee: | Diego Novillo <dnovillo> |
Status: | RESOLVED FIXED | ||
Severity: | critical | CC: | arend.bayer, gcc-bugs, giovannibajo, jeffreyalaw, pawel_sikora, pinskia, rakdver |
Priority: | P1 | Keywords: | wrong-code |
Version: | 4.0.0 | ||
Target Milestone: | 4.0.0 | ||
Host: | Target: | ||
Build: | Known to work: | 3.4.4 | |
Known to fail: | 4.0.0 | Last reconfirmed: | 2005-01-03 16:16:09 |
Attachments: |
miscompiled function
reduced testcase (1999 bytes) further reduced testcase |
Description
Andi Kleen
2004-10-31 01:03:22 UTC
Created attachment 7440 [details]
miscompiled function
Compiled with -O2 -fno-reorder-blocks -mno-redzone -mcmodel=kernel
It crashes eventually in this loop
do {
int idx;
tag_clear(pathp[0].node, tag, pathp[0].offset);
for (idx = 0; idx < (((1UL << 6) + 64 - 1) / 64); idx++) {
if (pathp[0].node->tags[tag][idx])
goto out;
}
pathp--;
} while (pathp[0].node);
when accessing pathp[0].node. I suspect it is overrunning the array on the
stack. With -O0 this crash doesn't happen.
Does -fno-ivopts produce a correct code? The final tree dump of the loop with -O1 -fno-ivopts looks like this: <L15>:; nr = pathp->offset; node = pathp->node; D.7834 = &node->tags; addr.9 = (volatile long int *) ((long unsigned int *) D.7834 + D.7838); __asm__ __volatile__("btrl %1,%0":"=m" *addr.9:"dIr" nr); D.7823 = node->tags[tag][0]; if (D.7823 != 0) goto out; else goto <L10>; <L10>:; pathp = pathp - 12B; if (pathp->node != 0B) goto <L15>; else goto out; With IV-OPTS, it is like this: <L15>:; nr = *ivtmp.124; node = *((struct radix_tree_node * *) ivtmp.124 + -8B); D.7834 = &node->tags; addr.9 = (volatile long int *) ((long unsigned int *) D.7834 + D.7838); __asm__ __volatile__("btrl %1,%0":"=m" *addr.9:"dIr" nr); D.7823 = node->tags[tag][0]; if (D.7823 != 0) goto out; else goto <L10>; Invalid sum of incoming frequencies 9667, should be 9550 <L10>:; D.7820 = *ivtmp.128; ivtmp.124 = ivtmp.124 - 12B; ivtmp.128 = ivtmp.128 - 12B; if (D.7820 != 0B) goto <L15>; else goto out; It looks like the end condition is now checked on the pointer BEFORE decrementing it, unless I am mistaken. Zdenek, two questions: - Aren't ivtmp.128 and ivtmp.124 duplicates? - Since they behave exactly like pathp, there is a way to at least preserve the variable name so that the code is easier to read? (In reply to comment #3) > Zdenek, two questions: > - Aren't ivtmp.128 and ivtmp.124 duplicates? Uhm, forget about this :) > - Since ivtmp.128 behaves exactly like pathp, there is a way to at least preserve the > variable name so that the code is easier to read? This would be still cool Also for the first loop, we look we do much worse code with -fivopts at least on powerpc64: With -fno-ivopts: offset = (int) (unsigned int) (index >> (int) shift) & 63; pathp = pathp.109 + 24B; pathp->offset = offset; temp.107 = *pathp.109->slot; pathp->node = temp.107; D.7848 = (struct radix_tree_node * *) &temp.107->slots + (void * *) ((long unsigned int) offset * 8); pathp->slot = D.7848; shift = shift - 6; height = height - 1; without: offset = (int) (unsigned int) (index >> (int) (shift + (unsigned int) ivtmp.126 * 4294967290)) & 63; <--- here why not keep the shift decrementing pathp.155 = ivtmp.137 + &path + 24B; D.8017 = &path + 24B; *((int *) ivtmp.137 + &D.8017->offset) = offset; D.8020 = (struct radix_tree_node * * *) ivtmp.137; temp.162 = **(D.8020 + &path[0].slot); *((struct radix_tree_node * *) ivtmp.137 + &D.8017->node) = temp.162; D.7848 = (struct radix_tree_node * *) &temp.162->slots + (void * *) ((long unsigned int) offset * 8); *(D.8020 + &D.8017->slot) = D.7848; height = height + 4294967295; The corresponding asm for the loop: with -fno-ivopts: L11: ld r11,8(r10) srd r2,r4,r7 addi r0,r7,-6 rldicl r2,r2,0,58 rldicl r7,r0,0,32 ld r9,0(r11) sldi r6,r2,3 addi r11,r10,24 stw r2,16(r11) addi r8,r9,8 std r9,24(r10) mr r10,r11 add r0,r8,r6 std r0,8(r11) bdnz L8 without: L11: add r11,r8,r30 mulli r0,r3,-6 <--- this is really not a good idea inside the loop add r10,r5,r8 addi r3,r3,1 ld r2,8(r11) addi r7,r11,24 add r0,r0,r12 ld r9,0(r2) srd r0,r4,r0 rldicl r0,r0,0,58 addi r6,r9,8 sldi r11,r0,3 stdx r9,r8,r5 stw r0,16(r10) addi r8,r8,24 add r2,r6,r11 std r2,8(r10 I must correct myself slightly - the kernel oops I see is not the second loop, but in the check before: ret = *pathp[0].slot; <--- references either NULL or a bogus pointer if (ret == NULL) goto out; I tried various options, but only -O0 resolves it. -O1, -O2 -fno-ivopts or -O1 -fno-tree-loop-ivcanon doesn't (depending on the options the crash is a NULL pointer or a bogus non NULL pointer) It's hard unfortunately to construct a run time test because it would need to create a simulated radixtree. To clarify the previous comment means that the problem is likely in the first loop in the function, not in the second Does this still happen? Problem still happens as of HEAD 041125 Can you try with -fno-tree-ch (or -Os)? Tried -fno-tree-ch. Didn't help. -O1 -fno-tree-dominator-opts seems to avoid the problem When I comment out the first pass_dominator in tree-optimize.c and compile with -O1 it works too. Confirmed, this looks like a jump threading bug. Before we have: goto <bb 9> (<L10>); <L8>:; D.7815_42 = pathp_2->node; tag_43 = tag_27; idx_44 = idx_6; D.7819_45 = D.7815_42->tags[tag_43][idx_44]; if (D.7819_45 != 0) goto out; else goto <L9>; <L9>:; idx_46 = idx_6 + 1; # idx_6 = PHI <idx_39(6), idx_46(8)>; <L10>:; if (idx_6 == 0) goto <L8>; else goto <L11>; But after: goto <bb 8> (<L10>); <L9>:; idx_46 = 1; goto <bb 10> (<L15>); # idx_57 = PHI <0(6)>; <L10>:; D.7815_42 = pathp_2->node; tag_43 = tag_27; idx_44 = 0; <---- this is just plainly wrong. D.7819_45 = D.7815_42->tags[tag_27][0]; if (D.7819_45 != 0) goto out; else goto <L9>; Hmm, the oops was in the test before the second loop, but idx is in the second loop. But maybe I misread it. Subject: Re: [4.0 Regression] linux kernel loop gets miscompiled > > ------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-28 01:28 ------- > Confirmed, this looks like a jump threading bug. > Before we have: > goto <bb 9> (<L10>); > > <L8>:; > D.7815_42 = pathp_2->node; > tag_43 = tag_27; > idx_44 = idx_6; > D.7819_45 = D.7815_42->tags[tag_43][idx_44]; > if (D.7819_45 != 0) goto out; else goto <L9>; > > <L9>:; > idx_46 = idx_6 + 1; > > # idx_6 = PHI <idx_39(6), idx_46(8)>; > <L10>:; > if (idx_6 == 0) goto <L8>; else goto <L11>; > > > But after: > goto <bb 8> (<L10>); > > <L9>:; > idx_46 = 1; > goto <bb 10> (<L15>); > > # idx_57 = PHI <0(6)>; > <L10>:; > D.7815_42 = pathp_2->node; > tag_43 = tag_27; > idx_44 = 0; <---- this is just plainly wrong. this seems OK to me (note the "weird" condition that ends the loop -- this loops iterates exactly once, with idx being 0). > D.7819_45 = D.7815_42->tags[tag_27][0]; > if (D.7819_45 != 0) goto out; else goto <L9>; I see no evidence that 18241 has the same root case as 18694. (In reply to comment #15) > Confirmed, this looks like a jump threading bug. > Before we have: > goto <bb 9> (<L10>); > > <L8>:; > D.7815_42 = pathp_2->node; > tag_43 = tag_27; > idx_44 = idx_6; > D.7819_45 = D.7815_42->tags[tag_43][idx_44]; > if (D.7819_45 != 0) goto out; else goto <L9>; > > <L9>:; > idx_46 = idx_6 + 1; > > # idx_6 = PHI <idx_39(6), idx_46(8)>; > <L10>:; > if (idx_6 == 0) goto <L8>; else goto <L11>; > > > But after: > goto <bb 8> (<L10>); > > <L9>:; > idx_46 = 1; > goto <bb 10> (<L15>); > > # idx_57 = PHI <0(6)>; > <L10>:; > D.7815_42 = pathp_2->node; > tag_43 = tag_27; > idx_44 = 0; <---- this is just plainly wrong. > D.7819_45 = D.7815_42->tags[tag_27][0]; > if (D.7819_45 != 0) goto out; else goto <L9>; Huh? idx_44 must be zero, I don't see how any other value can be correct here. THe only way to get to L8 (block #7) is when idx_6 == 0 and idx_44 is set unconditionally from idx_6. So what about that assignment do you think is wrong? Note that we also know that idx_46 must have the value 1 because it's only reachable from block #7 where we know that idx_6 == 0 and thus idx_46 must have the value 1. FWIW, I've walked through all the actions of DOM1 and AFAICT every change it makes is correct. I would have to strongly suspect that DOM1 is not the problem here, but is instead changing things in such a way as to expose some downstream issue. I've also looked pretty closely at the later tree optimizer dumps. If indeed the problem is the ret = *pathp[0].slot statement blowing up, I would strongly suggest some printf debugging of the first loop. Namely, how many times does it iterate and what is the value of pathp before and after the loop. Jeff methinks the bug is in this code: /* ??? This bit ought not be needed. For any element not present in the initializer, we should simply set them to zero. Except we'd need to *find* the elements that are not present, and that requires trickery to avoid quadratic compile-time behavior in large cases or excessive memory use in small cases. */ else { HOST_WIDE_INT len = list_length (elt_list); if (TREE_CODE (type) == ARRAY_TYPE) { tree nelts = array_type_nelts (type); if (!host_integerp (nelts, 1) || tree_low_cst (nelts, 1) + 1 != len) cleared = true; } else if (len != fields_length (type)) cleared = true; } We have this expr: objD.1485 = {{.sD.1468=(const charD.1 *) (charD.1 *) "m0"}, {}} The type of the constructor is an ARRAY_TYPE, nelts == 1, and len (the length of the constructor) is 2. So nelts+1 == 2, and we set cleared = true. But we ignore the fact that the second element of the constructor is empty, and that the first one is incomplete. My last comment of course does not concern this bug, oops sorry :-) We have reason to believe this bug is now fixed, but it is kind of hard to test... Can you please retry with CVS HEAD? Problem still happens with 4.0.0 20041219 Happens on x86 too (symptom is an oops on boot in radix_tree_tag_clear). Sorry, should have been a bit more specific probably: gcc version 4.0.0 20050103 (experimental) (i686-pc-linux-gnu) Kernel is 2.6.10, and the oops is triggered for me by the EXT3 code. Created attachment 7865 [details]
reduced testcase (1999 bytes)
this testcase segfaults on i386 with either -O1 or -O2 -fno-strict-aliasing.
Created attachment 7866 [details]
further reduced testcase
Compiling this somewhat smaller testcase with
gcc radix-tree4.c -o radix-tree4 -g -O1 -fno-reorder-blocks -save-temps -da
-fdump-tree-all-lineno
shows that the problem is dce1: The dump of t19.copyrename1:
radix_tree_tag_clear (root, index)
{
int offset;
volatile long unsigned int * addr;
unsigned int shift;
unsigned int height;
struct radix_tree_path * pathp;
struct radix_tree_path path[7];
long unsigned int * D.1162;
struct radix_tree_node * D.1161;
struct radix_tree_node * * D.1160;
void * * D.1159;
unsigned int D.1158;
unsigned int offset.1;
struct radix_tree_node * * D.1156;
void *[64] * D.1155;
struct radix_tree_node * D.1154;
struct radix_tree_node * D.1153;
struct radix_tree_node * * D.1152;
struct radix_tree_path * D.1151;
int D.1150;
long unsigned int D.1149;
int shift.0;
struct radix_tree_node * * D.1147;
unsigned int D.1146;
<bb 0>:
[radix-tree4.c : 25] pathp_4 = &path;
[radix-tree4.c : 29] height_6 = root_5->height;
[radix-tree4.c : 31] D.1146_7 = height_6 * 6;
[radix-tree4.c : 31] shift_8 = D.1146_7 + 0fffffffa;
[radix-tree4.c : 32] D.1147_9 = &root_5->rnode;
[radix-tree4.c : 32] path[0].slot = D.1147_9;
[radix-tree4.c : 34] goto <bb 2> (<L1>);
The line "[[radix-tree4.c : 32] path[0].slot = D.1147_9;" is optimized away,
although path[0] is aliased by pathp, and thus used in the loop:
<L0>:;
[radix-tree4.c : 37] shift.0_15 = (int) shift_3;
[radix-tree4.c : 37] D.1149_17 = index_16 >> shift.0_15;
[radix-tree4.c : 37] D.1150_18 = (int) D.1149_17;
[radix-tree4.c : 37] offset_19 = D.1150_18 & 63;
[radix-tree4.c : 38] D.1151_20 = pathp_1 + 12B;
[radix-tree4.c : 38] D.1151_20->offset = offset_19;
[radix-tree4.c : 39] D.1151_21 = pathp_1 + 12B;
[radix-tree4.c : 39] D.1152_22 = pathp_1->slot;
[radix-tree4.c : 39] D.1153_23 = *D.1152_22;
(...)
Here is the dump of t20.dce1:radix_tree_tag_clear (root, index)
{
int offset;
volatile long unsigned int * addr;
unsigned int shift;
unsigned int height;
struct radix_tree_path * pathp;
struct radix_tree_path path[7];
long unsigned int * D.1162;
struct radix_tree_node * D.1161;
struct radix_tree_node * * D.1160;
void * * D.1159;
unsigned int D.1158;
unsigned int offset.1;
struct radix_tree_node * * D.1156;
void *[64] * D.1155;
struct radix_tree_node * D.1154;
struct radix_tree_node * D.1153;
struct radix_tree_node * * D.1152;
struct radix_tree_path * D.1151;
int D.1150;
long unsigned int D.1149;
int shift.0;
struct radix_tree_node * * D.1147;
unsigned int D.1146;
<bb 0>:
[radix-tree4.c : 25] pathp_4 = &path;
[radix-tree4.c : 29] height_6 = root_5->height;
[radix-tree4.c : 31] D.1146_7 = height_6 * 6;
[radix-tree4.c : 31] shift_8 = D.1146_7 + 0fffffffa;
[radix-tree4.c : 32] D.1147_9 = &root_5->rnode;
[radix-tree4.c : 34] goto <bb 2> (<L1>);
(Loop follows.)
Hmm, the following statements don't have VOPS: D.1145_20->offset = offset_19; D.1146_22 = pathp_1->slot; D.1145_21->node = D.1147_23; D.1148_26 = D.1145_25->node; D.1145_25->slot = D.1154_32; D.1155_12 = pathp_1->node; // even smaller testcase: void abort (void); void radix_tree_tag_clear (int *node) { int *path[2], **pathp = path, height; volatile int *addr; height = 1; pathp[0] = node; while (height > 0) { pathp[1] = pathp[0]; pathp++; height--; } addr = pathp[0]; *addr = 1; } int main () { int n; radix_tree_tag_clear (&n); if (n != 1) abort (); return 0; } I'm not sure if it's the root cause yet, or just an interesting rat hole, but we do not create any virtual operands for the *addr = <whatever> statement (or for any statement with volatile operands). (In reply to comment #30) (In reply to comment #31) > I'm not sure if it's the root cause yet, or just an interesting rat hole, but > we do not create any virtual operands for the *addr = <whatever> statement > (or for any statement with volatile operands). Note if I remove the volatile from the example in comment #30, then it works so it looks like that is root call because we don't know if we use the result of path[0]. Here is the most reduced testcase: void abort (void); int f(int i1243) { int i[2], *i1 = i; i[0] = 1; volatile int *i2 = i1; i2[1] = 1; i1243 = 0; return i2[1]+i2[0]; } int main(void) { if( f(100) != 2) abort (); return 0; } Notice we remove i[0] = 1 the .t17.ssa dump: f (i1243D.1466) { volatile intD.0 * i2D.1471; intD.0 * i1D.1470; intD.0 iD.1469[2]; intD.0 D.1475; intD.0 D.1474; intD.0 D.1473; volatile intD.0 * D.1472; # BLOCK 0 # PRED: ENTRY (fallthru) i1D.1470_1 = &iD.1469; # iD.1469_3 = V_MAY_DEF <iD.1469_2>; iD.1469[0] = 1; i2D.1471_4 = i1D.1470_1; D.1472_5 = i2D.1471_4 + 4B; *D.1472_5 = 1; i1243D.1466_6 = 0; D.1472_7 = i2D.1471_4 + 4B; D.1474_8 = *D.1472_7; D.1475_9 = *i2D.1471_4; D.1473_10 = D.1474_8 + D.1475_9; return D.1473_10; # SUCC: EXIT } Shouldn't there be a VUSE for "D.1474_8 = *D.1472_7"? // final version :) void abort (void); int main () { int a; volatile int *b = &a; a = 1; if (*b != 1) abort (); return 0; } Changing the summary as we finnally found the problem. BTW, the original case is no longer reproducible on mainline, but the test in #35 still fails. This happens when a pointer to volatile storage takes the address of a variable. One way to fix this is by marking variables that have been aliased with volatile tags. When adding operands for these variables, we mark the statement as having volatile operands. While correct, this tends to pessimize code. I'm working on a variant of this patch that would only mark variables whose address is taken by a pointer pointing to volatile memory. Jeff has been working on the idea of exposing volatile operands to the optimizers and make the optimizers responsible for dealing with them properly. This has the advantage that we would not have any magic operands hidden from us, but it requires auditing all the optimizers to make sure that they don't mess things up. Given the stage that we are for 4.0, it may be better to go with the simpler solution. But perhaps we can finish the other approach in time. Diego. Index: tree-flow.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-flow.h,v retrieving revision 2.75 diff -d -c -p -u -r2.75 tree-flow.h --- tree-flow.h 10 Dec 2004 21:54:41 -0000 2.75 +++ tree-flow.h 5 Jan 2005 00:06:23 -0000 @@ -174,6 +174,9 @@ struct var_ann_d GTY(()) in the v_may_def list. */ unsigned in_v_may_def_list : 1; + /* Nonzero if this variable is aliased to a volatile variable. */ + unsigned aliased_with_volatile : 1; + /* An artificial variable representing the memory location pointed-to by all the pointers that TBAA (type-based alias analysis) considers to be aliased. If the variable is not a pointer or if it is never Index: tree-ssa-alias.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-ssa-alias.c,v retrieving revision 2.64 diff -d -c -p -u -r2.64 tree-ssa-alias.c --- tree-ssa-alias.c 20 Dec 2004 18:18:28 -0000 2.64 +++ tree-ssa-alias.c 5 Jan 2005 00:06:24 -0000 @@ -1668,6 +1668,9 @@ add_may_alias (tree var, tree alias) else if (is_call_clobbered (alias)) mark_call_clobbered (var); + if (TREE_THIS_VOLATILE (var)) + a_ann->aliased_with_volatile = 1; + VARRAY_PUSH_TREE (v_ann->may_aliases, alias); a_ann->is_alias_tag = 1; } Index: tree-ssa-operands.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-ssa-operands.c,v retrieving revision 2.60 diff -d -c -p -u -r2.60 tree-ssa-operands.c --- tree-ssa-operands.c 10 Dec 2004 17:28:32 -0000 2.60 +++ tree-ssa-operands.c 5 Jan 2005 00:06:24 -0000 @@ -1528,6 +1528,9 @@ add_stmt_operand (tree *var_p, stmt_ann_ return; } + if (v_ann->aliased_with_volatile && s_ann) + s_ann->has_volatile_ops = true; + if (is_real_op) { /* The variable is a GIMPLE register. Add it to real operands. */ *** Bug 19316 has been marked as a duplicate of this bug. *** |