Command line: $ gcc -O1 -fschedule-insns2 -fselective-scheduling2 testcase.c Compiler output: $ gcc -O1 -fschedule-insns2 -fselective-scheduling2 testcase.c testcase.c: In function 'foo': testcase.c:18:1: internal compiler error: in move_op_ascend, at sel-sched.c:6124 Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions. Tested revisions: r163636 - crash r153685 - crash 4.4 r149995 - OK
Created attachment 21630 [details] reduced testcase $ gcc -O1 -fschedule-insns2 -fselective-scheduling2 pr45472.c
It is caused by revision 147282: http://gcc.gnu.org/ml/gcc-cvs/2009-05/msg00256.html
We have the code like this: if (...) { 17 cx:DI=[`s2'] //comes from s2.vl += s1.vl; ... } 27 dx:DI=[`s2'] //comes from s1 = s2; When the scheduler tries to move insn 27 before if (...), it also unifies its right-hand sides, as they seem equal. The scheduler wants to get: 27 dx:DI=[`s2'] if (...) { cx = dx ... dx:DI=[`s2'] // bookkeeping } The insn 17 has its MEM with volatile bit set, the insn 27 has it unset. It so happens that when gathering the available insn set and when moving the actually selected insn, insns 17 and 27 got merged in the different order. First we don't have the volatile bit on the resulting insn, thus we believe the load does not trap and we move it up through a jump before the 'if'. Second we have the bit and thus insn traps and we don't move it, then we hit the consistency assert in the scheduler. Now, I'm happy to implement the correct merging of the may_trap_p bit in the scheduler which would "fix" this. However, looking at the original C code it looks like both MEMs should have their volatile bit set. I can only say that the original bits seem to come from expand, the addresses got propagated by fwprop but this doesn't seem to be the issue.
A small testcase to illustrate the problem with volatile fields. //---8<--- struct vv {volatile long a, b;} vv1, vv2; int foo() { vv1 = vv2; } //---8<--- gcc/cc1 -O2 -frename-registers -fschedule-insns2 vol.c movq vv2+8(%rip), %rax movq vv2(%rip), %rdx movq %rax, vv1+8(%rip) movq %rdx, vv1(%rip) The compiler reorders accesses to volatile fields. As Andrey said, /v bits are missing on MEMs even in the .expand dump.
Looks like a problem in expand. CCing Matz.
Anybody familiar with the expand/tree level can take a look on this? I can spare some time later this week if folks are busy.
Well, the problem is not only in expand. As you say we start with this code: ... vv1 = vv2; ... That is already a non-volatile statement on the gimple level. Making the whole vv1/vv2 object volatile at the top-level of course changes this into a volatile statement: ... vv1 ={v} vv2; ... The problem with this is that at the gimple level the volatility is not represented, and hence could already result in invalid transformations. This extends into expand, so in a way the wrong MEM expressions are a result of that. Assuming we'd extend expand to expand such block moves, where parts of the block are volatile into a series of volatile accesses, we could fix the bug at hand. That would still leave the problem of non-volatility on the gimple level. This all points to a deeper problem IMO, one we need to decide how to solve before tackling the details of this bug, namely: how are non-volatile objects containing volatile subobjects handled? I don't know if the language standard says anything about this. We have several possibilities: 1) make all objects containing volatile subobjects volatile itself 2) make instructions touching such objects volatile 3) make instructions touching the volatile components volatile The first has the problem that volatile objects with aggregate type implicitely contain only volatile subobjects, that is for such object: struct {int a; volatile int b;} half_volatile; with solution (1) this would be equivalent to volatile struct {int a; volatile int b;} half_volatile; and hence halt_volatile.a would be volatile too, probably unlike the author intended. The other two solutions are more work, especially because the half-volatility wouldn't be reflected in the objects type. Independend of what we decide for the middle-end, I'd like to hear the opinion of the C frontend people about this issue.
Would it make sense to make the statement volatile even if only some subcomponents (or all subcomponents) are volatile? I like (2); if I understand it correctly, in this case vv1 and vv2 would not be volatile, but you'd still have vv1 ={v} vv2; in the GIMPLE source. It should be possible to use a bit on {ARRAY,RECORD,UNION,QUAL_UNION}_TYPE to cache this, e.g. #define TYPE_HAS_VOLATILE_PARTS(T) \ (AGGREGATE_TYPE_P (T) \ ? TYPE_UNSIGNED (T) \ : TYPE_VOLATILE (T)) #define AGGREGATE_TYPE_CHECK(T) \ TREE_CHECK4(T, ARRAY_TYPE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE) #define SET_TYPE_HAS_VOLATILE_PARTS(T, V) \ (TYPE_UNSIGNED (AGGREGATE_TYPE_CHECK (T)) = (V)) Separately, expand would of course need to be taught about expanding accesses to volatile subcomponents as mem/v. If this approach was feasible, it would have the advantage of splitting the task in two parts, one for GIMPLE (including possibly the verifier) and one for expand.
(In reply to comment #8) > Would it make sense to make the statement volatile even if only some > subcomponents (or all subcomponents) are volatile? > > I like (2); if I understand it correctly, in this case vv1 and vv2 would not be > volatile, but you'd still have > > vv1 ={v} vv2; > > in the GIMPLE source. It should be possible to use a bit on > {ARRAY,RECORD,UNION,QUAL_UNION}_TYPE to cache this, e.g. > > #define TYPE_HAS_VOLATILE_PARTS(T) \ > (AGGREGATE_TYPE_P (T) \ > ? TYPE_UNSIGNED (T) \ > : TYPE_VOLATILE (T)) > > #define AGGREGATE_TYPE_CHECK(T) \ > TREE_CHECK4(T, ARRAY_TYPE, RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE) > > #define SET_TYPE_HAS_VOLATILE_PARTS(T, V) \ > (TYPE_UNSIGNED (AGGREGATE_TYPE_CHECK (T)) = (V)) > > Separately, expand would of course need to be taught about expanding accesses > to volatile subcomponents as mem/v. If this approach was feasible, it would > have the advantage of splitting the task in two parts, one for GIMPLE > (including possibly the verifier) and one for expand. If we want to treat vv1 = vv2 as volatile then the frontends can now simply emit MEM_REF <&vv1> = MEM_REF <&vv2> with TREE_THIS_VOLATILE set and things should work. That leaves it up to the frontend on how to deal with this. The much harder question is how to expand vv1 = vv2 "correctly". Thus, we need to define what happens and document it. Also consider memcpy (&vv1, &vv2) and eventually the compiler optimizing that to vv1 = vv2 (note the lack of {v} here).
One idea we had was that this is all frontends business anyway, and hence it should (if it so desires) simply create volatile MEM_REFs for references to half-volatile objects. That alone would result in the copy statement being marked volatile, and would also (I guess, haven't checked) do the right thing in expand. So, if we (the frontend) decide that accesses to objects containing volatile subobjects should itself be regarded as volatile, then generating the right kind of MEM_REF would already provide that.
On Mon, 18 Oct 2010, rguenth at gcc dot gnu.org wrote: > Also consider memcpy (&vv1, &vv2) and eventually the compiler optimizing > that to vv1 = vv2 (note the lack of {v} here). Using memcpy when any part of the source or destination is an object defined with volatile type has undefined behavior (at runtime).
It would be nice if for struct a { char a,b,c,d; volatile int e; }; struct a v1, v2; ... v1 = v2; the compiler emitted only _two_ memory accesses, one for a/b/c/d and one for e. I'm not sure a MEM_REF(&vv1) = MEM_REF(&vv2) with volatile arguments would achieve this. So, even though it's just an optimization, it would be better if the IR was designed to allow it. Would this be the case if the front-end emitted a volatile MODIFY_EXPR?
GCC 4.5.2 is being released, adjusting target milestone.
Do we want at least the patch properly merging the volatile bits in the scheduler for 4.6? Or is this better be s plain ICE instead of a silent miscompile?
Just to remind the middle-end folks that stage 1 is a fine time for deciding on middle-end volatile semantics, before we forget about this for another 6 months :)
GCC 4.5.3 is being released, adjusting target milestone.
By the way I think we could get cases where the user wrote volatile in one case and non-volatile in another so fixing up the merging is still a good idea.
(In reply to comment #17) > By the way I think we could get cases where the user wrote volatile in one case > and non-volatile in another so fixing up the merging is still a good idea. Sure, but the comment 14 still applies -- I think that unless the middle-end will have the proper volatile semantics, it will be wrong to hide this issue in the scheduler, so I'm leaving this as it is for now.
It seems to me that volatile reads/writes should get their own gimple statements, not be part of a larger block move. So instead of vv1 = vv2; we should have vv1.a ={v} vv2.a; vv1.b ={v} vv2.b; I agree with Paolo's comment in #12 that we want to copy the non-volatile parts as a block when possible. It seems like breaking a simple struct assignment into these separate statements would be best done in the gimplifier so that front ends don't need to get this right independently. Out of curiousity, what is the use case for a non-volatile struct object with volatile members?
I can think of two use-cases from threaded environment: - using the volatile member as a semaphore for the structure - when one needs to assure some data will be written in certain order (eg. first write data, then the 'data valid' flag), while other members of the structure don't need to be volatile (data used by only one thread, or data that are only read; marking the whole struct volatile would prevent some optimisations) Other cases: - when debugging and you want to prevent optimisations of certain variable - when the structure is allocated on a memory-mapped IO address, and only some parts of that IO need to be marked as volatile - in OOP, where the variable is a member, and would otherwise be a global volatile variable
(In reply to comment #19) > It seems to me that volatile reads/writes should get their own gimple > statements, not be part of a larger block move. So instead of > > vv1 = vv2; > > we should have > > vv1.a ={v} vv2.a; > vv1.b ={v} vv2.b; > > I agree with Paolo's comment in #12 that we want to copy the non-volatile parts > as a block when possible. It seems like breaking a simple struct assignment > into these separate statements would be best done in the gimplifier so that > front ends don't need to get this right independently. > > Out of curiousity, what is the use case for a non-volatile struct object with > volatile members? There is no valid use-case for this. So I think we should just declare this issue a non-issue (middle-end wise). If the C or C++ standards say that vv1 = vv2 should behave as if the copy was elementwise then the frontends need changing. Certainly not the gimplifier - that's not the kitchen-sink for things you don't want to properly describe in GENERIC to the middle-end ;)
> If the C or C++ standards say that vv1 = vv2 should behave as if the copy was > elementwise then the frontends need changing. Certainly not the gimplifier - > that's not the kitchen-sink for things you don't want to properly describe in > GENERIC to the middle-end ;) Yes, I think that we wouldn't want this in Ada for example.
The 4.5 branch is being closed, adjusting target milestone.
This ICE has started to appear for target POWER when compiling testsuite/gcc.dg/tree-prof/pr50907.c $ /tmp/20130104/gcc/xgcc -B/tmp/20130104/gcc/ /nasfarm/dje/src/src/gcc/testsuite/gcc.dg/tree-prof/pr50907.c -fno-diagnostics-show-caret -O -freorder-blocks-and-partition -fschedule-insns -fselective-scheduling -fpic -pthread -fprofile-generate -D_PROFILE_GENERATE -lm -o /tmp/20130104/gcc/testsuite/gcc/pr50907.x01 /nasfarm/dje/src/src/gcc/testsuite/gcc.dg/tree-prof/pr45354.c: In function 'test_ifelse2': /nasfarm/dje/src/src/gcc/testsuite/gcc.dg/tree-prof/pr45354.c:23:1: internal compiler error: in move_op_ascend, at sel-sched.c:6153
(In reply to comment #24) > This ICE has started to appear for target POWER when compiling > testsuite/gcc.dg/tree-prof/pr50907.c > That testcase (or pr45354.c which it includes) doesn't contain any 'volatile' keyword, so it is likely a different bug. I would recommend you opening a separate PR for that. This PR is about treating 'volatile' on struct members.
Created attachment 29539 [details] scheduler patch I'm testing the attached patch to fix the ICE in the scheduler. Then the issue/non-issue of the front-end of not presenting the proper volatile bits to the middle-end can be discussed separately.
Author: abel Date: Wed Feb 27 08:56:08 2013 New Revision: 196308 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196308 Log: PR middle-end/45472 gcc/ * sel-sched-ir.c (merge_expr): Also change vinsn of merged expr when the may_trap_p bit of the exprs being merged differs. Reorder tests for speculativeness in the logical and operator. testsuite/ * gcc.dg/45472.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr45472.c Modified: trunk/gcc/ChangeLog trunk/gcc/sel-sched-ir.c trunk/gcc/testsuite/ChangeLog
Fixed on trunk.
Ported to 4.7 and 4.6, though no bugzilla commit made. Do I close the bug or do we want to fix the front-end to produce the proper volatile bits?
GCC 4.6.4 has been released and the branch has been closed.
Fixed for 4.8.0