This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PR 83141] Prevent SRA from removing type changing assignment (fwd)
- From: Richard Biener <rguenther at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Martin Jambor <mjambor at suse dot cz>
- Date: Wed, 6 Dec 2017 15:07:04 +0100 (CET)
- Subject: Re: [PR 83141] Prevent SRA from removing type changing assignment (fwd)
Now also to the list...
---------- Forwarded message ----------
Date: Wed, 6 Dec 2017 14:48:00 +0100 (CET)
From: Richard Biener <rguenther@suse.de>
To: Martin Jambor <mjambor@suse.cz>
Subject: Re: [PR 83141] Prevent SRA from removing type changing assignment
On Wed, 6 Dec 2017, Martin Jambor wrote:
> Hi,
>
> On Tue, Dec 05 2017, Richard Biener wrote:
> > On Tue, 5 Dec 2017, Martin Jambor wrote:
> >
> >> On Tue, Dec 05 2017, Martin Jambor wrote:
> >> > On Tue, Dec 05 2017, Martin Jambor wrote:
> >> > Hi,
> >> >
> >> >> Hi,
> >> >>
> >> >> this is a follow-up to Richi's
> >> >> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
> >> >> 83141. The basic idea is simple: be just as conservative about
> >> >> type-changing MEM_REFs as we are about actual VCEs.
> >> >>
> >> >> I have checked how that would affect compilation of SPEC 2006 and
> >> >> (non-LTO) Mozilla Firefox and am happy to report that the difference
> >> >> was tiny. However, I had to make the test less strict, otherwise
> >> >> testcase gcc.dg/guality/pr54970.c kept failing because it contains a
> >> >> folded memcpy and expects us to track values across:
> >> >>
> >> >> int a[] = { 1, 2, 3 };
> >> >> /* ... */
> >> >> __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
> >> >> /* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
> >> >> /* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
> >> >> /* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
> >> >>
> >> >> SRA is able to load the replacement of a[0] directly from the temporary
> >> >> array, which is apparently necessary to generate proper debug info. I
> >> >> have therefore allowed the current transformation to go forward if the
> >> >> source does not contain any padding or if it is a read-only declaration.
> >> >
> >> > Ah, the read-only test is of course bogus; it was a last-minute addition
> >> > when I was apparently already too tired to think it through. Please
> >> > disregard that line in the patch (it has passed bootstrap and testing
> >> > without it).
> >> >
> >> > Sorry for the noise,
> >> >
> >> > Martin
> >> >
> >>
> >> And for the record, below is the actual patch, after a fresh round of
> >> re-testing to double check I did not mess up anything else. As before,
> >> I'd like to ask for review, especially of the type_contains_padding_p
> >> predicate and then would like to commit it to trunk.
> >
> > Comments below...
> >
> >> Thanks,
> >>
> >> Martin
> >>
> >>
> >> 2017-12-05 Martin Jambor <mjambor@suse.cz>
> >>
> >> PR tree-optimization/83141
> >> * tree-sra.c (type_contains_padding_p): New function.
> >> (contains_vce_or_bfcref_p): Move up in the file, also test for
> >> MEM_REFs implicitly changing types with padding. Remove inline
> >> keyword.
> >> (build_accesses_from_assign): Check contains_vce_or_bfcref_p
> >> before setting bit in should_scalarize_away_bitmap.
> >>
> >> testsuite/
> >> * gcc.dg/tree-ssa/pr83141.c: New test.
> >> ---
> >> gcc/testsuite/gcc.dg/tree-ssa/pr83141.c | 31 +++++++++
> >> gcc/tree-sra.c | 115 ++++++++++++++++++++++++++------
> >> 2 files changed, 127 insertions(+), 19 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >> new file mode 100644
> >> index 00000000000..86caf66025b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >> @@ -0,0 +1,31 @@
> >> +/* { dg-do run } */
> >> +/* { dg-options "-O -fdump-tree-esra-details" } */
> >> +
> >> +volatile short vs;
> >> +volatile long vl;
> >> +
> >> +struct A { short s; long i; long j; };
> >> +struct A a, b;
> >> +void foo ()
> >> +{
> >> + struct A c;
> >> + __builtin_memcpy (&c, &b, sizeof (struct A));
> >> + __builtin_memcpy (&a, &c, sizeof (struct A));
> >> +
> >> + vs = c.s;
> >> + vl = c.i;
> >> + vl = c.j;
> >> +}
> >> +int main()
> >> +{
> >> + __builtin_memset (&b, 0, sizeof (struct A));
> >> + b.s = 1;
> >> + __builtin_memcpy ((char *)&b+2, &b, 2);
> >> + foo ();
> >> + __builtin_memcpy (&a, (char *)&a+2, 2);
> >> + if (a.s != 1)
> >> + __builtin_abort ();
> >> + return 0;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
> >> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> >> index 866cff0edb0..df9f10f83b6 100644
> >> --- a/gcc/tree-sra.c
> >> +++ b/gcc/tree-sra.c
> >> @@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
> >> return false;
> >> }
> >>
> >> +/* Return false if we can determine that TYPE has no padding, otherwise return
> >> + true. */
> >> +
> >> +static bool
> >> +type_contains_padding_p (tree type)
> >> +{
> >> + if (is_gimple_reg_type (type))
> >> + return false;
> >> +
> >> + if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
> >> + return true;
> >> +
> >> + switch (TREE_CODE (type))
> >> + {
> >> + case RECORD_TYPE:
> >> + {
> >> + unsigned HOST_WIDE_INT pos = 0;
> >> + for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> >> + if (TREE_CODE (fld) == FIELD_DECL)
> >> + {
> >> + tree ft = TREE_TYPE (fld);
> >> +
> >> + if (DECL_BIT_FIELD (fld)
> >> + || DECL_PADDING_P (fld)
> >> + || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
> >> + return true;
> >> +
> >> + tree t = bit_position (fld);
> >> + if (!tree_fits_uhwi_p (t))
> >> + return true;
> >> + unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
> >> + if (pos != bp)
> >> + return true;
> >> +
> >> + pos += tree_to_uhwi (TYPE_SIZE (ft));
> >> + if (type_contains_padding_p (ft))
> >> + return true;
> >> + }
> >> +
> >> + if (pos != tree_to_uhwi (TYPE_SIZE (type)))
> >> + return true;
> >> +
> >> + return false;
> >> + }
> >> +
> >> + case ARRAY_TYPE:
> >> + return type_contains_padding_p (TREE_TYPE (type));
> >> +
> >> + default:
> >> + return true;
> >> + }
> >> + gcc_unreachable ();
> >> +}
> >
> > The possibly repeated walk over all fields and subfields of an aggregate
> > type looks expensive - some caching on the main variant seems
> > advisable to me.
>
> It almost never happens but I guess you are right. But see below.
>
> >
> >> +/* Return true if REF contains a MEM_REF that performs type conversion from a
> >> + type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> >> + bit-field field declaration. */
> >> +
> >> +static bool
> >> +contains_vce_or_bfcref_p (const_tree ref)
> >> +{
> >> + while (true)
> >> + {
> >> + if (TREE_CODE (ref) == MEM_REF)
> >> + {
> >> + tree op0 = TREE_OPERAND (ref, 0);
> >> + if (TREE_CODE (op0) == ADDR_EXPR)
> >> + {
> >> + tree mem = TREE_OPERAND (op0, 0);
> >> + if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> >> + != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
> >> + && type_contains_padding_p (TREE_TYPE (mem)))
> >> + return true;
> >> +
> >> + ref = mem;
> >> + continue;
> >> + }
> >> + else
> >> + return false;
> >> + }
> >> +
> >> + if (!handled_component_p (ref))
> >> + return false;
> >> +
> >> + if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> >> + || (TREE_CODE (ref) == COMPONENT_REF
> >> + && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> >> + return true;
> >> + ref = TREE_OPERAND (ref, 0);
> >> + }
> >
> > It is best to retain the old code and simply append
> >
> > if (TREE_CODE (ref) == MEM_REF)
> >
> > after the while (handled_component_p ()) loop. A MEM_REF can only
> > appear in innermost position.
>
> I hoped so but could not find any such check quickly in the tree-cfg.c
> verifier.
>
> >
> > I'm not sure that we really want to retain SRAing structures
> > based on their fields when we see a VCEd MEM_REF accessing them.
> > This might for example do floating-point accesses on data that
> > isn't floating-point and on some architectures may raise exceptions
> > (IA64 comes to my mind). The memcpy folding code explicitly disallows
> > float modes and also BOOLEAN_TYPE and ENUMERAL_TYPE because their
> > size might not match their precision:
> >
> > /* Make sure we are not copying using a floating-point mode or
> > a type whose size possibly does not match its precision. */
> > if (FLOAT_MODE_P (TYPE_MODE (desttype))
> > || TREE_CODE (desttype) == BOOLEAN_TYPE
> > || TREE_CODE (desttype) == ENUMERAL_TYPE)
> > desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
> > if (FLOAT_MODE_P (TYPE_MODE (srctype))
> > || TREE_CODE (srctype) == BOOLEAN_TYPE
> > || TREE_CODE (srctype) == ENUMERAL_TYPE)
> > srctype = bitwise_type_for_mode (TYPE_MODE (srctype));
> >
> > so - what's the reason you think you need to retain SRAing memcpied
> > structs? Via total scalarization I mean - if we see the "real" accesses
> > besides the aggregate copy then we of course win.
>
> Two reasons. First, Jakub explicitly asked me to do it in
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02139.html. I will openly
> admit that at the time I was planning to silently ignore it and hope to
> get away with it because I shared your caution. (Now I see I forgot
> about the second request which I did not want to ignore and will adjust
> the testcase accordingly.)
>
> Second is the testcase I described in my previous email. When I saw
>
> FAIL: gcc.dg/guality/pr54970.c -O1 line 31 a[0] == 4
>
> at all optimization levels, I grumbled about Jakub being right again and
> duly decided to bite the bullet and do what he asked me to because it
> fixes the issue. But if you allow me to XFAIL the guality test, I will
> happily remove the whole padding detection, I don't really like it
> either.
>
> The debug information is apparently lost because a[0] is never used from
> that point on, as opposed to a[1] and a[2], for which the guality test
> still passes.
XFAILing that is fine I think.
> >
> > Do we check anywhere when seeing an aggregate copy that lhs and rhs
> > have matching TYPE_MAIN_VARIANT?
>
> No. I can do some due diligence measurements but I think it would not
> hurt to add it.
>
> > GIMPLE allows matching TYPE_CANONICAL
> > which can have mismatched fields (LTO, cross-language) or even any
> > type if TYPE_STRUCTURAL_EQUALITY. That is, what happens if we mark
> > two distinct enough decls for total scalarization that are copied
> > to each other? Do we end up re-materializing them if their
> > "structure" doesn't match when transforming the assignment?
>
> Basically yes, although we try to do the rematerialization only on the
> RHS or LHS. In more detail, if an assignment is not considered fragile
> by the condition
>
> if (modify_this_stmt
> || gimple_has_volatile_ops (stmt)
> || contains_vce_or_bfcref_p (rhs)
> || contains_vce_or_bfcref_p (lhs)
> || stmt_ends_bb_p (stmt))
>
> the following happens:
>
> 1) if each replacement of the LHS has a match on the RHS in terms of
> offset and size, scalar assignment between them is performed,
> possibly with a VIEW_CONVERT_EXPR if types do not match.
>
> 2) If for some LHS replacement that is not the case, then we flush all
> RHS replacements either to the LHS (hoping to make the original RHS
> redundant) if the RHS replacements contain all data there is in
> RHS, or to RHS otherwise, and load the LHS replacement from the LHS
> or RHS that we picked (really the same one, we do not rely on the
> original assignment to carry the data).
>
> 3) Same flushing happens when some part of LHS is not scalarized,
> unless we know it is never read, then we don't care.
>
> Generally, if we can prove that we do not lose any data, the original
> statement is deleted.
>
> But let me emphasize again that whenever correctness is the issue, the
> question whether an SRA-recorded access comes from total scalarization
> or not is not important. User accesses to the data in some other part
> of the function can create such accesses too. Users equipped with
> placement new can create all sorts of weird type accesses, and because
> tree-sra is flow-insensitive, they can then force scalarization to such
> replacements even at places where the data have wildly different types.
Yes, but SRA will never create loads or stores for the non-total
scalarization case; it will only choose one scalar register for each
replacement (preferably non-FP if not all accesses are FP - I think
compare_access_positions guarantees that), right?
Basically it will replace _all_ accesses of a memory piece with
a register instead, making that memory piece unused?
So for an aggregate assignment chain like
A.x = ...;
B = A;
C = B;
= ... C.y;
that is, B is only aggregate-accessed but A and C are also otherwise.
Here we can choose to elide B based on the accesses of A and C.
Richard.