This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, RFC] PR 55415 : Pessimistic misalignment from eipa_sra pass
On Tue, Nov 27, 2012 at 9:37 PM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Tue, Nov 27, 2012 at 02:02:42PM +0100, Richard Biener wrote:
>> On Wed, Nov 21, 2012 at 5:58 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> > Hi,
>> >
>> > On Tue, Nov 20, 2012 at 09:24:20AM -0800, Richard Henderson wrote:
>> >> The get_pointer_alignment function can indicate that it does not know
>> >> what the alignment should be, and it always fills in worst-case values
>> >> for that case. We should not use these worst-case values to "optimize"
>> >> the interface of a function.
>> >>
>> >> At minimum I think something like the following would be good. But
>> >> I'm unsure why we would *ever* want to lower the alignment at a function
>> >> interface. It seems to me that we'd simply want the caller to handle
>> >> copying the data to an aligned location?
>> >>
>> >> What was the use case of this code in the first place?
>> >
>> > PR 52402, and not surprisingly, that testcase fails on x86_64-linux
>> > with your patch. Furthermore, misalignment due to MEM_REF offset has
>> > to be accounted for whether we know the alignment of base or not.
>> >
>> > I was hoping that we could do something along the lines of
>> > build_ref_for_offset tree-sra.c but that would not work for cases like
>> > the one from PR 52890, comment 7 when there is no dereference in the
>> > caller, we tranform:
>> >
>> > D.2537 = &this->surfaces;
>> > sPtr.0 = ggTrain<mrSurface*>::_ZNK7ggTrainIP9mrSurfaceEixEi.isra.0 (D.2537, 1);
>> >
>> > into
>> >
>> > D.2537 = &this->surfaces;
>> > D.2554 = MEM[(const struct ggTrain *)D.2537];
>> > sPtr.0 = ggTrain<mrSurface*>::_ZNK7ggTrainIP9mrSurfaceEixEi.isra.0 (D.2537, 1);
>> >
>> > At the moment I hope I'll be able to:
>> >
>> > 1) Bring back tree_non_aligned_mem_p from 4.6 to be used in
>> > access_precludes_ipa_sra_p. This way, any knowingly misaligned
>> > load in the callee should will not be transferred to the caller, if
>> > there is none.
>> >
>> > 2) In ipa_modify_call_arguments, be optimistic in like in your patch
>> > except for the case when we are looking at a dereference (i.e. we
>> > are turning a BLK dereference into a smaller scalar type one). In
>> > that case, use its alignment like in build_ref_for_offset.
>> >
>> > But I'd like to think about this a bit more. I believe we should ask
>> > for Richi's approval when he comes back and recovers anyway. But this
>> > is now my highest priority (finally).
>>
>> The issue here is that IPA SRA, when seeing
>>
>> foo (T *addr)
>> {
>> *addr;
>> }
>>
>> foo (&mem);
>>
>> wanting to transform it to
>>
>> foo' (T value)
>> {
>> value;
>> }
>>
>> value = *(T *)mem;
>> foo (value);
>>
>> has to use the _same_ alignment for the access to mem as it was used
>> inside foo. That's because of the fact that alignment information in GIMPLE
>> is solely present at memory accesses and _not_ in any way associated
>> with pointer types (as opposed to more strict rules imposed by some languages
>> such as C, often enough violated by users, *(char *)(int *)p is an access
>> aligned to at least 4 bytes in C).
>>
>> IPA SRA can use bigger alignment if it knows that is safe (thus
>> get_pointer_alignment returns something larger than what the actual
>> access in foo was using). What IPA SRA at the moment cannot do
>> is "propagate" larger alignment used in foo to the call site (AFAIK).
>> Thus, technically IPA SRA can use MAX (get_pointer_alignment (call site),
>> get_object_alignment (original dereference site)).
>>
>> For fixing pessimizations caused by IPA SRA I suggest to simply punt
>> (not do the transform) if get_pointer_alignment_1 returns false for the
>> actual call argument. Or implement the propagation.
>
> the patch below punts if get_object_alignment on any dereference in
> the callee returns something smaller than the alignment of the type of
> the would-be replacement (and the type of the load we would introduce
> in the caller). This is essentially an implicit propagation of
> alignment from callees to callers. I have added a new testcase that
> checks this which, when not handled properly, fails on sparc64 and
> produces wrong code on arm (which I however only checked by looking at
> cross-compiler assembly output).
>
> Nevertheless, there is another case that must be taken care of (you
> already added a x86_64 testcase: gcc.dg/torture/pr52402.c) when there
> already is a dereference in the caller but it is a BLK-mode one and
> IPA-SRA changes it to a scalar load - this happens when an aggregate
> by-value parameter is turned into a scalar one. In that case, we
> should look at the dereference in the caller and derive alignment from
> there.
Well, you can derive alignment from all memory accesses that are
always executed on the path the call stmt is on and take the maximum
(of course paying attention to properly translate what you get to the
alignment of the base object at offset zero).
Such analysis of course doesn't fit the flow-insensitiveness of SRA
very well ...
> I do not do MAX, I assume that when get_pointer_alignment returns
> true, it can be trusted. If it returns some smaller alignment than
> the alignment of the type (propagated from the callee), then I suppose
> there is an alignment attribute missing in the callee or some such
> mistake in the original program.
Well, it can return true if you for example have
ptr = ptr & ~3;
... *ptr
but then some other access may tell you that ptr was even more aligned.
But yes, the way you do it should be conservatively correct at least.
> So far I have successfully bootstrapped and tested this on
> x86_64-linux and have run parts of the testsuite on sparc64-linux and
> hppa-linux. Full bootstrap on sparc64-linux and ppc64-linux are
> underway.
>
> I'm going to add a testcase for PR 55448 that will scan x86_64
> assembly output.
>
> What do you think?
See below - there is some confusion on my side regarding what memory
access alignment you query and why you have the two prev_base cases
(you constrain on TYPE_ALIGN, so you should always use TYPE_ALIGN?)
Richard.
> Thanks,
>
> Martin
>
>
> 2012-11-27 Martin Jambor <mjambor@suse.cz>
>
> PR middle-end/52890
> PR tree-optimization/55415
> PR tree-optimization/54386
> PR target/55448
> * ipa-prop.c (ipa_modify_call_arguments): Be optimistic when
> get_pointer_alignment_1 returns false and the base was not a
> dereference.
> * tree-sra.c (access_precludes_ipa_sra_p): New parameter req_align,
> added check for required alignment. Update the user.
>
> * testsuite/gcc.dg/ipa/ipa-sra-7.c: New test.
>
> *** /tmp/5XB8rd_ipa-prop.c Tue Nov 27 21:34:54 2012
> --- gcc/ipa-prop.c Tue Nov 27 17:04:20 2012
> *************** ipa_modify_call_arguments (struct cgraph
> *** 2888,2893 ****
> --- 2888,2894 ----
> {
> tree expr, base, off;
> location_t loc;
> + tree prev_base = NULL_TREE;
>
> /* We create a new parameter out of the value of the old one, we can
> do the following kind of transformations:
> *************** ipa_modify_call_arguments (struct cgraph
> *** 2919,2925 ****
> else
> {
> HOST_WIDE_INT base_offset;
> - tree prev_base;
>
> if (TREE_CODE (base) == ADDR_EXPR)
> base = TREE_OPERAND (base, 0);
> --- 2920,2925 ----
> *************** ipa_modify_call_arguments (struct cgraph
> *** 2956,2962 ****
> unsigned int align;
> unsigned HOST_WIDE_INT misalign;
>
> ! get_pointer_alignment_1 (base, &align, &misalign);
> misalign += (tree_to_double_int (off)
> .sext (TYPE_PRECISION (TREE_TYPE (off))).low
> * BITS_PER_UNIT);
> --- 2956,2973 ----
> unsigned int align;
> unsigned HOST_WIDE_INT misalign;
>
> ! if (!get_pointer_alignment_1 (base, &align, &misalign))
> ! {
> ! if (prev_base
> ! && (TREE_CODE (prev_base) == MEM_REF
> ! || TREE_CODE (prev_base) == TARGET_MEM_REF))
> ! {
> ! gcc_assert (misalign == 0);
> ! align = TYPE_ALIGN (TREE_TYPE (prev_base));
> ! }
> ! else
> ! align = TYPE_ALIGN (type);
I'm quite sure that misalign is wrong after this.
What memory access is 'base', what memory access is 'prev_base'
(prev_base from the original memory reference in the callee that is
going to be replaced? base the one that IPA SRA "constructed"
based on the pointer passed at the call site?)
> ! }
> misalign += (tree_to_double_int (off)
> .sext (TYPE_PRECISION (TREE_TYPE (off))).low
> * BITS_PER_UNIT);
> *** /dev/null Thu Oct 25 13:49:30 2012
> --- gcc/testsuite/gcc.dg/ipa/ipa-sra-7.c Tue Nov 27 18:48:45 2012
> ***************
> *** 0 ****
> --- 1,42 ----
> + /* { dg-do run } */
> + /* { dg-options "-O3" } */
> +
> + typedef unsigned int myint __attribute__((aligned(1)));
> +
> + typedef struct __attribute__((packed)) S {
> + unsigned a, b, c;
> + } SS;
> +
> + typedef SS __attribute__((aligned(1))) SSS;
> +
> +
> + static unsigned int __attribute__ ((noinline))
> + get_a (SSS *p)
> + {
> + return p->a;
> + };
> +
> + static int __attribute__ ((noinline, noclone))
> + foo (SS *p)
> + {
> + int r = (int) get_a(p) + 2;
> + return r;
> + }
> +
> + char buf[512];
> +
> + static SSS * __attribute__ ((noinline, noclone))
> + get_sss (void)
> + {
> + return (SSS *)(buf + 1);
> + }
> +
> +
> + int
> + main(int argc, char *argv[])
> + {
> + SSS *p = get_sss();
> + if (foo(p) != 2)
> + __builtin_abort ();
> + return 0;
> + }
> *** /tmp/Hp0Wyc_tree-sra.c Tue Nov 27 21:34:54 2012
> --- gcc/tree-sra.c Tue Nov 27 21:28:53 2012
> *************** unmodified_by_ref_scalar_representative
> *** 3891,3902 ****
> return repr;
> }
>
> ! /* Return true iff this access precludes IPA-SRA of the parameter it is
> ! associated with. */
>
> static bool
> ! access_precludes_ipa_sra_p (struct access *access)
> {
> /* Avoid issues such as the second simple testcase in PR 42025. The problem
> is incompatible assign in a call statement (and possibly even in asm
> statements). This can be relaxed by using a new temporary but only for
> --- 3891,3903 ----
> return repr;
> }
>
> ! /* Return true iff this ACCESS precludes IPA-SRA of the parameter it is
> ! associated with. REQ_ALIGN is the minimum required alignment. */
>
> static bool
> ! access_precludes_ipa_sra_p (struct access *access, unsigned int req_align)
> {
> + unsigned int exp_align;
> /* Avoid issues such as the second simple testcase in PR 42025. The problem
> is incompatible assign in a call statement (and possibly even in asm
> statements). This can be relaxed by using a new temporary but only for
> *************** access_precludes_ipa_sra_p (struct acces
> *** 3908,3913 ****
> --- 3909,3918 ----
> || gimple_code (access->stmt) == GIMPLE_ASM))
> return true;
>
> + exp_align = get_object_alignment (access->expr);
Is access->expr from the callee or the caller?
> + if (exp_align < req_align)
> + return true;
> +
> return false;
> }
>
> *************** splice_param_accesses (tree parm, bool *
> *** 3943,3949 ****
> tree a1_alias_type;
> access = (*access_vec)[i];
> modification = access->write;
> ! if (access_precludes_ipa_sra_p (access))
> return NULL;
> a1_alias_type = reference_alias_ptr_type (access->expr);
>
> --- 3948,3954 ----
> tree a1_alias_type;
> access = (*access_vec)[i];
> modification = access->write;
> ! if (access_precludes_ipa_sra_p (access, TYPE_ALIGN (access->type)))
access->type is not the type of the base object, right? Which means it
is not correct here - the alignment of the access is that what
get_object_alignment returns, which looks at the base as to whether to
lower the alignment compared to TYPE_ALIGN of the access type itself.
This is from inside the callee, right? So what you want to check here is
that if you end up optimistically using the alignment of the access type
that this is at most equal but not larger than the alignment of the access.
> return NULL;
> a1_alias_type = reference_alias_ptr_type (access->expr);
>
> *************** splice_param_accesses (tree parm, bool *
> *** 3966,3972 ****
> else if (ac2->size != access->size)
> return NULL;
>
> ! if (access_precludes_ipa_sra_p (ac2)
> || (ac2->type != access->type
> && (TREE_ADDRESSABLE (ac2->type)
> || TREE_ADDRESSABLE (access->type)))
> --- 3971,3977 ----
> else if (ac2->size != access->size)
> return NULL;
>
> ! if (access_precludes_ipa_sra_p (ac2, TYPE_ALIGN (access->type))
> || (ac2->type != access->type
> && (TREE_ADDRESSABLE (ac2->type)
> || TREE_ADDRESSABLE (access->type)))