This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3


On Mon, 2 May 2016, Eric Botcazou wrote:

> > The following experiment resulted from looking at making
> > array_ref_low_bound and array_ref_element_size non-mutating.  Again
> > I wondered why we do this strange scaling by offset/element alignment.
> 
> The idea is to expose the alignment factor to the RTL expander:
> 
> 	tree tem
> 	  = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
> 				 &unsignedp, &reversep, &volatilep, true);
> 
> [...]
> 
> 	    rtx offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode,
> 					  EXPAND_SUM);
> 
> [...]
> 
> 	    op0 = offset_address (op0, offset_rtx,
> 				  highest_pow2_factor (offset));
> 
> With the scaling, offset is something like _69 * 4 so highest_pow2_factor can 
> see the factor and passes it down to offset_address:
> 
> (gdb) p debug_rtx(op0)
> (mem/c:SI (plus:SI (reg/f:SI 193)
>         (reg:SI 194)) [3 *s.16_63 S4 A32])
> 
> With your patch in the same situation:
> 
> (gdb) p debug_rtx(op0)
> (mem/c:SI (plus:SI (reg/f:SI 139)
>         (reg:SI 116 [ _33 ])) [3 *s.16_63 S4 A8])
> 
> On strict-alignment targets, this makes a big difference, e.g. SPARC:
> 
> 	ld	[%i4+%i5], %i0
> 
> vs
> 
> 	ldub	[%i5+%i4], %g1
> 	sll	%g1, 24, %g1
> 	add	%i5, %i4, %i5
> 	ldub	[%i5+1], %i0
> 	sll	%i0, 16, %i0
> 	or	%i0, %g1, %i0
> 	ldub	[%i5+2], %g1
> 	sll	%g1, 8, %g1
> 	or	%g1, %i0, %g1
> 	ldub	[%i5+3], %i0
> 	or	%i0, %g1, %i0
> 
> 
> Now this is mitigated by a couple of things:
> 
>   1. the above pessimization only happens on the RHS; on the LHS, the expander 
> calls highest_pow2_factor_for_target instead of highest_pow2_factor and the 
> former takes into account the type's alignment thanks to the MAX:
> 
> /* Similar, except that the alignment requirements of TARGET are
>    taken into account.  Assume it is at least as aligned as its
>    type, unless it is a COMPONENT_REF in which case the layout of
>    the structure gives the alignment.  */
> 
> static unsigned HOST_WIDE_INT
> highest_pow2_factor_for_target (const_tree target, const_tree exp)
> {
>   unsigned HOST_WIDE_INT talign = target_align (target) / BITS_PER_UNIT;
>   unsigned HOST_WIDE_INT factor = highest_pow2_factor (exp);
> 
>   return MAX (factor, talign);
> }
> 
>   2. highest_pow2_factor can be rescued by the set_nonzero_bits machinery of 
> the SSA CCP pass because it calls tree_ctz.  The above example was compiled 
> with -O -fno-tree-ccp on SPARC; at -O, the code isn't pessimized.
> 
> > So - the following patch gets rid of that scaling.  For a "simple"
> > C testcase
> > 
> > void bar (void *);
> > void foo (int n)
> > {
> >   struct S { struct R { int b[n]; } a[2]; int k; } s;
> >   s.k = 1;
> >   s.a[1].b[7] = 3;
> >   bar (&s);
> > }
> 
> This only exposes the LHS case, here's a more complete testcase:
> 
> void bar (void *);
> 
> int foo (int n)
> {
>   struct S { struct R { char b[n]; } a[2]; int k; } s;
>   s.k = 1;
>   s.a[1].b[7] = 3;
>   bar (&s);
>   return s.k;
> }

Ok.  Note that on x86_64 at least SLSR wrecks the component-ref case:

@@ -30,10 +34,14 @@
   _9 = _8 + 4;
   s.1_11 = __builtin_alloca_with_align (_9, 32);
   _12 = _7 >> 2;
-  s.1_11->k{off: _12 * 4} = 1;
+  _18 = _12 * 4;
+  _19 = s.1_11 + _18;
+  MEM[(struct S *)_19] = 1;
   s.1_11->a[1]{lb: 0 sz: _5}.b[7] = 3;
   bar (s.1_11);
-  _16 = s.1_11->k{off: _12 * 4};
+  _20 = _12 * 4;
+  _21 = s.1_11 + _20;
+  _16 = MEM[(struct S *)_21];
   __builtin_stack_restore (saved_stack.3_3);
   return _16;

It seems to me that the issue, in the end, is that we compute
alignment from the pieces gathered by get_inner_reference instead
of computing it alongside that information inside get_inner_reference
itself, taking advantage of DECL_OFFSET_ALIGN and the array element
type's alignment there.  This would be another opportunity to merge
get_inner_reference and get_object_alignment_2 (or to have a common
worker).

That we expose the exact division in the IL has unfortunate side-effects
like above.

Do I understand you correctly that without -fno-tree-ccp there are
currently no regressions?  What about -O0 then?  The code generated
at -O0 on x86_64 is currently quite horrible anyway, so maybe we
don't care too much...  I think -Og disables CCP's bit-tracking,
though.

I see that the constant offset parts are only sometimes folded into
op0 before offset_address is called with the variable offset parts.
Now, with get_object_alignment_2 computing both alignment and
misalignment, one could split the misaligning offset out of bitpos
in some way so that op0 is always aligned.

Sth like

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 235706)
+++ gcc/expr.c  (working copy)
@@ -10334,6 +10334,9 @@ expand_expr_real_1 (tree exp, rtx target
                offset_rtx = convert_to_mode (address_mode, offset_rtx, 0);
              }
 
+           get_object_alignment_2 (exp, &align, &misalign, false);
+           bitpos -= misalign;
+
            /* See the comment in expand_assignment for the rationale.  */
            if (mode1 != VOIDmode
                && bitpos != 0
@@ -10348,6 +10351,7 @@ expand_expr_real_1 (tree exp, rtx target
 
            op0 = offset_address (op0, offset_rtx,
                                  highest_pow2_factor (offset));
+           set_mem_align (op0, align);
          }
 
        /* If OFFSET is making OP0 more aligned than BIGGEST_ALIGNMENT,

which of course would still need adjustments to get_object_alignment_2
and get_inner_reference to recover alignment info by looking at
DECL_OFFSET_ALIGN and friends.

For the moment (as my original motivation was to get rid of the
requirement to call component_ref_field_offset and friends from
GIMPLE parts of the compiler, to make them non-tree-mutating, but
that has proven to be impossible or at least rather difficult)
I'm leaving things as-is.

Thanks,
Richard.

