This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions


Hi Joseph,

> What types exactly is this meant to apply to?  Floating-point?  Integer?  
> Mixtures of those?  (I'm guessing not mixtures, because those would be something other than "convert" here.)

Originally I had it for both Floating-point and Integer, but not a mix of the two.

> For integer types, it's not safe, in that if e.g. F is int and X is unsigned long long, you're changing from defined overflow to undefined overflow.

This is a good point, so I should limited it to floating point formats only.

> For floating-point types, using TYPE_PRECISION is suspect (it's not wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the machine modes in question.

Would restricting it to flag_unsafe_math_optimizations not be enough in this case? Since if it's only done for unsafe math
then you likely won't care about a small loss in precision anyway?

> Specifically, see convert.c:convert_to_real_1, the long comment starting "Sometimes this transformation is safe (cannot change results through affecting double rounding cases) and sometimes it is not.", and the associated code calling real_can_shorten_arithmetic.  I think that code in convert.c ought to apply to your case of half precision converted to float for arithmetic and then converted back to half precision afterwards.  (In the case where the excess precision configuration - which depends on TARGET_FP_F16INST for AArch64 - says to do arithmetic directly on half precision, anyway.)

Sorry I hadn't attached a test-case because I wanted to get some feedback before.

But my simple testcase is

#include <complex.h>

#define N 200

void foo (_Float16 complex a[N], _Float16 complex b[N], _Float16 complex *c)
{
  for (int x = 0; x < N; x++)
    c[x] = a[x] + b[x] * I;
}

A simplified version of one of the expressions in the gimple tree for this is

  _7 = IMAGPART_EXPR <*_4>;
  _13 = REALPART_EXPR <*_3>;
  _8 = (floatD.30) _7;
  _14 = (floatD.30) _13;
  _35 = _14 + _8;
  _19 = (_Float16D.33) _35;
  IMAGPART_EXPR <*_20> = _19;

note that the type of _4, _3 and _19 are that of half float.

  _4 = a_26(D) + _2;
  _3 = b_25(D) + _2;
  _20 = c_28(D) + _2;

  complex _Float16D.43 * a_26(D) = aD.3538;
  complex _Float16D.43 * b_25(D) = bD.3539;
  complex _Float16D.43 * c_28(D) = cD.3540;

If the code is SLP vectorized, then inside the tree (without my match.pd rule)
it will contain the casts, which is evident from the resulting vector code

	ldr	h0, [x6, x3]
	ldr	h2, [x1, x3]
	fcvt	s0, h0
	fcvt	s2, h2
	fadd	s0, s0, s2
	fcvt	h1, s1
	fcvt	h0, s0
	str	h1, [x2, x3]
	str	h0, [x4, x3]

The problem seems to be (as far as I can tell) that when a loop is present, a new
temporary is created to store the intermediate result, while it's creating the
destination variable with the dest + offset.  It can't do the cast and the computation
at the same time.

foo (complex _Float16 * a, complex _Float16 * b, complex _Float16 * c)
{
  complex _Float16 D_3548;
  complex _Float16 D_3549;
  complex float D_3550;

The type of this temporary seems to be the type of the inner computation, so in
this case float due to the 90 rotation of b in the C file.

This also has the effect that it seems to make the type of the imaginary and real
expressions all float

So when convert.c tries to convert them, all it sees are

 <imagpart_expr 0x7ffb19016c40
    type <real_type 0x7ffb18e362a0 float SF
        size <integer_cst 0x7ffb18e1df48 constant 32>
        unit-size <integer_cst 0x7ffb18e1df60 constant 4>
        align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffb18e362a0 precision:32
        pointer_to_this <pointer_type 0x7ffb18e36888>>
    side-effects
    arg:0 <save_expr 0x7ffb19016c00

Because the usage of this expression is to store it inside a complex float, then
convert and constant prop will no longer remove the casts.  The float is then
converted into half float again for storing.  Because of this all the invocations
of convert_to_real_1 exit at REAL_TYPE and essentially adds a NOP_EXPR doing no
conversions.

So it does seem like match.pd is the best place to do this since it won't care
about the temporaries.

Thanks,
Tamar

> -----Original Message-----
> From: Joseph Myers <joseph@codesourcery.com>
> Sent: Tuesday, November 13, 2018 00:49
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; James Greenhalgh
> <James.Greenhalgh@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; law@redhat.com; ian@airs.com;
> rguenther@suse.de
> Subject: Re: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away
> unneeded type casts in expressions
> 
> On Sun, 11 Nov 2018, Tamar Christina wrote:
> 
> > This patch adds a match.pd rule for stripping away the type converts
> > when you're converting to a type that has twice the precision of the
> > current type in the same class, doing a simple math operation on it
> > and converting back to the smaller type.
> 
> What types exactly is this meant to apply to?  Floating-point?  Integer?
> Mixtures of those?  (I'm guessing not mixtures, because those would be
> something other than "convert" here.)
> 
> For integer types, it's not safe, in that if e.g. F is int and X is unsigned long
> long, you're changing from defined overflow to undefined overflow.
> 
> For floating-point types, using TYPE_PRECISION is suspect (it's not
> wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the
> machine modes in question.
> 
> Specifically, see convert.c:convert_to_real_1, the long comment starting
> "Sometimes this transformation is safe (cannot change results through
> affecting double rounding cases) and sometimes it is not.", and the
> associated code calling real_can_shorten_arithmetic.  I think that code in
> convert.c ought to apply to your case of half precision converted to float for
> arithmetic and then converted back to half precision afterwards.  (In the case
> where the excess precision configuration - which depends on
> TARGET_FP_F16INST for AArch64 - says to do arithmetic directly on half
> precision, anyway.)
> 
> Now, there are still issues in that convert.c code in the decimal floating-point
> case (see bug 40503).  And I think match.pd is a much better place for this
> sort of thing than convert.c (and c-family/c-common.c:shorten_binary_op in
> the integer case).  But for this specific case of binary floating-point
> conversions, I think the logic in convert.c is what should be followed (but
> moved to match.pd if possible).
> 
> This patch is also lacking a testcase, which might show why the existing logic
> in convert.c isn't being applied in whatever case you're concerned with.
> 
> --
> Joseph S. Myers
> joseph@codesourcery.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]