This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions
- From: Tamar Christina <Tamar dot Christina at arm dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>, James Greenhalgh <James dot Greenhalgh at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, "law at redhat dot com" <law at redhat dot com>, "ian at airs dot com" <ian at airs dot com>, "rguenther at suse dot de" <rguenther at suse dot de>
- Date: Tue, 13 Nov 2018 16:41:15 +0000
- Subject: RE: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions
- References: <20181111102611.GA4292@arm.com> <alpine.DEB.2.21.1811130037210.18130@digraph.polyomino.org.uk>
Hi Joseph,
> What types exactly is this meant to apply to? Floating-point? Integer?
> Mixtures of those? (I'm guessing not mixtures, because those would be something other than "convert" here.)
Originally I had it for both Floating-point and Integer, but not a mix of the two.
> For integer types, it's not safe, in that if e.g. F is int and X is unsigned long long, you're changing from defined overflow to undefined overflow.
This is a good point, so I should limited it to floating point formats only.
> For floating-point types, using TYPE_PRECISION is suspect (it's not wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the machine modes in question.
Would restricting it to flag_unsafe_math_optimizations not be enough in this case? Since if it's only done for unsafe math
then you likely won't care about a small loss in precision anyway?
> Specifically, see convert.c:convert_to_real_1, the long comment starting "Sometimes this transformation is safe (cannot change results through affecting double rounding cases) and sometimes it is not.", and the associated code calling real_can_shorten_arithmetic. I think that code in convert.c ought to apply to your case of half precision converted to float for arithmetic and then converted back to half precision afterwards. (In the case where the excess precision configuration - which depends on TARGET_FP_F16INST for AArch64 - says to do arithmetic directly on half precision, anyway.)
Sorry I hadn't attached a test-case because I wanted to get some feedback before.
But my simple testcase is
#include <complex.h>
#define N 200
void foo (_Float16 complex a[N], _Float16 complex b[N], _Float16 complex *c)
{
for (int x = 0; x < N; x++)
c[x] = a[x] + b[x] * I;
}
A simplified version of one of the expressions in the gimple tree for this is
_7 = IMAGPART_EXPR <*_4>;
_13 = REALPART_EXPR <*_3>;
_8 = (floatD.30) _7;
_14 = (floatD.30) _13;
_35 = _14 + _8;
_19 = (_Float16D.33) _35;
IMAGPART_EXPR <*_20> = _19;
note that the type of _4, _3 and _19 are that of half float.
_4 = a_26(D) + _2;
_3 = b_25(D) + _2;
_20 = c_28(D) + _2;
complex _Float16D.43 * a_26(D) = aD.3538;
complex _Float16D.43 * b_25(D) = bD.3539;
complex _Float16D.43 * c_28(D) = cD.3540;
If the code is SLP vectorized, then inside the tree (without my match.pd rule)
it will contain the casts, which is evident from the resulting vector code
ldr h0, [x6, x3]
ldr h2, [x1, x3]
fcvt s0, h0
fcvt s2, h2
fadd s0, s0, s2
fcvt h1, s1
fcvt h0, s0
str h1, [x2, x3]
str h0, [x4, x3]
The problem seems to be (as far as I can tell) that when a loop is present, a new
temporary is created to store the intermediate result, while it's creating the
destination variable with the dest + offset. It can't do the cast and the computation
at the same time.
foo (complex _Float16 * a, complex _Float16 * b, complex _Float16 * c)
{
complex _Float16 D_3548;
complex _Float16 D_3549;
complex float D_3550;
The type of this temporary seems to be the type of the inner computation, so in
this case float due to the 90 rotation of b in the C file.
This also has the effect that it seems to make the type of the imaginary and real
expressions all float
So when convert.c tries to convert them, all it sees are
<imagpart_expr 0x7ffb19016c40
type <real_type 0x7ffb18e362a0 float SF
size <integer_cst 0x7ffb18e1df48 constant 32>
unit-size <integer_cst 0x7ffb18e1df60 constant 4>
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffb18e362a0 precision:32
pointer_to_this <pointer_type 0x7ffb18e36888>>
side-effects
arg:0 <save_expr 0x7ffb19016c00
Because the usage of this expression is to store it inside a complex float, then
convert and constant prop will no longer remove the casts. The float is then
converted into half float again for storing. Because of this all the invocations
of convert_to_real_1 exit at REAL_TYPE and essentially adds a NOP_EXPR doing no
conversions.
So it does seem like match.pd is the best place to do this since it won't care
about the temporaries.
Thanks,
Tamar
> -----Original Message-----
> From: Joseph Myers <joseph@codesourcery.com>
> Sent: Tuesday, November 13, 2018 00:49
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; James Greenhalgh
> <James.Greenhalgh@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; law@redhat.com; ian@airs.com;
> rguenther@suse.de
> Subject: Re: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away
> unneeded type casts in expressions
>
> On Sun, 11 Nov 2018, Tamar Christina wrote:
>
> > This patch adds a match.pd rule for stripping away the type converts
> > when you're converting to a type that has twice the precision of the
> > current type in the same class, doing a simple math operation on it
> > and converting back to the smaller type.
>
> What types exactly is this meant to apply to? Floating-point? Integer?
> Mixtures of those? (I'm guessing not mixtures, because those would be
> something other than "convert" here.)
>
> For integer types, it's not safe, in that if e.g. F is int and X is unsigned long
> long, you're changing from defined overflow to undefined overflow.
>
> For floating-point types, using TYPE_PRECISION is suspect (it's not
> wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the
> machine modes in question.
>
> Specifically, see convert.c:convert_to_real_1, the long comment starting
> "Sometimes this transformation is safe (cannot change results through
> affecting double rounding cases) and sometimes it is not.", and the
> associated code calling real_can_shorten_arithmetic. I think that code in
> convert.c ought to apply to your case of half precision converted to float for
> arithmetic and then converted back to half precision afterwards. (In the case
> where the excess precision configuration - which depends on
> TARGET_FP_F16INST for AArch64 - says to do arithmetic directly on half
> precision, anyway.)
>
> Now, there are still issues in that convert.c code in the decimal floating-point
> case (see bug 40503). And I think match.pd is a much better place for this
> sort of thing than convert.c (and c-family/c-common.c:shorten_binary_op in
> the integer case). But for this specific case of binary floating-point
> conversions, I think the logic in convert.c is what should be followed (but
> moved to match.pd if possible).
>
> This patch is also lacking a testcase, which might show why the existing logic
> in convert.c isn't being applied in whatever case you're concerned with.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com