Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.


On Wed, 18 May 2016, Matthew Wahab wrote:

> AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like
> flush-to-zero that could affect the outcome of a calculation.

The result of a float computation on two values immediately promoted from 
fp16 cannot be within the subnormal range for float.  Thus, only one flush 
to zero can happen, on the final conversion back to fp16, and that cannot 
make the result different from doing direct arithmetic in fp16 (assuming 
flush to zero affects conversion from float to fp16 the same way it 
affects direct fp16 arithmetic).
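
As a minimal sketch (not part of the original argument, just checking the
bound numerically): the smallest positive binary16 value is 2^-24 and the
largest finite one is 65504, so the smallest nonzero magnitude a single
float multiply or divide on promoted fp16 values can produce is no smaller
than 2^-48, far above FLT_MIN = 2^-126.

#include <float.h>
#include <stdio.h>

int main (void)
{
  float fp16_min = 0x1p-24f;   /* smallest positive binary16 subnormal */
  float fp16_max = 65504.0f;   /* largest finite binary16 value */

  /* Nonzero fp16 values promote to normal floats, and the smallest
     nonzero results a single float operation on them can produce are
     still normal floats, so flush to zero cannot trigger on the float
     operation itself.  */
  float min_product  = fp16_min * fp16_min;   /* 2^-48 */
  float min_quotient = fp16_min / fp16_max;   /* just above 2^-40 */

  printf ("product %a, quotient %a, FLT_MIN %a\n",
          min_product, min_quotient, FLT_MIN);
  return 0;
}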

> > So I'd expect e.g.
> > 
> > __fp16 a, b;
> > __fp16 c = a / b;
> > 
> > to generate the new instructions, because direct binary16 arithmetic is a
> > correct implementation of (__fp16) ((float) a / (float) b).
> 
> Something like
> 
> __fp16 a, b, c;
> __fp16 d = (a / b) * c;
> 
> would be done as the sequence of single precision operations:
> 
> vcvtb.f32.f16 s0, s0
> vcvtb.f32.f16 s1, s1
> vcvtb.f32.f16 s2, s2
> vdiv.f32 s15, s0, s1
> vmul.f32 s0, s15, s2
> vcvtb.f16.f32 s0, s0
> 
> Doing this with vdiv.f16 and vmul.f16 could change the calculated result
> because the flush-to-zero rule depends on the operation precision, so it
> affects a vdiv.f16 differently from a vdiv.f32.

Flush to zero is irrelevant here, since that sequence of three operations 
also cannot produce anything in the subnormal range for float.  (It's true 
that double rounding is relevant for your example and so converting it to 
direct fp16 arithmetic would not be safe for that reason.)
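
A hedged sketch of that double-rounding point (function names are only
illustrative): the two evaluation strategies for the three-operation
example can round differently, which is why the shortening is not safe
there.

__fp16
promoted (__fp16 a, __fp16 b, __fp16 c)
{
  /* ACLE semantics: compute in float throughout, with a single
     rounding to binary16 at the final conversion.  */
  return (__fp16) (((float) a / (float) b) * (float) c);
}

__fp16
direct (__fp16 a, __fp16 b, __fp16 c)
{
  /* Direct fp16 arithmetic: the intermediate quotient is rounded to
     binary16 before the multiply, and that extra rounding can change
     the final result.  */
  __fp16 q = a / b;
  return q * c;
}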

That example is also not relevant to my point.  In my example

> > __fp16 a, b;
> > __fp16 c = a / b;

it's already the case that GCC will (a) promote to float, because the 
target hooks say to do so, (b) notice that the result is immediately 
converted back to fp16, and that this means fp16 arithmetic could be used 
directly, and so adjust it back to fp16 arithmetic (see convert_to_real_1, 
and the call therein to real_can_shorten_arithmetic which knows conditions 
under which it's safe to change such promoted arithmetic back to 
arithmetic on a narrower type).  Then the expanders (I think) notice the 
lack of direct HFmode arithmetic and so put the widening / narrowing back 
again.
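
A hypothetical walk-through of the single-operation case (the function
name is illustrative), for a target that does provide the FP16
instructions:

__fp16
f (__fp16 a, __fp16 b)
{
  /* The front end builds (__fp16) ((float) a / (float) b).
     convert_to_real_1, via real_can_shorten_arithmetic, sees that the
     result is immediately narrowed back to fp16 and shortens it to a
     single HFmode division, which can then match a divhf3 pattern; on
     a target without such a pattern the expanders reinstate the
     widening and narrowing.  */
  return a / b;
}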

But in your example, *because* doing it with direct fp16 arithmetic would 
not be equivalent, convert_to_real_1 would not eliminate the conversions 
to float, the float operations would still be present at expansion time, 
and so direct HFmode arithmetic patterns would not match.

In short: instructions for direct HFmode arithmetic should be described 
with patterns with the standard names.  It's the job of the 
architecture-independent compiler to ensure that fp16 arithmetic in the 
user's source code only generates direct fp16 arithmetic in GIMPLE (and 
thus ends up using those patterns) if that is a correct representation of 
the source code's semantics according to ACLE.

The intrinsics you provide can then be written to use direct arithmetic, 
and rely on convert_to_real_1 eliminating the promotions, rather than 
needing built-in functions at all, just like many arm_neon.h intrinsics 
make direct use of GNU C vector arithmetic.
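
For instance (a sketch only; the name follows the ACLE vdivh_f16 style
but this is not the actual arm_fp16.h definition), such an intrinsic
could simply be written as:

static inline __fp16
vdivh_f16 (__fp16 a, __fp16 b)
{
  /* Plain C arithmetic: the front end promotes to float,
     convert_to_real_1 removes the promotions again where that is
     safe, and the result can expand straight to vdiv.f16.  */
  return a / b;
}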

-- 
Joseph S. Myers
joseph@codesourcery.com

