This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Expansion of narrowing math built-ins into power instructions


Tejas: given the controversy, I agree unspecs sound like a good approach
for now.  We can always go back and add the rtx codes later once there's
agreement on what they should look like.

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote:
>> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
>> >> It's just a different name, nothing more, nothing less.  Because it is
>> >> a different name it can not be accidentally generated from actual
>> >> truncations.
>> >
>> > I have introduced float_narrow but I could not find appropriate places
>> > to generate it for a call to fadd instead of generating a CALL. I
>> > used GDB to set breakpoints which hit fold_rtx and cse_insn but I got
>> > confused with the rtx codes and passes which generate respective RTL.
>> > Shouldn't it be different from FLOAT_TRUNCATE if we want to avoid
>> > generating it for actual truncations?
>> 
>> Please don't do it this way.  The whole point of the work is that this
>> is a single operation that cannot be modelled as a post-processing of
>> a normal double addition result.  It's a single operation at the source
>> level, a single IFN, a single optab, and a single instruction.  Splitting
>> it apart into two operations for rtl only, and making it look in rtl terms
>> like a post-processing of a normal addition result, seems like it's going
>> to come back to bite us.
>> 
>> In lisp terms we're saying that the operand to the float_narrow is
>> implicitly quoted:
>> 
>>   (float_narrow:m '(plus:n a b))
>> 
>> so that when float_narrow is evaluated, the argument is the unevaluated
>> rtl expression "(plus a b)" rather than the evaluated result a + b.
>> float_narrow then does its own evaluation of a and b and performs a
>> fused addition and narrowing on the result.
>
> RTL isn't Lisp.

Right.  But it's heavily influenced by lisp, so I was using quoting to
explain why I don't think the code is a good fit.

> RTL doesn't have quotations.

I'd like to keep it that way for rvalues :-)

> RTL doesn't have *evaluation*.

But we can (and do) evaluate some rtxes without target help.

> RTL is just a data structure that describes your program instructions.
> A large part of what means what is system-specific.  Rounding of floating
> point is not defined, for example.

Some of the semantics are target-specific, sure, with some of the details
controlled by hooks/macros and some left undefined.  But that's true to a
lesser extent of gimple too.

> And yes, various parts of GCC can manipulate RTL, doing substitution and
> algebraic simplification and whatnot.  All within the rules of RTL.  And
> that means nothing ever can "pass" a float_narrow, because there are no
> rules that allow it to.

You mean create a new float_narrow out of thin air, with no justification?
Sure, but I don't think that was ever the issue.

Or do you mean that target-independent code couldn't just use GET_RTX_FORMAT
to recurse on a float_narrow without first noting that it's a float_narrow
(and thus special)?  If so, then yeah, I agree that they wouldn't be
allowed to do that, which is essentially why I think it's a bad idea.

>> No other rtx rvalue works like this.
>
> A lot of unspecs are used like this, for example.

Unspecs don't have a quoting effect though.  I agree it's common to match
things like:

  (unspec:m [(plus:m ...)] UNSPEC_FOO)

But that doesn't have any quoting effect on the plus.  If the optimisers see:

  (unspec:m [(plus:m x y)] UNSPEC_FOO)

and know what x and y are, they can certainly fold this to:

  (unspec:m [(const_int N)] UNSPEC_FOO)

The result might not match an instruction, but it's still a valid
rtx and a valid thing to try.  A target would be in real trouble
if it allowed both, but with different semantics even for N==x+y.
(In contrast, having different semantics for N==x+y would be valid
if there was a quoting effect.)

Likewise if the optimisers see:

  (set (reg:m z) (plus:m x y))
  ...(unspec:m [(plus:m x y)] UNSPEC_FOO)...

they can create and try to match:

  ...(unspec:m [(reg:m z)] UNSPEC_FOO)...

Again, it might not match an instruction, but it's still a valid rtx and
a valid thing to try.

In other words, everything going into recog has to be valid rtx.
It just might not be a valid instruction.  And the .md files can't
make the target-independent code treat an operation as quoted.
All they can do is refuse to match simplified forms.
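
To make the distinction concrete, here is a toy sketch in Python (not GCC
code: tuples stand in for rtxes, and fold and recog are hypothetical
helpers invented for this illustration).  The simplifier is free to fold
the plus inside the unspec, and the made-up target pattern simply declines
to match the folded form.

```python
# Toy model only: tuples stand in for rtxes; none of these names are GCC APIs.

def fold(rtx, known):
    """Substitute known register values and fold constant additions."""
    code = rtx[0]
    if code == "const_int":
        return rtx
    if code == "reg":
        name = rtx[1]
        return ("const_int", known[name]) if name in known else rtx
    if code == "unspec":
        # No quoting effect: the operand is simplified like any other rtx.
        return ("unspec", rtx[1], fold(rtx[2], known))
    if code == "plus":
        a, b = fold(rtx[1], known), fold(rtx[2], known)
        if a[0] == "const_int" and b[0] == "const_int":
            return ("const_int", a[1] + b[1])
        return ("plus", a, b)
    raise ValueError(f"unhandled code {code!r}")

def recog(rtx):
    """Hypothetical target pattern: only an UNSPEC_FOO wrapping a plus matches."""
    return rtx[0] == "unspec" and rtx[1] == "UNSPEC_FOO" and rtx[2][0] == "plus"

insn = ("unspec", "UNSPEC_FOO", ("plus", ("reg", "x"), ("reg", "y")))
folded = fold(insn, {"x": 3, "y": 4})

print(folded)         # still a well-formed expression in this toy model
print(recog(insn))    # the original form matches the pattern
print(recog(folded))  # the folded form is valid but is not an instruction
```

The folded expression is still well-formed; the only consequence is that
recog rejects it and the attempted simplification is dropped.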

This is similar to things like (from mips.md):

(define_insn_and_split "<su>mulsi3_highpart_internal"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (truncate:SI
         (lshiftrt:DI
          (mult:DI (any_extend:DI (match_operand:SI 1 "register_operand" "d"))
                   (any_extend:DI (match_operand:SI 2 "register_operand" "d")))
          (const_int 32))))
   (clobber (match_scratch:SI 3 "=l"))]

IIRC, the port has no highpart operation other than multiplication.
But there's again no quoting effect on the operands to the mult,
lshiftrt or truncate here, so if the optimisers knew that op2==2,
they could transform:

  [(set op0
        (truncate:SI
         (lshiftrt:DI
          (mult:DI (any_extend:DI op1)
                   (any_extend:DI op2))
          (const_int 32))))
   (clobber (scratch:SI))]

to:

  [(set op0
        (truncate:SI
         (lshiftrt:DI
          (plus:DI (any_extend:DI op1)
                   (any_extend:DI op1))
          (const_int 32))))
   (clobber (scratch:SI))]

Again, the instruction won't match, but it's still a valid rtx and
a valid transformation to try.

Going back to the unspec example: if at some point we added a target
hook for evaluating unspecs in the same way that we evaluate basic
arithmetic (might be useful!), the handling of UNSPEC_FOO wouldn't be
able to assert that the plus or whatever is there.  At best it could
punt evaluation when the plus isn't there, at the cost of losing
potentially useful optimisation.  (But to me, having to do that
smacks of a badly-designed unspec.  E.g. we use unspec wrappers around
operations a lot in the SVE port, but it would still be possible to
evaluate the unspec given fully-evaluated operands.)

float_narrow is different in that the plus (or whatever operation
it's quoting) has to be kept in-place rather than folded away,
otherwise the rtx itself is malformed and could trigger an ICE,
just like the zero_extend of a const_int that I mentioned.
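
As a reminder of why the narrowing addition cannot be treated as a double
addition followed by a truncation, here is a small Python sketch using
exact rationals.  The helper round_to_bits is made up for this example and
only handles positive values in the normal range, with round-to-nearest-even
assumed; the inputs are chosen so that rounding to 53 bits and then to
24 bits differs from rounding the exact sum to 24 bits once.

```python
from fractions import Fraction

def round_to_bits(x, p):
    """Round positive x to p significant bits, ties to even (toy helper)."""
    assert x > 0
    e = 0
    while Fraction(2) ** (e + 1) <= x:
        e += 1
    while Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    # round() on a Fraction rounds halfway cases to the even integer.
    return round(x / ulp) * ulp

# Both inputs are exactly representable as IEEE doubles.
a = Fraction(1)
b = Fraction(1, 2**24) + Fraction(1, 2**60)
exact = a + b

# Normal double addition (53 bits), then truncation to float (24 bits).
double_then_float = round_to_bits(round_to_bits(exact, 53), 24)

# Single operation: round the exact sum straight to float precision.
narrowed_once = round_to_bits(exact, 24)

print(double_then_float == narrowed_once)
```

The first rounding discards the 2^-60 term and leaves an exact tie for the
second rounding, which is how the two-step result ends up one float ulp
away from the single-step result.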

>> Using float_narrow would also be inconsistent with the way we handle
>> saturating arithmetic.  There we use US_PLUS and SS_PLUS rtx codes for
>> unsigned and signed saturating plus respectively, rather than:
>> 
>>   (unsigned_sat '(plus a b))
>>   (signed_sat '(plus a b))
>> 
>> Using dedicated codes might seem clunky.  But it's simple, safe, and fits
>> the existing model without special cases. :-)
>
> And you need many many more RTX codes, which you will not handle in
> almost all places, because there are too many.
>
>
> I agree this construct is not as nice as could be hoped for.  I don't
> agree that 60 new RTX codes is an acceptable solution (or that that will
> ever really work out, even).

60 sounds a high number. :-)  Do we really have that many rtx codes with
a floating-point rounding effect?

Whatever the number is, we'll still be listing them individually for
built-in enumerations, internal_fn, and (I assume) optabs.  But maybe
after a certain point it does become too unwieldy for rtx codes.
We have to keep it within 16 bits at least...

> It would be nice if somehow we could make a variant of RTL codes, so that
> we could have nice and simple code that applies to all variants of some
> code.  Not sure how that would work out.  Maybe we don't have to do this
> very generically, how often will we need this anyway?
>
> I have three examples so far:
> 1) Saturating arithmetic;
> 2) This float_narrow thing;
> 3) Ordered compares, that is, fp compares that set an exception on NaNs.
>
> Something that works for all three would be nice!

Yeah, agree that sounds good.  Maybe we could bundle the code with some
flags.  Storage-wise, there should be room for that in the u2 field.

But there might still be cases in which it's useful to view the code+flags
as a combined supercode, e.g. for switch statements.
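
A rough sketch of the code+flags idea, in Python for brevity.  All names
here (CODE_BITS, the flag values, supercode) are hypothetical and say
nothing about how the u2 field would really be laid out; the point is only
that generic code can look at the base code while specific handlers look
at the combined value.

```python
# Hypothetical layout: low 16 bits hold the base code, higher bits hold flags.
CODE_BITS = 16
CODE_MASK = (1 << CODE_BITS) - 1

PLUS, MINUS = 1, 2                        # made-up base codes
FLAG_SS, FLAG_US, FLAG_NARROW = 1, 2, 4   # made-up variant flags

def supercode(code, flags=0):
    """Combine a base code and its flags into one switchable value."""
    return (flags << CODE_BITS) | code

def base_code(sc):
    return sc & CODE_MASK

def flags_of(sc):
    return sc >> CODE_BITS

# A "switch" over combined values, for code that cares about the variant.
NAMES = {
    supercode(PLUS): "plus",
    supercode(PLUS, FLAG_SS): "ss_plus",
    supercode(PLUS, FLAG_US): "us_plus",
    supercode(PLUS, FLAG_NARROW): "narrowing plus",
}

def is_addition(sc):
    """Generic code can ignore the flags and test only the base code."""
    return base_code(sc) == PLUS

sc = supercode(PLUS, FLAG_NARROW)
print(NAMES[sc], is_addition(sc), flags_of(sc) == FLAG_NARROW)
```

Whether the flags would be independent bits, as here, or a small
enumeration is left open in this sketch.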

Thanks,
Richard

