[PATCH v2 00/11] aarch64: Implement TImode comparisons

Tue Apr 7 23:58:45 GMT 2020

On Tue, Apr 07, 2020 at 02:43:36PM -0700, Richard Henderson wrote:
> On 4/7/20 1:27 PM, Segher Boessenkool wrote:
> > On Mon, Apr 06, 2020 at 12:19:42PM +0100, Richard Sandiford wrote:
> >> The reason I'm not keen on using special modes for this case is that
> >> they'd describe one way in which the result can be used rather than
> >> describing what the instruction actually does.  The instruction really
> >> does set all four flags to useful values.  The "problem" is that they're
> >> not the values associated with a compare between two values, so representing
> >> them that way will always lose information.
> > 
> > CC modes describe how the flags are set, not how the flags are used.
> > You cannot easily describe the V bit setting with a compare (it needs
> > a mode bigger than the register), is that what you mean?
> 
> I think that is a good deal of the effort.

So you need to have different CC modes for different ways of setting
(combinations) of the four NZVC bits, there isn't really any way around
that, that is how CC modes work.

> I wonder if it would be helpful to have
> 
>   (uoverflow_plus x y carry)
>   (soverflow_plus x y carry)
> 
> etc.

Those have three operands, which is nasty to express.

On rs6000 we have the carry bit as a separate register (it really is
only one bit, XER[CA], but in GCC we model it as a separate register).
We handle it as a fixed register (there is only one, and saving and
restoring it is relatively expensive, so this worked out the best).
Still, in the patterns (for insns like "adde") that take both a carry
input and have it as output, the expression for the carry output but
already the one for the GPR output become so unwieldy that nothing
can properly work with it.  So, in the end, I have all such insns that
take a carry input just clobber their carry output.  This works great!

Expressing the carry setting for insns that do not take a carry in is
much easier.  You get somewhat different patterns for various
immediate inputs, but that is all.

[ snip ]

> This does have the advantage of avoiding the extensions, so that constants can
> be retained in the original mode.

But it won't ever survive simplification; or, it will be in the way of
simplification.

A simple test is looking what you get if you do

long long f(long long x) { return x + 0x100000000; }

(on a 32-bit config).

> Though of course if we go this way, there will be incentive to add
> <s,u>overflow codes for all __builtin_*_overflow_p.

Yeah, eww.  And where will it stop?  What muladd insns should we have
special RTL codes for, for the high part?

Segher