[PATCH] [ARC] Add single/double IEEE precission FPU support.

Wed Feb 3 18:42:00 GMT 2016

On 03/02/16 15:02, Claudiu Zissulescu wrote:
> First, I will split this patch in two. The first part will deal with the FPU instructions. The second patch, will try to address a new abi optimized for odd-even registers as the comments for the mabi=optimized are numerous and I need to carefully prepare for an answer.
> The remaining of this email will focus on FPU patch.
>
>> +      case EQ:
>> +      case NE:
>> +      case UNORDERED:
>> +      case UNLT:
>> +      case UNLE:
>> +      case UNGT:
>> +      case UNGE:
>> +       return CC_FPUmode;
>> +
>> +      case LT:
>> +      case LE:
>> +      case GT:
>> +      case GE:
>> +      case ORDERED:
>> +       return CC_FPUEmode;
>>
>> cse and other code transformations are likely to do better if you use
>> just one mode for these.  It is also very odd to have comparisons and their
>> inverse use different modes.  Have you done any benchmarking for this?
> Right, the ORDERED should be in CC_FPUmode. An inspiration point for CC_FPU/CC_FPUE mode is the arm port.

I can't see how this code in the arm can actually work correctly. When, 
for instance, combine simplifies
a comparison, the comparison code can change, and it will use 
SELECT_CC_MODE to find a new
mode for the comparison.  Thus, if a comparison traps or not on qNaNs 
will depend on the whims
of combine.
Also, the the trapping comparisons don't show the side effect of 
trapping on qNaNs, which means
they can be speculated.

To make the trapping comparisons safe, they should display the side 
effect in the rtl, and only
be used when requested by options, type attributes, pragmas etc.
They could almost be safe to use be default for -ffinite-math-only, 
except that when the frontend knows
how to tell qNaNs and sNaNs apart, and speculates a comparison after 
some integer/fp mixed computation when it can infer that no sNaN will 
occur, you could still get an unexpected signal.
>   The reason why having the two CC_FPU and CC_FPUE modes is to emit signaling FPU compare instructions.
I don't know if your compare instructions are signalling for quiet NaNs 
(I hope they're not),
but  the mode of the comparison result shouldn't be used to distinguish 
that - it's not safe,
see above.
The purpose of the mode of the result is to distinguish different 
interpretations for the bit
patterns inside the comparison result flags.
>    We can use a single CC_FPU mode here instead of two, but we may lose functionality.
Can you define what that functionality actually is, and show some simple 
test code
to demonstrate how it works with your port  extension?
> Regarding benchmarks, I do not have an establish benchmark for this, however, as far as I could see the code generated for FPU looks clean.
> Please let me know if it is acceptable to go with CC_FPU/CC_FPUE, and ORDERED fix further on.
No, there should be a discernible and achievable purpose for comparison 
modes.
Which you have not demonstrated yet so far for the CC_FPU/CC_FPUE 
dichotomy.
>   Or, to have a single mode.
Yes.
>
>> +  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
>> +     registers.  */
>> +  if (TARGET_HS)
>> +    {
>> +      for (regno = 1; regno < 32; regno +=2)
>> +       {
>> +         arc_hard_regno_mode_ok[regno] = S_MODES;
>> +       }
>> +    }
>> +
>>
>> Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
>> arguments
>> in odd registers?  I fear you might run into reload trouble when trying to
>> access the values.
> Although, I haven't bump into this issue until now, I do not say it may not happen. Hence, I would create a new register class to hold the odd-even registers. Hence the above code will not be needed. What do u say?
That would have been possible a couple of years ago, but these days, all 
the constituent
registers of a multi-reg hard register have to be in a constraint's 
register class for the
constraint to match.
You could fudge this by using two classes for two-reg allocations, one 
with 0,1, 4,5, 8,9 ... the other
with 2,3, 6,7, 10,11 ... , but then you need another pair for 
four-register allocations, and maybe you
want to add various union and intersection classes, and the register 
allocators are rather rubbis
when it comes to balance allocations between classes of similar size and 
utility, so you should
really try to avoid this.
A way to avoid this is not to give the option of using the old ABI while 
enforcing alignment in registers.
Or you could use a different mode for the argument passing when it ends 
up unaligned; I suppose
BLKmode should work, using a vector to designate the constituent 
registers of the function argument.
> +(define_insn "*cmpsf_trap_fpu"
>
> That name makes as little sense to me as having two separate modes
> CC_FPU and CC_FPUE
> for positive / negated usage and having two comparison patterns pre
> precision that
> do the same but pretend to be dissimilar.
>
> The F{S/D}CMPF instruction is similar to the F{S/D}CMP instruction
Oops. I missed the 'f' suffix.  So the "*trap_fpu" patterns really are 
different...
>   in cases when either of the instruction operands is a signaling NaN. The FxCMPF instruction updates the invalid flag (FPU_STATUS.IV) when either of the operands is a quiet or signaling NaN, whereas, the FxCMP instruction updates the invalid flag (FPU_STATUS.IV) only when either of the operands is a quiet NaN. We need to use the FxCMPF only if we keep the CC_FPU an CC_FPUE otherwise, we shall use only FxCMP instruction.
... and have different side effects not mentioned in the rtl.
When the invalid flag is set, does that mean the CPU gets an FPU exception?
So quiet NaNs always signal, and signalling NaNs only when used as 
operands of FxCMPF?

You might need to resort to software floating point comparisons then unless
-ffinite-math-only is in effect.
> Is this better?
>
> #define TARGET_FP_SP_BASE   ((arc_fpu_build & FPU_SP) != 0)
> #define TARGET_FP_DP_BASE   ((arc_fpu_build & FPU_DP) != 0)
> #define TARGET_FP_SP_FUSED  ((arc_fpu_build & FPU_SF) != 0)
> #define TARGET_FP_DP_FUSED  ((arc_fpu_build & FPU_DF) != 0)
> #define TARGET_FP_SP_CONV   ((arc_fpu_build & FPU_SC) != 0)
> #define TARGET_FP_DP_CONV   ((arc_fpu_build & FPU_DC) != 0)
> #define TARGET_FP_SP_SQRT   ((arc_fpu_build & FPU_SD) != 0)
> #define TARGET_FP_DP_SQRT   ((arc_fpu_build & FPU_DD) != 0)
> #define TARGET_FP_DP_AX     ((arc_fpu_build & FPX_DP) != 0)
Yes.