This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] [ARC] Add single/double IEEE precission FPU support.




On 03/02/16 15:02, Claudiu Zissulescu wrote:
First, I will split this patch in two. The first part will deal with the FPU instructions. The second patch, will try to address a new abi optimized for odd-even registers as the comments for the mabi=optimized are numerous and I need to carefully prepare for an answer.
The remaining of this email will focus on FPU patch.

+      case EQ:
+      case NE:
+      case UNORDERED:
+      case UNLT:
+      case UNLE:
+      case UNGT:
+      case UNGE:
+       return CC_FPUmode;
+
+      case LT:
+      case LE:
+      case GT:
+      case GE:
+      case ORDERED:
+       return CC_FPUEmode;

cse and other code transformations are likely to do better if you use
just one mode for these.  It is also very odd to have comparisons and their
inverse use different modes.  Have you done any benchmarking for this?
Right, the ORDERED should be in CC_FPUmode. An inspiration point for CC_FPU/CC_FPUE mode is the arm port.

I can't see how this code in the arm can actually work correctly. When, for instance, combine simplifies a comparison, the comparison code can change, and it will use SELECT_CC_MODE to find a new mode for the comparison. Thus, if a comparison traps or not on qNaNs will depend on the whims
of combine.
Also, the the trapping comparisons don't show the side effect of trapping on qNaNs, which means
they can be speculated.

To make the trapping comparisons safe, they should display the side effect in the rtl, and only
be used when requested by options, type attributes, pragmas etc.
They could almost be safe to use be default for -ffinite-math-only, except that when the frontend knows how to tell qNaNs and sNaNs apart, and speculates a comparison after some integer/fp mixed computation when it can infer that no sNaN will occur, you could still get an unexpected signal.
  The reason why having the two CC_FPU and CC_FPUE modes is to emit signaling FPU compare instructions.
I don't know if your compare instructions are signalling for quiet NaNs (I hope they're not), but the mode of the comparison result shouldn't be used to distinguish that - it's not safe,
see above.
The purpose of the mode of the result is to distinguish different interpretations for the bit
patterns inside the comparison result flags.
   We can use a single CC_FPU mode here instead of two, but we may lose functionality.
Can you define what that functionality actually is, and show some simple test code
to demonstrate how it works with your port  extension?
Regarding benchmarks, I do not have an establish benchmark for this, however, as far as I could see the code generated for FPU looks clean.
Please let me know if it is acceptable to go with CC_FPU/CC_FPUE, and ORDERED fix further on.
No, there should be a discernible and achievable purpose for comparison modes. Which you have not demonstrated yet so far for the CC_FPU/CC_FPUE dichotomy.
  Or, to have a single mode.
Yes.

+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
+     registers.  */
+  if (TARGET_HS)
+    {
+      for (regno = 1; regno < 32; regno +=2)
+       {
+         arc_hard_regno_mode_ok[regno] = S_MODES;
+       }
+    }
+

Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
arguments
in odd registers?  I fear you might run into reload trouble when trying to
access the values.
Although, I haven't bump into this issue until now, I do not say it may not happen. Hence, I would create a new register class to hold the odd-even registers. Hence the above code will not be needed. What do u say?
That would have been possible a couple of years ago, but these days, all the constituent registers of a multi-reg hard register have to be in a constraint's register class for the
constraint to match.
You could fudge this by using two classes for two-reg allocations, one with 0,1, 4,5, 8,9 ... the other with 2,3, 6,7, 10,11 ... , but then you need another pair for four-register allocations, and maybe you want to add various union and intersection classes, and the register allocators are rather rubbis when it comes to balance allocations between classes of similar size and utility, so you should
really try to avoid this.
A way to avoid this is not to give the option of using the old ABI while enforcing alignment in registers. Or you could use a different mode for the argument passing when it ends up unaligned; I suppose BLKmode should work, using a vector to designate the constituent registers of the function argument.
+(define_insn "*cmpsf_trap_fpu"

That name makes as little sense to me as having two separate modes
CC_FPU and CC_FPUE
for positive / negated usage and having two comparison patterns pre
precision that
do the same but pretend to be dissimilar.

The F{S/D}CMPF instruction is similar to the F{S/D}CMP instruction
Oops. I missed the 'f' suffix. So the "*trap_fpu" patterns really are different...
  in cases when either of the instruction operands is a signaling NaN. The FxCMPF instruction updates the invalid flag (FPU_STATUS.IV) when either of the operands is a quiet or signaling NaN, whereas, the FxCMP instruction updates the invalid flag (FPU_STATUS.IV) only when either of the operands is a quiet NaN. We need to use the FxCMPF only if we keep the CC_FPU an CC_FPUE otherwise, we shall use only FxCMP instruction.
... and have different side effects not mentioned in the rtl.
When the invalid flag is set, does that mean the CPU gets an FPU exception?
So quiet NaNs always signal, and signalling NaNs only when used as operands of FxCMPF?

You might need to resort to software floating point comparisons then unless
-ffinite-math-only is in effect.
Is this better?

#define TARGET_FP_SP_BASE   ((arc_fpu_build & FPU_SP) != 0)
#define TARGET_FP_DP_BASE   ((arc_fpu_build & FPU_DP) != 0)
#define TARGET_FP_SP_FUSED  ((arc_fpu_build & FPU_SF) != 0)
#define TARGET_FP_DP_FUSED  ((arc_fpu_build & FPU_DF) != 0)
#define TARGET_FP_SP_CONV   ((arc_fpu_build & FPU_SC) != 0)
#define TARGET_FP_DP_CONV   ((arc_fpu_build & FPU_DC) != 0)
#define TARGET_FP_SP_SQRT   ((arc_fpu_build & FPU_SD) != 0)
#define TARGET_FP_DP_SQRT   ((arc_fpu_build & FPU_DD) != 0)
#define TARGET_FP_DP_AX     ((arc_fpu_build & FPX_DP) != 0)
Yes.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]