This page discusses a number of related problems and desired features of the ia64 back end's floating-point handling, and a plan for solving all of them.

Problems

At root, the problem is that the ia64 back end does not accurately model the hardware. The ia64 architecture's floating point has a number of exotic features:

GCC's model is inaccurate in the following ways:

There are also related maintenance headaches:

Features

We would like to add some features of the HP-UX system compiler for ia64 to GCC. They make it easier to write high-speed math libraries in C rather than in assembly.

__fpreg

__fpreg is an extended floating-point type which provides user access to the full width of the floating-point registers. It has the following properties:

#pragma _USE_SF

This #pragma gives control over the choice of floating-point control register. It has the syntax

#pragma _USE_SF n

where n is 0, 1, 2, or 3. Its effect is to cause all subsequent floating-point operations to use the specified control register, until the end of the containing block. It may appear only once per lexical block, and only at the beginning of that block. HP's specification says that the effect of the #pragma only applies to assembly intrinsics, but acc applies it to all operations. The GCC implementation will be consistent with acc.

Note that inline divide and square root will always use control register 1 for intermediate calculations.
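
As a rough illustration of the placement rule, the sketch below puts the #pragma at the start of a nested block so that only the multiplication is affected. The function name mul_then_add is made up for this example, and the preprocessor guard assumes that HP's compilers define __HP_cc or __HP_aCC. GCC does not implement the #pragma, so with other compilers the code simply computes the same value without switching control registers.

```c
/* Hypothetical example: mul_then_add is not part of any library. */
double mul_then_add(double a, double b, double c)
{
    double prod;
    {
        /* Must come first in the block, and may appear only once in it.
           Guarded on the assumption that HP's compilers define these macros. */
#if defined(__HP_cc) || defined(__HP_aCC)
#pragma _USE_SF 2
#endif
        prod = a * b;   /* under acc, uses control register 2 */
    }                   /* the effect of the #pragma ends with the block */
    return prod + c;    /* uses whatever control register was in effect before */
}
```

Wrapping the affected statements in their own block is what limits the scope; placing the #pragma at the top of the function body would instead cover the addition as well.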

Assembly intrinsics

acc supports a large set of intrinsics (machine-specific builtins, in GCC terminology) that map directly to floating-point instructions which may not be readily accessible from C. It would be nice to support these. Once all the above features and improvements are implemented, this will be easy, as GCC already has plenty of support for machine-specific builtins.

Plan

The hard part is modeling the machine behavior without a combinatorial explosion in the size of ia64.md. It is also desirable, although less important, to avoid a combinatorial explosion in the size of the generated files.

We believe free extension of narrow to wide floating-point modes can best be modeled by creating new floating-point operand predicates which accept either (reg:M1) or (float_extend:M1 (reg:M2)), where M2 is narrower than M1. These would be used for input operands of arithmetic instructions. Combine should then be able to merge explicit extension instructions with arithmetic. However, Richard Henderson cautions that this may require changes to reload (specifically, to recognize that the thing that needs reloading is (reg:M2)).
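
A minimal sketch of what such a predicate would check, written as plain C over a toy RTL representation. The types toy_mode and toy_rtx and the name fr_reg_or_extend_operand are invented for illustration; a real predicate would be written with define_predicate against GCC's own RTL and machine mode types, and the sketch says nothing about the reload issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's machine modes and RTL codes (hypothetical). */
enum toy_mode { TOY_SF, TOY_DF, TOY_XF };
enum toy_code { TOY_REG, TOY_FLOAT_EXTEND, TOY_PLUS };

struct toy_rtx {
    enum toy_code code;
    enum toy_mode mode;
    const struct toy_rtx *op0;  /* inner operand; NULL for a REG */
};

/* Nominal width in bits, used only to order the modes. */
static int toy_mode_bits(enum toy_mode m)
{
    switch (m) {
    case TOY_SF: return 32;
    case TOY_DF: return 64;
    case TOY_XF: return 80;
    }
    assert(0 && "unknown toy_mode");
    return 0;
}

/* Accept (reg:M1), or (float_extend:M1 (reg:M2)) with M2 narrower than M1. */
static bool fr_reg_or_extend_operand(const struct toy_rtx *x, enum toy_mode m1)
{
    if (x == NULL || x->mode != m1)
        return false;
    if (x->code == TOY_REG)
        return true;
    if (x->code == TOY_FLOAT_EXTEND) {
        const struct toy_rtx *inner = x->op0;
        return inner != NULL
               && inner->code == TOY_REG
               && toy_mode_bits(inner->mode) < toy_mode_bits(m1);
    }
    return false;
}
```

With this shape, an explicit extension feeding an add is still a valid input operand of the add, which is what would let combine fold the two instructions into one.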

It is already possible to model free truncation after arithmetic and alternate control registers; the only issue is the combinatorial explosion in the size of ia64.md and concomitant maintenance problems. Let's look at an example set of arithmetic patterns.

(define_insn "adddf3"
  [(set (match_operand:DF 0 "fr_register_operand" "=f")
        (plus:DF (match_operand:DF 1 "fr_register_operand" "%f")
                 (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG")))]
  ""
  "fadd.d %0 = %1, %F2"
  [(set_attr "itanium_class" "fmac")])

(define_insn "*adddf3_trunc"
  [(set (match_operand:SF 0 "fr_register_operand" "=f")
        (float_truncate:SF
          (plus:DF (match_operand:DF 1 "fr_register_operand" "%f")
                   (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))))]
  ""
  "fadd.s %0 = %1, %F2"
  [(set_attr "itanium_class" "fmac")])

(define_insn "*adddf3_alts"
  [(set (match_operand:DF 0 "fr_register_operand" "=f")
        (plus:DF (match_operand:DF 1 "fr_register_operand" "%f")
                 (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG")))
    (use (match_operand:SI 3 "const_int_operand" ""))]
  ""
  "fadd.d.s%3 %0 = %1, %F2"
  [(set_attr "itanium_class" "fmac")])

(define_insn "*adddf3_trunc_alts"
  [(set (match_operand:SF 0 "fr_register_operand" "=f")
        (float_truncate:SF
          (plus:DF (match_operand:DF 1 "fr_register_operand" "%f")
                   (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG"))))
    (use (match_operand:SI 3 "const_int_operand" ""))]
  ""
  "fadd.s.s%3 %0 = %1, %F2"
  [(set_attr "itanium_class" "fmac")])

As you can see, this is highly repetitive. (The *adddf3_alts and *adddf3_trunc_alts patterns do not actually exist in the machine description, because _alts patterns have only been added when they are used directly by other parts of the machine description, in an effort to keep the repetition down.) What we would like is a notation that allows us to write just the first pattern, or something very like it, and have the other three patterns synthesized. Richard Sandiford's "mode macros" and "code macros" are the most obvious related feature, but they don't facilitate mutating the RTL template. A better analogy is define_cond_exec, which does create modified patterns with mutated RTL templates.

Here is a half-baked suggestion for a construct that might work:

(define_pattern_macro "fp_insn"
  [(set (match_operand 1) (match_operand 2))]
  ["*_trunc<narrower_float>"
   (parallel[ (set (match_dup:narrower_float 1)
                   (float_truncate:narrower_float (match_dup 2))) ])
   "*_alts"
   (parallel[ (match_dup 0)
              (use (match_operand:SI 3 "const_int_operand" "")) ])
   "*_trunc<narrower_float>_alts"
   (parallel[ (set (match_dup:narrower_float 1)
                   (float_truncate:narrower_float (match_dup 2)))
              (use (match_operand:SI 3 "const_int_operand" "")) ])
   ])

(define_fp_insn "adddf3"
  [(set (match_operand:DF 0 "fr_register_operand" "=f")
        (plus:DF (match_operand:DF 1 "fr_register_operand" "%f")
                 (match_operand:DF 2 "fr_reg_or_fp01_operand" "fG")))]
  ""
  "fadd%m0%s3 %0 = %1, %F2"
  [(set_attr "itanium_class" "fmac")])

Here narrower_float is a mode macro, defined to expand to only those floating-point modes that are narrower than the mode of the arithmetic. (This is not a capability that mode macros have at present, but it should be possible to add.) (match_dup 0) automatically refers to the entire pattern, and (match_dup:M n) means "operand n but with mode M".

The output template has to be aware of the selected operation mode, hence %m0, and of whether or not the third operand even exists, which I paper over with %s3. (These may or may not correspond to actual ia64 output_operand modifier letters.)

One could then go even farther, and use mode macros in the define_fp_insn so that it would not need repeating for every floating-point mode.
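
To get a feel for how many patterns such a macro would synthesize, here is a small C sketch that enumerates the variant names for one arithmetic operation. The mode list (sf, df, xf), the helper names, and the exact naming scheme are assumptions for illustration only; nothing here is part of any existing GCC generator program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy list of floating-point modes, narrowest first (assumed for illustration). */
static const char *const toy_modes[] = { "sf", "df", "xf" };
enum { TOY_NMODES = 3, NAME_LEN = 64 };

/* Fill out[] with the pattern names synthesized for "<op><mode>3" and
   return how many were written (at most max). */
static int expand_variants(const char *op, int mode_idx,
                           char out[][NAME_LEN], int max)
{
    int n = 0;

    assert(mode_idx >= 0 && mode_idx < TOY_NMODES);
    /* alts == 0: plain pattern; alts == 1: extra (use ...) operand
       selecting an alternate control register. */
    for (int alts = 0; alts < 2; alts++) {
        const char *suffix = alts ? "_alts" : "";

        /* Untruncated form; only the plain one is a named (non-'*') pattern. */
        if (n < max)
            snprintf(out[n++], NAME_LEN, "%s%s%s3%s",
                     alts ? "*" : "", op, toy_modes[mode_idx], suffix);

        /* One truncating form per strictly narrower mode. */
        for (int narrow = 0; narrow < mode_idx; narrow++)
            if (n < max)
                snprintf(out[n++], NAME_LEN, "*%s%s3_trunc%s%s",
                         op, toy_modes[mode_idx], toy_modes[narrow], suffix);
    }
    return n;
}

/* Return 1 if name is among the first n entries of names[]. */
static int has_variant(char names[][NAME_LEN], int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}
```

Under these assumptions, "add" in df mode yields four names, corresponding to the four hand-written patterns shown earlier (with the narrower mode appended to _trunc), and the three modes together yield twelve patterns per operation from a single written definition.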


Having done all that, it is then comparatively simple to implement __fpreg by adding a new mode, tentatively named RFmode ("register float"), to the back end. GCC will not be aware of the precise format of this mode, as the architecture manual indicates that it may change in the future.

Passive conversions will be achieved by making the truncrf?f patterns be no-ops, and by modifying the floating-point operand predicates to accept (float_truncate:MODE (reg:RF)) in an rvalue position where (reg:MODE) would have been acceptable.

Restricting the set of operations allowed for __fpreg may require front end changes to produce sensible error messages rather than an ICE or a bizarre link failure (e.g. undefined symbol __divrf3). However, it may be feasible to do that entirely in the back end by writing stub expanders that call error.

Implementation of #pragma _USE_SF will definitely require front-end changes, as GCC currently has no support for lexically-scoped #pragmas. (This feature is desirable for C99 and OpenMP support as well.) Assuming some mechanism for propagating the information all the way through the tree optimizers, the back end need only check the state of the #pragma in expanders and choose the appropriate pattern.
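
The scoping behavior the front end would have to track can be modeled with a small stack, sketched below in C. All names here (sf_scope_stack, sf_enter_block, and so on) are hypothetical, the default of control register 0 is an assumption, and this is not how GCC represents #pragma state; it only illustrates the inherit-on-entry, restore-on-exit rule.

```c
#include <assert.h>

/* Hypothetical model: one control-register number per open lexical block. */
enum { SF_MAX_DEPTH = 64 };

struct sf_scope_stack {
    int sf[SF_MAX_DEPTH];  /* sf[d] is the setting in effect at nesting depth d */
    int depth;             /* index of the innermost open block */
};

/* Start at file scope with control register 0 (assumed default). */
static void sf_init(struct sf_scope_stack *s)
{
    s->depth = 0;
    s->sf[0] = 0;
}

/* Entering '{': the new block inherits the enclosing setting. */
static void sf_enter_block(struct sf_scope_stack *s)
{
    assert(s->depth + 1 < SF_MAX_DEPTH);
    s->sf[s->depth + 1] = s->sf[s->depth];
    s->depth++;
}

/* Seeing "#pragma _USE_SF n" at the start of the current block. */
static void sf_pragma(struct sf_scope_stack *s, int n)
{
    assert(n >= 0 && n <= 3);
    s->sf[s->depth] = n;
}

/* Leaving '}': the enclosing block's setting is in effect again. */
static void sf_leave_block(struct sf_scope_stack *s)
{
    assert(s->depth > 0);
    s->depth--;
}

/* The setting an expander would consult for the current statement. */
static int sf_current(const struct sf_scope_stack *s)
{
    return s->sf[s->depth];
}
```

In a real implementation the value returned by something like sf_current would have to be attached to each operation before optimization, since the tree optimizers may move code across block boundaries.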

None: ia64_floating_point (last edited 2008-01-10 19:38:46 by localhost)