This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [4.0 PATCH, i386]: Fix PR target/20421


Richard Henderson wrote:

On Wed, Mar 16, 2005 at 08:54:36AM +0100, Uros Bizjak wrote:


+;; Following rounding expanders clobber FLAGS_REG, when logic instructions
+;; are emitted in emit_i387_cw_initialization () function. The "if"
+;; condition in this function and the "if" condition in the body of round
+;; expanders should be kept in sync to select correct FLAGS_REG clobbering
+;; pattern.
+
+(define_insn_and_split "frndintxf2_floor_1"



Any reason to not just add the clobber unconditionally, and let it
vanish in those cases where we don't wind up killing the flags?
Given that we do the split before sched2, it doesn't seem like we're
buying anything at all with this extra code.


To loose a clobber after !(reload_completed || reload_in_progress), the only solution I have found is by using two patterns, for example in the floor() case:

(define_insn_and_split "frndintxf2_floor_1"
 [(set (match_operand:XF 0 "register_operand" "=f")
   (unspec:XF [(match_operand:XF 1 "register_operand" "0")]
    UNSPEC_FRNDINT_FLOOR))
  (use (match_operand:HI 2 "memory_operand" "m"))
  (use (match_operand:HI 3 "memory_operand" "m"))
  (clobber (reg:CC FLAGS_REG))]
 "TARGET_USE_FANCY_MATH_387
  && flag_unsafe_math_optimizations
  && !(reload_completed || reload_in_progress)"
 "#"
 "&& 1"
 [(parallel [(set (match_dup 0)
         (unspec:XF [(match_dup 1)]
          UNSPEC_FRNDINT_FLOOR))
         (use (match_dup 2))
         (use (match_dup 3))])]
 ""
 [(set_attr "type" "frndint")
  (set_attr "i387_cw" "floor")
  (set_attr "mode" "XF")])

(define_insn "frndintxf2_floor"
 [(set (match_operand:XF 0 "register_operand" "=f")
   (unspec:XF [(match_operand:XF 1 "register_operand" "0")]
    UNSPEC_FRNDINT_FLOOR))
  (use (match_operand:HI 2 "memory_operand" "m"))
  (use (match_operand:HI 3 "memory_operand" "m"))]
 "TARGET_USE_FANCY_MATH_387
  && flag_unsafe_math_optimizations"
 "fldcw\t%3\n\tfrndint\n\tfldcw\t%2"
 [(set_attr "type" "frndint")
  (set_attr "i387_cw" "floor")
  (set_attr "mode" "XF")])


However, we _know_ that FLAGS_REG clobbering pattern is needed only in (!TARGET_PARTIAL_REG_STALL && !optimize_size && !TARGET_64BIT) case. Since we already have two patterns (needed to remove the clobber from frndint insn pattern after LCM inserts initialization code), we could generate non-clobbering pattern directly in "floor?f2" expander. It will cost only one extra "if" and in this case, but perhaps some extra optimization can be performed when non-clobbering pattern is inserted.


This solution works as expected. For -march=pentiumpro (TARGET_PARTIAL_REG_STALL target), when compiling a simple testcase (gcc -O2 -march=pentiumpro -ffast-math):

double a(double x) {
       return floor(x);
}

In _.c.19.regmove, a flags_reg clobbering pattern {frndintxf2_floor_clob} is still present, and this pattern blocks some unwanted optimization effects (see PR target/12308). When LCM inserts mode initialization sequence, {frndintxf2_floor_clob}pattern looses its clobber (as clobber is now marked in logic insns). Now, this sequence is generated (_.x.22.lreg):

...
{x86_fnstcw_1}
{*movhi_1}
{*andhi_1}    (clobber)
{*iorhi_1}     (clobber)
{*movhi_1}
{frndintxf2_floor} (no clobber)
...

OTOH, in !TARGET_PARTIAL_REG_STALL, {frndintxf2_floor} pattern is generated directly in expander, so prevously wrong (PR target/12308, comment #2 - correct code, where initialization uses movb, and #3 -incorrect code, where initialization uses orb) optimization could be performed without any unwanted effects.

Same comment applies to your mainline patch.



In mainline patch, the same approach is applied to fix_trunc patterns. We could generate non-clobbering patterns in expander if we know that control word initialization sequence doesn't use any flags clobbering instructions. In this case, one new pattern is needed ("fix_trunc<mode>_i387_1" in addition to "fix_trunc<mode>_i387_clob"), and extra "if" in all expanders.

I think that all this machinery is needed to produce optimal code for all targets. We in fact need two patterns to split clobbering pattern into non-clobbering pattern for TARGET_PARTIAL_REG_STALL. A bypass in expanders just generates non-clobbering pattern directly, so other targets are not penalized by clobber which is actually not needed in their case.


Uros.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]