This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [4.0 PATCH, i386]: Fix PR target/20421
Richard Henderson wrote:
On Wed, Mar 16, 2005 at 08:54:36AM +0100, Uros Bizjak wrote:
+;; Following rounding expanders clobber FLAGS_REG, when logic instructions
+;; are emitted in emit_i387_cw_initialization () function. The "if"
+;; condition in this function and the "if" condition in the body of round
+;; expanders should be kept in sync to select correct FLAGS_REG clobbering
+;; pattern.
+
+(define_insn_and_split "frndintxf2_floor_1"
Any reason to not just add the clobber unconditionally, and let it
vanish in those cases where we don't wind up killing the flags?
Given that we do the split before sched2, it doesn't seem like we're
buying anything at all with this extra code.
To loose a clobber after !(reload_completed || reload_in_progress), the
only solution I have found is by using two patterns, for example in the
floor() case:
(define_insn_and_split "frndintxf2_floor_1"
[(set (match_operand:XF 0 "register_operand" "=f")
(unspec:XF [(match_operand:XF 1 "register_operand" "0")]
UNSPEC_FRNDINT_FLOOR))
(use (match_operand:HI 2 "memory_operand" "m"))
(use (match_operand:HI 3 "memory_operand" "m"))
(clobber (reg:CC FLAGS_REG))]
"TARGET_USE_FANCY_MATH_387
&& flag_unsafe_math_optimizations
&& !(reload_completed || reload_in_progress)"
"#"
"&& 1"
[(parallel [(set (match_dup 0)
(unspec:XF [(match_dup 1)]
UNSPEC_FRNDINT_FLOOR))
(use (match_dup 2))
(use (match_dup 3))])]
""
[(set_attr "type" "frndint")
(set_attr "i387_cw" "floor")
(set_attr "mode" "XF")])
(define_insn "frndintxf2_floor"
[(set (match_operand:XF 0 "register_operand" "=f")
(unspec:XF [(match_operand:XF 1 "register_operand" "0")]
UNSPEC_FRNDINT_FLOOR))
(use (match_operand:HI 2 "memory_operand" "m"))
(use (match_operand:HI 3 "memory_operand" "m"))]
"TARGET_USE_FANCY_MATH_387
&& flag_unsafe_math_optimizations"
"fldcw\t%3\n\tfrndint\n\tfldcw\t%2"
[(set_attr "type" "frndint")
(set_attr "i387_cw" "floor")
(set_attr "mode" "XF")])
However, we _know_ that FLAGS_REG clobbering pattern is needed only in
(!TARGET_PARTIAL_REG_STALL && !optimize_size && !TARGET_64BIT) case.
Since we already have two patterns (needed to remove the clobber from
frndint insn pattern after LCM inserts initialization code), we could
generate non-clobbering pattern directly in "floor?f2" expander. It will
cost only one extra "if" and in this case, but perhaps some extra
optimization can be performed when non-clobbering pattern is inserted.
This solution works as expected. For -march=pentiumpro
(TARGET_PARTIAL_REG_STALL target), when compiling a simple testcase (gcc
-O2 -march=pentiumpro -ffast-math):
double a(double x) {
return floor(x);
}
In _.c.19.regmove, a flags_reg clobbering pattern
{frndintxf2_floor_clob} is still present, and this pattern blocks some
unwanted optimization effects (see PR target/12308). When LCM inserts
mode initialization sequence, {frndintxf2_floor_clob}pattern looses its
clobber (as clobber is now marked in logic insns). Now, this sequence is
generated (_.x.22.lreg):
...
{x86_fnstcw_1}
{*movhi_1}
{*andhi_1} (clobber)
{*iorhi_1} (clobber)
{*movhi_1}
{frndintxf2_floor} (no clobber)
...
OTOH, in !TARGET_PARTIAL_REG_STALL, {frndintxf2_floor} pattern is
generated directly in expander, so prevously wrong (PR target/12308,
comment #2 - correct code, where initialization uses movb, and #3
-incorrect code, where initialization uses orb) optimization could be
performed without any unwanted effects.
Same comment applies to your mainline patch.
In mainline patch, the same approach is applied to fix_trunc patterns.
We could generate non-clobbering patterns in expander if we know that
control word initialization sequence doesn't use any flags clobbering
instructions. In this case, one new pattern is needed
("fix_trunc<mode>_i387_1" in addition to "fix_trunc<mode>_i387_clob"),
and extra "if" in all expanders.
I think that all this machinery is needed to produce optimal code for
all targets. We in fact need two patterns to split clobbering pattern
into non-clobbering pattern for TARGET_PARTIAL_REG_STALL. A bypass in
expanders just generates non-clobbering pattern directly, so other
targets are not penalized by clobber which is actually not needed in
their case.
Uros.