This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, take 2] Fix PR target/28946


Hi H.J.,

On Thu, 7 Sep 2006, H. J. Lu wrote:
> 2006-09-05  H.J. Lu  <hongjiu.lu@intel.com>
>
> 	* config/i386/i386.c (x86_partial_flag_reg_stall): New.
>
> 	* config/i386/i386.h (x86_partial_flag_reg_stall): New.
> 	(TARGET_PARTIAL_FLAG_REG_STALL): New.
>
> 	* config/i386/i386.md (*ashldi3_cmp_rex64): Disabled for
> 	TARGET_PARTIAL_FLAG_REG_STALL.
> 	(*ashldi3_cconly_rex64): Likewise.
> 	(*ashlsi3_cmp): Likewise.
> 	(*ashlsi3_cconly): Likewise.
> 	(*ashlsi3_cmp_zext): Likewise.
> 	(*ashlhi3_cmp): Likewise.
> 	(*ashlhi3_cconly): Likewise.
> 	(*ashlqi3_cmp): Likewise.
> 	(*ashlqi3_cconly): Likewise.
> 	(*ashrdi3_cmp_rex64): Likewise.
> 	(*ashrdi3_cconly_rex64): Likewise.
> 	(*ashrsi3_cmp): Likewise.
> 	(*ashrsi3_cconly): Likewise.
> 	(*ashrsi3_cmp_zext): Likewise.
> 	(*ashrhi3_cmp): Likewise.
> 	(*ashrhi3_cconly): Likewise.
> 	(*ashrqi3_cmp): Likewise.
> 	(*ashrqi3_cconly): Likewise.
> 	(*lshrdi3_cmp_rex64): Likewise.
> 	(*lshrdi3_cconly_rex64): Likewise.
> 	(*lshrsi3_cmp): Likewise.
> 	(*lshrsi3_cconly): Likewise.
> 	(*lshrsi3_cmp_zext): Likewise.
> 	(*lshrhi3_cmp): Likewise.
> 	(*lshrhi3_cconly): Likewise.
> 	(*lshrqi2_cmp): Likewise.
> 	(*lshrqi2_cconly): Likewise.

This is OK for mainline, on top of Uros' regression fix.  I'm not sure
whether the m_GENERIC functionality has been backported to the 4.1 or 4.0
branches, so we can haggle over whether it's worthwhile backporting
this revision, especially given that core and core2 tuning hasn't
officially been contributed to GCC (even to mainline) yet.


> Core and Core 2 have partial flag register stalls. Penalty of partial
> flag register stall is from 60-100%. Cost of extra test on Pentium 4
> is 0-10%.

I don't doubt that you and the Intel engineers know what you're doing,
but I do find this just a little curious.  Presumably, statistical
analysis has shown that shifts by zero are frequent enough to merit a
bypass, and it's that bypass which triggers the significant partial flag
register stall overhead for shifts by non-zero values.  I'd have thought
that treating shifts as always setting the flags would lead to better
code density and fewer stalls, at the expense of the presumably rare
shifts by zero.

Equally curious, from the way your patch is coded, is that this "partial
flag register stall" isn't currently avoided in hardware even when the
shift instruction is encoded with an immediate non-zero operand.

For example, the pattern

+(define_insn "*ashlsi3_cconly"
+  [(set (reg FLAGS_REG)
+	(compare
+	  (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0")
+		     (match_operand:QI 2 "const_1_to_31_operand" "I"))
+	  (const_int 0)))
+   (clobber (match_scratch:SI 0 "=r"))]

can only have a non-zero shift count, and this should be detectable
by decode logic much earlier in the pipeline.  In which case, I'd
assume that many of your new tests could look like

	optimize_size
	|| GET_CODE (operands[2]) == CONST_INT
	|| !TARGET_PARTIAL_FLAG_REG_STALL
	|| ...
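Spelled out in full, the relaxed guard might look something like the
following on a pattern such as *ashlsi3_cmp.  This is purely a
hypothetical sketch of the suggestion above, not the committed
condition; the trailing ix86_match_ccmode / ix86_binary_operator_ok
clauses are the existing i386 backend helpers, and whether the hardware
really avoids the stall for immediate counts would need confirming:

```
"(optimize_size
  || !TARGET_PARTIAL_FLAG_REG_STALL
  || (GET_CODE (operands[2]) == CONST_INT
      && INTVAL (operands[2]) != 0))
 && ix86_match_ccmode (insn, CCGOCmode)
 && ix86_binary_operator_ok (ASHIFT, SImode, operands)"
```

For the _cconly variants above, the const_1_to_31_operand predicate
already guarantees a non-zero count, so the CONST_INT test alone would
suffice there.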


Perhaps something for the next generation of microarchitectures.

Having said all of that, it's GCC's job to generate code for the CPUs
that folks decide to make, however bizarre the design decisions may be,
so this change is OK for mainline.  Though I'd be interested in hearing
from Honza and the AMD folks whether they're happy that this affects
the "generic" tuning.  At worst, it's no worse than before Uros'
change.  Perhaps once the x86 backend gets tuning options for core
and core2, we can refine TARGET_PARTIAL_FLAG_REG_STALL.

Thanks again for double checking and resolving the potential performance
impact of Uros' regression fix, and the additional testing.

Roger
--

