This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [patch] partial register update for a bit mask operation on x86


On Tue, May 08, 2007 at 12:49:45PM -0700, Hui-May Chang wrote:
> gcc was generating a partial register update on x86, due to an
> xorb %dl, %dl followed by using %edx shortly thereafter.
> A partial register update is an x86 performance hazard on certain
> prominent x86 implementations.
[cut]
> 0000005d        xorb    %dl,%dl                             <---  
> modifies %dl to clear the byte.

> Instead of a better implementation:
> 
> 000000f1        andl    $0xffffff00,%edx

   Do we know that the CPU isn't making this sort of transformation itself?

> +(define_insn "*movstrictqi_and"
> +  [(set (strict_low_part (match_operand:QI 0 "q_regs_operand" "+q"))

   I think you want "nonimmediate_operand" "+rm" here...

> +  if (TARGET_64BIT) 
> +    return "and{q}\t{$0xffffffffffffff00, %q0}";
> +  else
> +    return "and{l}\t{$0xffffff00, %k0}";

because these instructions work on more operands than just QImode capable
registers.
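
   Untested, and only to illustrate what I mean - this sketch assumes the
source operand is (const_int 0), which is what the quoted and masks imply,
adds the flags clobber that "and" needs, and falls back to a plain byte
store for a memory destination:

(define_insn "*movstrictqi_and"
  [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+rm"))
	(const_int 0))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed && !TARGET_USE_MOV0 && !optimize_size"
{
  /* A full-width "and" would touch bytes outside a QImode memory
     destination, so use a plain byte store there.  */
  if (MEM_P (operands[0]))
    return "mov{b}\t{$0, %0|%0, 0}";
  if (TARGET_64BIT)
    return "and{q}\t{$0xffffffffffffff00, %q0|%q0, 0xffffffffffffff00}";
  return "and{l}\t{$0xffffff00, %k0|%k0, 0xffffff00}";
})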

> +  "reload_completed && (!TARGET_USE_MOV0 && !optimize_size)"

   With -march=i386, !TARGET_USE_MOV0 is true. So some poor soul who wants
code optimized for a 386sx will likely see a performance regression here:

  xorb	%dl, %dl	# 2 bytes, 1 cycle fetch, 2 cycle execute.
  andl	$-256, %edx	# 6 bytes, 3 cycle fetch, 2 cycle execute.

   Fewer instructions will fit in the cache, and they'll take longer to load
into the cache. Please limit your insn pattern to CPUs where it increases
performance.
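
   Something along these lines, where TARGET_AVOID_PARTIAL_REG_UPDATE is a
made-up name for whichever tuning flag (existing or new) actually describes
the CPUs that benefit:

  ;; Sketch only: gate the pattern on the affected CPUs rather than on
  ;; !TARGET_USE_MOV0, which is also true for the plain 386.
  "reload_completed && TARGET_AVOID_PARTIAL_REG_UPDATE && !optimize_size"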

-- 
Rask Ingemann Lambertsen

