This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: generic and i386 bswap improvements


Hello Richard!

(3) Implement bswapsi for 80386, which doesn't have the bswap
    instruction.  For this we generate

	xchgb	%ch, %cl
	roll	$16, %ecx
	xchgb	%ch, %cl

According to the Pentium optimization guide, this is a win only on Pentium 4 (1.5 clk vs. 4 clk); other targets should use rolw $8, %cx (or rorw $8, %cx) instead of xchgb.

Perhaps we should generate rolw by default (it also operates on
registers other than Q) and split it after reload into xchgb when
appropriate?

Attached to this message, please find a patch (diffed against a
mainline that is a couple of days old!) that implements the second part
of the above suggestion. Due to the granularity of rdtsc, I was not able
to measure any runtime difference on Pentium 4, but it is clearly a code
size win.

2007-02-14 Uros Bizjak <ubizjak@gmail.com>

	* config/i386/i386.h (x86_use_xchgb): New.
	(TARGET_USE_XCHGB): New macro.
	* config/i386/i386.c (x86_use_xchgb): Set for PENT4.
	* config/i386/i386.md (*rotlhi3_1, *rotrhi3_1): For TARGET_USE_XCHGB
	or when optimizing for size, split into bswaphi after reload for
	shifts of 8.
	(*bswaphi): New insn pattern.

Uros.

Attachment: i386-xchgb.diff
Description: Binary data

