This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
(3) Implement bswapsi for 80386, which doesn't have the bswap instruction. For this we generate:

    xchgb %ch, %cl
    roll $16, %ecx
    xchgb %ch, %cl
According to the Pentium optimization guide, this is a win only on Pentium 4 (1.5 clk vs. 4 clk); other targets should use rolw $8, %cx (or rorw $8, %cx) instead of xchgb.
Perhaps we should generate rolw by default (it also operates on registers other than the Q class) and split it into xchgb after reload when appropriate?
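To make the sequence above easier to check, here is a minimal C sketch that models each of the three instructions on a 32-bit value. The function names (swap_low_bytes, rotate16, bswap32_386, check_bswap32_386) are hypothetical and are not taken from the patch; this only illustrates why the sequence amounts to a full byte swap, not how GCC emits it.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "xchgb %ch, %cl" (equivalently "rolw $8, %cx"):
   swap the two low bytes and leave the upper 16 bits untouched. */
static uint32_t swap_low_bytes(uint32_t x)
{
    uint32_t lo = x & 0xffu;
    uint32_t hi = (x >> 8) & 0xffu;
    return (x & 0xffff0000u) | (lo << 8) | hi;
}

/* Model of "roll $16, %ecx": exchange the two 16-bit halves. */
static uint32_t rotate16(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

/* The three-instruction sequence: swap, rotate, swap. */
static uint32_t bswap32_386(uint32_t x)
{
    x = swap_low_bytes(x);  /* 0x11223344 -> 0x11224433 */
    x = rotate16(x);        /* 0x11224433 -> 0x44331122 */
    x = swap_low_bytes(x);  /* 0x44331122 -> 0x44332211 */
    return x;
}

/* Small self-check; applying the swap twice must give back the input. */
void check_bswap32_386(void)
{
    assert(bswap32_386(0x11223344u) == 0x44332211u);
    assert(bswap32_386(bswap32_386(0xdeadbeefu)) == 0xdeadbeefu);
}
```

The first and last steps touch only the low 16 bits, which is why either xchgb or a 16-bit rotate by 8 can be used there.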
Attached to this message is a patch (diffed against a mainline that is a couple of days old!) that implements the second part of the above suggestion. Due to the granularity of rdtsc, I was not able to measure any runtime difference on Pentium 4, but it is clearly a code size win.
	* config/i386/i386.h (x86_use_xchgb): New.
	(TARGET_USE_XCHGB): New macro.
	* config/i386/i386.c (x86_use_xchgb): Set for PENT4.
	* config/i386/i386.md (*rotlhi3_1, *rotrhi3_1): For TARGET_USE_XCHGB
	or when optimizing for size, split into bswaphi after reload for
	shifts of 8.
	(*bswaphi): New insn pattern.
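
The split of 16-bit rotates by 8 into bswaphi relies on a simple identity: on a 16-bit value, rotating left by 8, rotating right by 8, and swapping the two bytes all give the same result. A minimal C sketch of that identity follows; the helper names (rotl16, rotr16, bswap16, check_rot8_is_bswap16) are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit rotate left by n (n taken modulo 16). */
static uint16_t rotl16(uint16_t x, unsigned n)
{
    n &= 15u;
    return (uint16_t)((x << n) | (x >> ((16u - n) & 15u)));
}

/* 16-bit rotate right by n (n taken modulo 16). */
static uint16_t rotr16(uint16_t x, unsigned n)
{
    n &= 15u;
    return (uint16_t)((x >> n) | (x << ((16u - n) & 15u)));
}

/* 16-bit byte swap, which is what "xchgb %ch, %cl" does to %cx. */
static uint16_t bswap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Check the identity for every 16-bit value. */
void check_rot8_is_bswap16(void)
{
    for (uint32_t v = 0; v <= 0xffffu; v++) {
        uint16_t x = (uint16_t)v;
        assert(rotl16(x, 8) == bswap16(x));
        assert(rotr16(x, 8) == bswap16(x));
    }
}
```

Because the identity holds only for a rotate count of exactly 8, the ChangeLog entry restricts the split to shifts of 8.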
Attachment:
i386-xchgb.diff
Description: Binary data