This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [patch] partial register update for a bit mask operation on x86
So, what's the size difference between this and movstrictqi_xor?
2 bytes for xor or mov, versus 5/6 for and.
I think the patch is not doing the right thing anyway.
First, movstrictqi_xor should have a condition like this:
"reload_completed
&& ((!TARGET_PARTIAL_REG_STALL && !TARGET_USE_MOV0) || optimize_size)"
and this would probably remove the need for a new pattern altogether.
movstrictqi's should not be generated at all for partial register stall
targets; there should be no need for a workaround there, since the
standard bit twiddling would be generated instead. (BTW, could any guru
enlighten me on the need for "reload_completed" in the insn's condition?)
Second, movstrictqi_1 should have an "i" alternative for the source
operand, to allow other constants. I am pretty sure the current
restriction may cause problems on the K6 (i.e. the only TARGET_USE_MOV0
target), and adding the alternative also allows more optimization for
-Os (or processors without register stalls). Combine should be able to
synthesize a movstrictqi_1 from "x |= 255;", and even from "x = (x &
~255) | 100".
If this is done, movstrictqi_xor should be moved in front of
movstrictqi_1, or eliminated completely since it does not have any size
benefit.
Third, the same should be done for movstricthi, except that in this case
movstricthi_xor should be kept because it *does* have size benefits (2
bytes for xor, 3 for mov, 5/6 for and).
/* { dg-do compiler { { target i?86-*-* x86_64-*-* } && ilp32 } } */
"dg-do compile", of course. :-)
I would also add a test that tries "-mtune=pentium" and checks that the
"xor" version is generated.
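Such a test might look like the sketch below; the function body, the option, and the scanned mnemonic are all assumptions here, since they depend on the final form of the patterns:

```c
/* Hypothetical testcase; the scan string is an assumption.  */
/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
/* { dg-options "-O2 -mtune=pentium" } */

struct s { unsigned int x; };

void
clear_low_byte (struct s *p)
{
  p->x &= ~255;   /* candidate for a strict_low_part byte store */
}

/* { dg-final { scan-assembler "xor" } } */
```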
Hope this helps!
Paolo