[PATCH] x86_64: Integer min/max improvements.
Fri Aug 7 13:21:07 GMT 2020
On Thu, Aug 6, 2020 at 10:40 AM Roger Sayle <email@example.com> wrote:
> Hi Uros,
> Many thanks for the review and feedback. Here's the final version as committed,
> with both the test cases requested by Richard Biener and your suggestion/request
> to use ix86_expand_clear. Tested again on x86_64-pc-linux-gnu.
> Thank you again for the fantastic ix86_expand_clear pointer, which cleared up one
> of two technical questions I had, and allowed this peephole2 to now also apply to
> QImode and HImode MOV0s, where my original version was limited to SImode and
> My two questions were (i) why a QImode set of 0 with a flags clobber isn't a recognized
> instruction? I'd assume that on some architectures "xorb dl,dl" might be an appropriate
> sequence to use. This is mostly answered by the use of ix86_expand_clear, which
> intelligently selects the correct form, but the lack of a *movqi_xor was previously odd.
XOR transformation is used mostly due to code size, where we have:
   0:   b0 00                   mov    $0x0,%al
   2:   30 c0                   xor    %al,%al
   4:   bb 00 00 00 00          mov    $0x0,%ebx
   9:   31 db                   xor    %ebx,%ebx
So, as can be seen from the above example, there is no size benefit for
QImode (both forms are 2 bytes), whereas 3 bytes are saved for SImode
(2 bytes instead of 5).
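
To make the arithmetic explicit, here is a minimal C sketch that just
counts the bytes from the disassembly above. The array and function
names are hypothetical and only serve the illustration; nothing here
is taken from the GCC sources.

```c
#include <stddef.h>

/* Byte encodings copied from the disassembly above. */
static const unsigned char mov_al[]  = { 0xb0, 0x00 };                   /* mov $0x0,%al  */
static const unsigned char xor_al[]  = { 0x30, 0xc0 };                   /* xor %al,%al   */
static const unsigned char mov_ebx[] = { 0xbb, 0x00, 0x00, 0x00, 0x00 }; /* mov $0x0,%ebx */
static const unsigned char xor_ebx[] = { 0x31, 0xdb };                   /* xor %ebx,%ebx */

/* Bytes saved by clearing with XOR instead of MOV, QImode case.
   (Hypothetical helper, for illustration only.) */
int qimode_bytes_saved(void)
{
    return (int)(sizeof mov_al - sizeof xor_al);
}

/* Bytes saved by clearing with XOR instead of MOV, SImode case.
   (Hypothetical helper, for illustration only.) */
int simode_bytes_saved(void)
{
    return (int)(sizeof mov_ebx - sizeof xor_ebx);
}
```

Calling the two helpers gives 0 for QImode and 3 for SImode, matching
the numbers quoted above.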
> (ii) My other question, was that despite my best efforts I couldn't seem to convince GCC
> to generate/use a *movsi_or to load the constant -1. It was just a curiosity, but this
> would affect/benefit the smaxm1 and sminm1 examples in the new i386/minmax-10.c
This transformation is enabled only for -Os, or for the targets selected by:
DEF_TUNE (X86_TUNE_MOVE_M1_VIA_OR, "move_m1_via_or", m_PENT | m_LAKEMONT)
However, clearing the register with xor reg,reg also prevents a partial
register stall, whereas or -1, reg does not.
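
For anyone who wants to experiment, below is a rough sketch of the kind
of functions under discussion. The names smaxm1 and sminm1 come from the
quoted mail; the bodies are an assumption (a signed max/min against -1)
and are not copied from i386/minmax-10.c. Compiling them with and
without -Os and comparing the -S output should show how the constant -1
gets loaded on a given tuning.

```c
/* Assumed shape of the examples: signed max and min against -1.
   These bodies are a guess for illustration, not the actual testcase. */

/* Returns the larger of x and -1. */
int smaxm1(int x)
{
    return x > -1 ? x : -1;
}

/* Returns the smaller of x and -1. */
int sminm1(int x)
{
    return x < -1 ? x : -1;
}
```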