This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/53687] _mm_cmpistri generates redundant movslq %ecx,%rcx on x86-64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53687

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Peter Cordes <peter at cordes dot ca> ---
This behaviour is pretty understandable.  gcc doesn't know that the return
value is always in the range 0-16, i.e. guaranteed non-negative.  Since you
used a signed int offset, it makes sense that it *sign*-extends from 32 to 64 bits.
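To make the signed/unsigned distinction concrete, here is a minimal sketch of the two widenings a compiler must emit when it can't prove the value is in 0-16. `widen_signed` and `widen_unsigned` are hypothetical names used only for illustration. On x86-64, gcc typically compiles the first with `movslq` and the second with a 32-bit `movl`.

```c
#include <stdint.h>

/* Hypothetical helpers: the two ways a 32-bit offset gets widened to 64 bits.
   Without range information, the C conversion rules decide which one is needed. */

/* Signed source: the value is sign-extended (typically movslq on x86-64 gcc). */
int64_t widen_signed(int32_t off) { return off; }

/* Unsigned source: the value is zero-extended (typically a 32-bit movl). */
uint64_t widen_unsigned(uint32_t off) { return off; }
```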

If you use an unsigned offset, the missed optimization becomes more obvious:
gcc 7.2 still uses a `movl %ecx, %ecx` to zero-extend into rcx, even though
`pcmpistri` writing ecx has already zeroed the upper 32 bits of rcx.

https://godbolt.org/g/wWvqpa

(Incidentally, using the same register as source and destination is the worst
possible choice for Intel CPUs.  It means the mov can never be eliminated at the
register-rename stage, so it always needs an execution port and has non-zero
latency.)

Even a uintptr_t offset doesn't avoid it, because converting the intrinsic's
int result to the uintptr_t variable still sign-extends it to 64 bits.  gcc
treats the intrinsic exactly like a function that returns int, which in the
x86-64 SysV ABI is allowed to leave garbage in the upper 32 bits of the register.
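A minimal sketch of why declaring the variable as uintptr_t doesn't help: the conversion starts from an int, so C's rules make it a sign-extension regardless of the destination type. `returns_int` is a hypothetical stand-in for the intrinsic, which gcc models as returning int.

```c
#include <stdint.h>

/* Hypothetical stand-in for an intrinsic whose result type is int. */
int returns_int(int v) { return v; }

/* Storing the int result in a uintptr_t is an int -> uintptr_t conversion.
   On x86-64 a negative v becomes 2^64 + v, which is exactly the bit pattern
   of a sign-extension, so the compiler still has to widen with movslq. */
uintptr_t offset_as_uintptr(int v)
{
    uintptr_t off = returns_int(v);
    return off;
}
```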


(BTW, this use of flags from inline asm is not guaranteed to be safe.  Nothing
stops the optimizer from doing the pointer increment after the `pcmpistri` and
before the asm statement, which could clobber the flags.  You could do the
`pcmpistri` inside the asm and produce a uintptr_t output operand, except that
doesn't work with `asm goto`.  So really you should write the whole loop in
inline asm.)
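A sketch of the "do `pcmpistri` inside the asm" variant, assuming x86-64, GNU C inline asm, and an SSE4.2-capable CPU. `pcmpistri_any` is a hypothetical name, and `$0x00` is just one example immediate (unsigned bytes, equal-any, least-significant index). Because the instruction and its output live in one asm statement, nothing can be scheduled between them, and since writing ECX zero-extends into RCX, a 64-bit `"=c"` output needs no extra `mov`. It still can't hand the flags to a branch, which is why the whole loop would end up in asm.

```c
#include <emmintrin.h>  /* __m128i and _mm_loadu_si128 (SSE2, baseline on x86-64) */
#include <stdint.h>

/* Hypothetical helper: index of the first byte of `chunk` equal to any byte of
   `set` (both implicit-length, NUL-terminated), or 16 if there is none.
   Requires a CPU with SSE4.2 at run time. */
static inline uintptr_t pcmpistri_any(__m128i set, __m128i chunk)
{
    uintptr_t idx;
    /* AT&T operand order: pcmpistri $imm8, xmm2/m128, xmm1.
       The instruction writes ECX; in 64-bit mode that zero-extends into RCX,
       so binding a 64-bit output to "c" is exact. */
    __asm__("pcmpistri $0x00, %2, %1"
            : "=c"(idx)
            : "x"(set), "x"(chunk)
            : "cc");  /* pcmpistri also writes the flags */
    return idx;
}
```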


Or better, don't use inline asm at all: gcc can CSE `_mm_cmpistri` with
`_mm_cmpistra` when they have the same operands, so you can simply call both
intrinsics to get the index and the flag condition, and they will compile to a
single `pcmpistri` instruction.  This is like using the `/` and `%` operators to
get both results of a single `div`.
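A minimal sketch of that approach, assuming x86-64, gcc or clang, and a CPU with SSE4.2. `find_any_sse42` is a hypothetical strpbrk-style helper, not code from the bug report; the loop calls `_mm_cmpistri` and `_mm_cmpistra` with identical operands and leaves it to the compiler to merge them. Whether they actually fold into one `pcmpistri` depends on the compiler version, so check the generated asm.

```c
#include <nmmintrin.h>  /* SSE4.2 string intrinsics (_mm_cmpistri, _mm_cmpistra) */
#include <stddef.h>

/* Unsigned bytes, "equal any", report the least-significant matching index. */
#define FIND_ANY_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT)

/* Hypothetical helper: first byte of s that occurs in set16, or NULL.
   ASSUMPTIONS: set16 is a NUL-padded 16-byte buffer holding at most 16
   characters, and s is padded so that 16-byte loads from any offset up to its
   terminator stay inside the allocation. The target attribute lets this
   compile without -msse4.2; the CPU must still support SSE4.2. */
__attribute__((target("sse4.2")))
const char *find_any_sse42(const char *s, const char set16[16])
{
    const __m128i set = _mm_loadu_si128((const __m128i *)set16);
    for (;; s += 16) {
        const __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        /* Same inputs and immediate: gcc can CSE these two into one pcmpistri,
           taking the index from ECX and the "above" condition from the flags. */
        int idx = _mm_cmpistri(set, chunk, FIND_ANY_MODE);
        if (!_mm_cmpistra(set, chunk, FIND_ANY_MODE))
            /* Match (CF=1) or end of string (ZF=1) in this chunk;
               idx == 16 means the string ended without a match. */
            return idx < 16 ? s + idx : NULL;
        /* _mm_cmpistra == 1: no match and no NUL here, try the next 16 bytes. */
    }
}
```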
