[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

Tue Apr 21 07:07:18 GMT 2020

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
                 CC|                            |uros at gcc dot gnu.org

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Joel Yliluoma from comment #11)
> Looks like this issue has taken a step or two *backwards* in the past years.
> 
> Whereas the second function used to be vectorized properly, today it seems
> neither of them is.

Which version do you see vectorizing the second (add2) function?

> Contrast this with Clang, which compiles *both* functions into a single
> instruction:
> 
>   vaddps xmm0, xmm1, xmm0
> 
> or some variant thereof depending on the -m options.
> 
> Compiler Explorer link: https://godbolt.org/z/2AKhnt

The main issues on the GCC side are:
  a) ABI details are not exposed at the point of vectorization (several PRs
     about this exist)
  b) "Poor" support for two-element float vectors (an understatement: we have
     some support for MMX, but that's integer only, and I'm not sure we've
     enabled the 3dnow part to be emulated with SSE)

Oddly enough, even with -mmmx -m3dnow I see add2 lowered by veclower, so
the vector type or the vector add must be unsupported(?).

LLVM is known to support emulating smaller vectors just fine (and by
design is also aware of ABI details).
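
For readers without the Compiler Explorer link at hand, the two kinds of
function under discussion can be sketched roughly as below. This is only an
illustration under assumptions: the typedef names (cf32, v2sf), the name add1,
and the self_check helper are hypothetical and not taken from the testcase;
only the name add2 and the "two-element float vector" shape come from this
thread. The sketch relies on the GNU C vector_size attribute and the
__real__/__imag__ extensions.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical typedefs, chosen for this sketch only. */
typedef float complex cf32;                                   /* C99 complex float */
typedef float v2sf __attribute__((vector_size(2 * sizeof(float)))); /* 2 x float */

/* Complex addition: adds the real and imaginary parts pairwise. */
cf32 add1(cf32 a, cf32 b)
{
    return a + b;
}

/* The same element-wise addition written with a GNU C vector type. */
v2sf add2(v2sf a, v2sf b)
{
    return a + b;
}

/* Sanity check that both forms compute the same two sums. */
void self_check(void)
{
    cf32 c = add1(1.0f + 2.0f * I, 3.0f + 4.0f * I);
    v2sf v = add2((v2sf){1.0f, 2.0f}, (v2sf){3.0f, 4.0f});

    assert(__real__ c == v[0]); /* GNU extension: real part */
    assert(__imag__ c == v[1]); /* GNU extension: imaginary part */
}
```

Compiling such a file with -O2 -S and comparing the bodies of the two
functions is one way to check whether a single packed add is emitted for
either of them.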

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations