[Bug target/93039] Fails to use SSE bitwise ops for float-as-int manipulations

rguenther at suse dot de gcc-bugzilla@gcc.gnu.org
Wed Jan 8 16:09:00 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On January 8, 2020 4:34:40 PM GMT+01:00, "amonakov at gcc dot gnu.org"
<gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
>
>--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
>> The question is for which CPUs is it actually faster to use SSE?
>
>In the context of chains where the source and the destination need to
>be SSE registers, pretty much all CPUs? Inter-unit moves typically have
>some latency, e.g. recent AMD (since Zen) and Intel (Skylake) have
>latency 3 for sse<->gpr moves (surprisingly though four generations
>prior to Skylake had latency 1). Older AMDs with shared fpu had even
>worse latencies. At the same time SSE integer ops have comparable
>latencies and throughput to gpr ones, so generally moving a chain to
>SSE ops isn't making it slower. Plus it helps with register pressure.
>
>When either the source or the destination of a chain is bound to a
>general register or memory, it's ok to continue doing it on general
>regs.

But we need an extra load for the constant operand with an SSE op.
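
For illustration, a minimal sketch of the kind of chain under discussion.
The function name clear_sign and the mask are hypothetical and not taken
from the bug report. With the x86-64 calling convention the float
argument and the return value both live in an SSE register, so the
bitwise AND can be done either by moving the bits to a general register
and using an immediate operand, or by staying in SSE, where the AND has
no immediate form and the mask has to come from a register or memory:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: clear the sign bit of a float by treating its
   bits as an integer.  Source and destination are both SSE registers
   under the x86-64 calling convention. */
float clear_sign(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* float -> integer bits */
    bits &= UINT32_C(0x7fffffff);     /* the constant operand */
    memcpy(&x, &bits, sizeof bits);   /* integer bits -> float */
    return x;
}
```

The GPR variant pays for two inter-unit moves around the AND; the SSE
variant avoids them but needs the mask loaded as a constant. Which code
a given compiler version actually emits for this function is not
claimed here.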

