59578 – Overuse of v prefix for SSE instructions

Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.8.2

Importance:	P3 minor
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Reported:	2013-12-22 16:50 UTC by Mike Sharov
Modified:	2013-12-22 20:23 UTC

Description Mike Sharov 2013-12-22 16:50:38 UTC

typedef float v16sf __attribute__((vector_size(16)));
v16sf f (v16sf x)
{ return (__builtin_ia32_shufps (x, x, 0xff)); }

Compiling this on a Haswell 4770 with -march=native -O emits:

vshufps $255, %xmm0, %xmm0, %xmm0

This uses the VEX-encoded form even though all three register operands are the same, so the legacy SSE encoding

shufps $255, %xmm0, %xmm0

would have worked just as well and would be one byte shorter without the VEX (v) prefix.
The same happens with other builtins. For example:

typedef long long v16so __attribute__((vector_size(16)));
v16so k (v16so x)
{ return (__builtin_ia32_aeskeygenassist128 (x, 1)); }

This emits vaeskeygenassist rather than aeskeygenassist, even though no memory operands are involved.
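
For reference, the size difference claimed for the shufps case can be checked by hand. The sketch below is not compiler output; the byte values are hand-assembled from the instruction encoding rules as I understand them and should be treated as an assumption until confirmed with objdump -d. The names shufps_legacy, shufps_vex and vex_extra_bytes are made up for illustration.

```c
#include <stddef.h>

/* Legacy SSE encoding of: shufps $255, %xmm0, %xmm0
   opcode 0F C6, ModRM C0 (xmm0, xmm0), imm8 FF.
   Hand-assembled; an assumption, not assembler output. */
static const unsigned char shufps_legacy[] = { 0x0F, 0xC6, 0xC0, 0xFF };

/* VEX encoding of: vshufps $255, %xmm0, %xmm0, %xmm0
   two-byte VEX prefix C5 F8, opcode C6, ModRM C0, imm8 FF.
   Hand-assembled; an assumption, not assembler output. */
static const unsigned char shufps_vex[] = { 0xC5, 0xF8, 0xC6, 0xC0, 0xFF };

/* Number of additional bytes the VEX form occupies. */
size_t vex_extra_bytes(void)
{
    return sizeof shufps_vex - sizeof shufps_legacy;
}
```

Under these assumptions the VEX form costs one byte more than the legacy form for this particular instruction.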

Comment 1 Jakub Jelinek 2013-12-22 20:23:53 UTC

That is intentional; please read up on SSE-to-AVX transition penalties, e.g. http://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf
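
In practice, whether VEX or legacy encodings are used follows from whether AVX is enabled for the whole compilation, not from a per-instruction choice. A minimal sketch for experimenting with this: the function below is a portable stand-in for the reporter's first test case, written with the GCC vector extension instead of the builtin. The names v4sf and broadcast_lane3 are hypothetical and not from the report, and the exact instruction GCC selects for it may differ from shufps.

```c
/* 16-byte vector of four floats (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Same result as __builtin_ia32_shufps (x, x, 0xff):
   every lane of the result receives element 3 of x. */
v4sf broadcast_lane3(v4sf x)
{
    v4sf r = { x[3], x[3], x[3], x[3] };
    return r;
}
```

Compiling this with gcc -O -march=native -S on an AVX-capable machine should show v-prefixed instructions, while adding -mno-avx should switch the output to the legacy SSE forms. Inspect the generated assembly to confirm on a given GCC version.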