This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
I believe every __builtin_shuffle that can be done in a single instruction is already properly expanded on x86. For this 16 byte vector shuffle, it uses pshufb. Is there a better instruction?
The shuffle operations need a memory or [xy]mm register parameter. That's expensive to set up. The shuffling which doesn't rearrange bytes but just rotates them should use the equivalent of the _mm_srli_si128 and _mm_slli_si128 intrinsics.
From your post it seems that psrldq+pslldq+por should always be preferred, even if in some cases it can be a loss. That makes sense, I am just more used to "obvious" optimizations. Is there a way to decide when a load becomes worth it, compared to a large number of pure logical instructions?
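For reference, the psrldq+pslldq+por sequence discussed here can be sketched with SSE2 intrinsics roughly as follows. This is only an illustration, not what GCC emits for __builtin_shuffle: the helper names rotate_bytes_right and rotate_array_right are made up for this example, and the rotation direction (byte i of the result taken from byte (i + N) mod 16 of the input) is an assumption about which rotation is wanted.

```cpp
#include <emmintrin.h>  // SSE2: _mm_srli_si128, _mm_slli_si128, _mm_or_si128
#include <array>
#include <cstdint>

// Hypothetical helper: rotate the 16 bytes of v by N positions, so that
// result byte i equals input byte (i + N) % 16. The shift counts must be
// compile-time constants, hence the template parameter.
template <int N>
inline __m128i rotate_bytes_right(__m128i v) {
    static_assert(N > 0 && N < 16, "N must be in 1..15");
    __m128i low  = _mm_srli_si128(v, N);       // psrldq: bytes move down, zeros enter at the top
    __m128i high = _mm_slli_si128(v, 16 - N);  // pslldq: the N lowest bytes move to the top
    return _mm_or_si128(low, high);            // por: combine the two halves
}

// Hypothetical convenience wrapper working on a plain byte array,
// using unaligned loads and stores so no alignment is required.
template <int N>
inline std::array<std::uint8_t, 16>
rotate_array_right(const std::array<std::uint8_t, 16>& in) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in.data()));
    v = rotate_bytes_right<N>(v);
    std::array<std::uint8_t, 16> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
```

Unlike pshufb, none of the three instructions needs a control mask in a register or in memory; the shift counts are immediates. The price is three instructions and a temporary register instead of one shuffle.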
-- Marc Glisse