We run these devices in wavefrontsize64 mode for compatibility, but natively
they only have 32-lane vectors. If the upper half of a V64 is masked off (as
it is for V32) then RDNA devices skip execution of the upper half for most
operations, so this adjustment shouldn't leave too much performance on the
table. Memory instructions are one exception, so full wavefrontsize32
support would still be better.

The advantage of preferring V32 is that we avoid the V64 operations that are
missing on these devices (such as permute and vec_extract).

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Prefer V32 on
	RDNA devices.