The issue is caused by missing vectorizer support for __builtin_bswap32:

  t1.c:9:3: note: function is not vectorizable.
  t1.c:9:3: note: not vectorized: relevant stmt not supported: _13 = __builtin_bswap32 (load_dst_8);

A simple reproducer is attached.
Created attachment 39821: test case to reproduce

It is sufficient to compile it with -Ofast on an x86 platform.
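The attachment itself is not reproduced in this thread. Based on the GIMPLE dump later in the thread, the loop appears to load a u32 from a buffer, byte-swap it and store it back in place. The following is only a hypothetical stand-in of that shape (u32, swap_words and load_dst are assumed names, not the attached file). It can be compiled with `gcc -Ofast -fopt-info-vec-missed` to look for a similar "not vectorized" note on an unfixed compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for attachment 39821; names are illustrative. */
typedef uint32_t u32;

/* Byte-swap each 32-bit word of b[0..n-1] in place.
   Assumed to resemble the loop the vectorizer rejected above. */
void swap_words(u32 *b, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    u32 load_dst = b[i];                  /* scalar load */
    b[i] = __builtin_bswap32(load_dst);   /* the unsupported stmt */
  }
}
```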
Should be relatively easy to handle with a VIEW_CONVERT, VEC_PERM_EXPR, VIEW_CONVERT sequence.
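At the source level the same sequence can be spelled with GCC's generic vector extensions: a cast between equal-sized vector types reinterprets the bits (the VIEW_CONVERT), and `__builtin_shuffle` is the builtin GCC lowers to a VEC_PERM_EXPR. This sketch assumes 128-bit vectors of four 32-bit words; the typedef and function names are made up for illustration. Each group of four selector entries reverses the bytes of one word, giving { 3,2,1,0, 7,6,5,4, 11,10,9,8, 15,14,13,12 }.

```c
#include <stdint.h>

/* Illustrative 128-bit vector types (GCC vector_size extension). */
typedef uint32_t v4su __attribute__((vector_size(16)));
typedef uint8_t v16qu __attribute__((vector_size(16)));

/* Byte-swap each 32-bit lane: VIEW_CONVERT, VEC_PERM, VIEW_CONVERT. */
v4su bswap32_v4(v4su x)
{
  /* Reinterpret the four words as sixteen bytes (no data movement). */
  v16qu bytes = (v16qu) x;
  /* Reverse the bytes within each 4-byte group. */
  const v16qu mask = { 3, 2, 1, 0, 7, 6, 5, 4,
                       11, 10, 9, 8, 15, 14, 13, 12 };
  v16qu swapped = __builtin_shuffle(bytes, mask);
  /* Reinterpret the permuted bytes as four words again. */
  return (v4su) swapped;
}
```

On x86 with SSSE3 or later, a constant byte permutation like this can be done with a single pshufb.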
Created attachment 39827: untested patch

Mostly untested prototype. With -mavx2 the innermost loop of the testcase becomes:

.L6:
    vmovdqa     (%r9,%rdx), %ymm0
    addl        $1, %r8d
    vperm2i128  $0, %ymm0, %ymm0, %ymm0
    vpshufb     %ymm1, %ymm0, %ymm0
    vmovdqa     %ymm0, (%r9,%rdx)
    addq        $32, %rdx
    cmpl        %r11d, %r8d
    jb          .L6

and with -msse4:

.L6:
    movdqa  (%rax,%rdx), %xmm0
    addl    $1, %r8d
    pshufb  %xmm1, %xmm0
    movaps  %xmm0, (%rax,%rdx)
    addq    $16, %rdx
    cmpl    %r10d, %r8d
    jb      .L6

I'm not sure I got the bswap permutation vector constant right either ;) (quick hack):

  vect_load_dst_8.13_63 = MEM[(u32 *)vectp_b.11_61];
  load_dst_8 = *_3;
  _64 = VIEW_CONVERT_EXPR<vector(16) char>(vect_load_dst_8.13_63);
  _65 = VEC_PERM_EXPR <_64, _64, { 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0 }>;
  _66 = VIEW_CONVERT_EXPR<vector(4) unsigned int>(_65);
  _13 = __builtin_bswap32 (load_dst_8);
  MEM[(u32 *)vectp_b.14_69] = _66;
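One way to check the constant is to evaluate VEC_PERM_EXPR <v, v, sel> by hand: output byte k is byte sel[k] of the concatenation of the two operands. The plain-C model below (all names hypothetical) applies the selector from the dump and a per-lane selector to the same four words. With { 3, 2, 1, 0 } repeated, every output word is the byte-swapped first input word; only { 3,2,1,0, 7,6,5,4, 11,10,9,8, 15,14,13,12 } swaps each word in place. A selector that reads only the low bytes also appears consistent with the `vperm2i128 $0` (copy the low 128-bit half into both halves) in the AVX2 loop.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of VEC_PERM_EXPR <a, b, sel> on 16-byte vectors:
   selector values 0..15 pick from a, 16..31 pick from b. */
void vec_perm16(const uint8_t a[16], const uint8_t b[16],
                const uint8_t sel[16], uint8_t out[16])
{
  for (int k = 0; k < 16; k++) {
    unsigned idx = sel[k] & 31u;  /* keep idx in 0..31 */
    out[k] = idx < 16 ? a[idx] : b[idx - 16];
  }
}

/* Selector as printed in the prototype's dump. */
const uint8_t proto_sel[16] = { 3, 2, 1, 0, 3, 2, 1, 0,
                                3, 2, 1, 0, 3, 2, 1, 0 };
/* Selector that reverses the bytes of each 32-bit lane. */
const uint8_t lane_sel[16] = { 3, 2, 1, 0, 7, 6, 5, 4,
                               11, 10, 9, 8, 15, 14, 13, 12 };

/* Apply sel to four words; memcpy plays the role of VIEW_CONVERT. */
void perm_words(const uint32_t in[4], const uint8_t sel[16], uint32_t out[4])
{
  uint8_t bytes[16], res[16];
  memcpy(bytes, in, sizeof bytes);
  vec_perm16(bytes, bytes, sel, res);
  memcpy(out, res, sizeof res);
}
```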
The new handling should probably be moved after the targetm.vectorize.builtin_vectorized_function check, so that ARM's existing builtin-bswap vectorization via vrev still applies (I'm not sure whether its permutation handling selects vrev for a bswap permutation).
Created attachment 39990: patch I am testing
Author: rguenth
Date: Wed Nov 9 08:19:05 2016
New Revision: 241992

URL: https://gcc.gnu.org/viewcvs?rev=241992&root=gcc&view=rev
Log:
2016-11-09  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/78007
        * tree-vect-stmts.c (vectorizable_bswap): New function.
        (vectorizable_call): Call vectorizable_bswap for
        BUILT_IN_BSWAP{16,32,64} if arguments are not promoted.

        * gcc.dg/vect/vect-bswap32.c: Adjust.
        * gcc.dg/vect/vect-bswap64.c: Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
    trunk/gcc/tree-vect-stmts.c
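Since vectorizable_bswap covers BUILT_IN_BSWAP16, 32 and 64, the byte selector depends on the element width. The helper below is a hypothetical sketch, not the GCC source: for a vector of vec_bytes bytes holding words of word_bytes bytes, output byte j of word i comes from input byte (i + 1) * word_bytes - j - 1. The resulting array could serve as the constant operand of a VEC_PERM_EXPR.

```c
#include <stddef.h>

/* Hypothetical helper: fill sel[0..vec_bytes-1] with the byte permutation
   that reverses the bytes of every word_bytes-sized element.
   Returns 0 on success, -1 if vec_bytes is not a multiple of word_bytes. */
int bswap_perm_mask(unsigned char *sel, size_t vec_bytes, size_t word_bytes)
{
  if (word_bytes == 0 || vec_bytes % word_bytes != 0)
    return -1;
  size_t nwords = vec_bytes / word_bytes;
  for (size_t i = 0; i < nwords; i++)
    for (size_t j = 0; j < word_bytes; j++)
      /* Byte j of word i takes the mirrored byte of the same word. */
      sel[i * word_bytes + j] = (unsigned char) ((i + 1) * word_bytes - j - 1);
  return 0;
}
```

For 16-byte vectors, word_bytes = 2, 4 and 8 correspond to bswap16, bswap32 and bswap64.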
Fixed.