ICC 18 is able to vectorize this loop, while GCC 8 is not:

#include <vector>

std::size_t f(std::vector<std::vector<float>> const & v)
{
  std::size_t ret = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    ret += v[i].size();
  return ret;
}
Indeed, this example was mentioned during the discussion on better diagnostics but not entered in bugzilla, thanks. IIRC the issue is that we do not handle EXACT_DIV_EXPR in the vectorizer, which should be easy enough to fix. (Whether vectorizing this particular loop is a good idea isn't obvious to me, but that's an independent question.)
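For context, a minimal sketch of where the exact division comes from, assuming the common three-pointer std::vector layout. FloatVecRep, size_via_exact_div and size_via_shift are hypothetical names made up for this illustration, not library code. size() boils down to a pointer difference, i.e. a byte difference divided by sizeof(float), and since that division is known to leave no remainder it can be done as a right shift by 2:

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

// Hypothetical stand-in for the three-pointer vector representation.
struct FloatVecRep {
    float* start;
    float* finish;
    float* end_of_storage;
};

// What size() amounts to: byte distance divided by the element size.
// The division is exact because both pointers point into the same float array.
std::size_t size_via_exact_div(const FloatVecRep& r) {
    std::ptrdiff_t bytes = reinterpret_cast<const char*>(r.finish)
                         - reinterpret_cast<const char*>(r.start);
    assert(bytes >= 0 && bytes % 4 == 0);
    return static_cast<std::size_t>(bytes / 4);
}

// Because the remainder is known to be zero, no rounding fix-up is needed
// and a plain right shift by 2 gives the same result.
std::size_t size_via_shift(const FloatVecRep& r) {
    std::ptrdiff_t bytes = reinterpret_cast<const char*>(r.finish)
                         - reinterpret_cast<const char*>(r.start);
    assert(bytes >= 0 && bytes % 4 == 0);
    return static_cast<std::size_t>(bytes >> 2);
}
```

So vectorizing the loop means vectorizing a 64-bit subtract and a 64-bit shift per element, plus the accumulation.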
But even with that we seem to need AVX512F to vectorize it; with AVX2 we get

t.C:5:31: note: not vectorized: relevant stmt not supported: patt_45 = patt_44 >> 2;

thus, somehow V2DI arithmetic right shifts are not available. Indeed it looks like we only have named patterns for V4SI arithmetic right shifts for AVX2.

I'm going to bootstrap / test the vectorizer fix.
ICC seems to emulate this even for SSE2, where I'm not sure this is profitable:

..B1.2:                         # Preds ..B1.2 ..B1.1
                                # Execution count [1.02e+03]
        movdqu    .L_2il0floatpacket.0(%rip), %xmm2             #6.19
        lea       x(,%rax,8), %rdx                              #6.12
        movdqu    (%rdx), %xmm1                                 #6.12
        movdqa    %xmm2, %xmm0                                  #6.19
        pand      %xmm1, %xmm0                                  #6.19
        movdqa    %xmm1, %xmm3                                  #6.19
        psrlq     $1, %xmm3                                     #6.19
        psrad     $1, %xmm0                                     #6.19
        por       %xmm0, %xmm3                                  #6.19
        psrlq     $62, %xmm3                                    #6.19
        paddq     %xmm1, %xmm3                                  #6.19
        pand      %xmm3, %xmm2                                  #6.19
        psrlq     $2, %xmm3                                     #6.19
        psrad     $2, %xmm2                                     #6.19
        por       %xmm2, %xmm3                                  #6.19
        movdqu    %xmm3, (%rdx)                                 #6.5
        addq      $2, %rax                                      #5.3
        cmpq      $1024, %rax                                   #5.3
        jb        ..B1.2        # Prob 99%                      #5.3

and for AVX2:

..B1.2:                         # Preds ..B1.2 ..B1.1
                                # Execution count [1.02e+03]
        lea       x(,%rax,8), %rdx                              #6.12
        vmovdqu   (%rdx), %ymm4                                 #6.12
        vpsrlq    $1, %ymm4, %ymm0                              #6.19
        vpsrad    $1, %ymm4, %ymm1                              #6.19
        vpblendw  $204, %ymm1, %ymm0, %ymm2                     #6.19
        vpsrlq    $62, %ymm2, %ymm3                             #6.19
        vpaddq    %ymm4, %ymm3, %ymm5                           #6.19
        vpsrlq    $2, %ymm5, %ymm6                              #6.19
        vpsrad    $2, %ymm5, %ymm7                              #6.19
        vpblendw  $204, %ymm7, %ymm6, %ymm8                     #6.19
        vmovdqu   %ymm8, (%rdx)                                 #6.5
        addq      $4, %rax                                      #5.3
        cmpq      $1024, %rax                                   #5.3
        jb        ..B1.2        # Prob 99%                      #5.3

long x[1024];

void foo()
{
  for (int i = 0; i < 1024; ++i)
    x[i] = x[i] / 4;
}
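To make the AVX2 sequence easier to follow, here is a scalar model of what it appears to do per 64-bit lane. This is only a sketch of my reading of the assembly; sra64_emulated and div4_emulated are hypothetical names. It assumes that >> on a negative std::int32_t is an arithmetic shift and that unsigned-to-signed conversion wraps (guaranteed since C++20, true on mainstream compilers before that). The 64-bit arithmetic shift is assembled from a 64-bit logical shift (low dword, vpsrlq) and a 32-bit arithmetic shift of the high dword (vpsrad), merged like the vpblendw. The signed division by 4 then adds a bias of 3 to negative inputs before shifting, so the result rounds toward zero:

```cpp
#include <cassert>
#include <cstdint>

// 64-bit arithmetic right shift by n (1 <= n <= 31), built only from a
// 64-bit logical shift and a 32-bit arithmetic shift.
std::uint64_t sra64_emulated(std::uint64_t x, unsigned n) {
    assert(n >= 1 && n <= 31);
    // Logical shift: the low 32 bits of the result are already correct.
    std::uint64_t logical = x >> n;
    // Arithmetic shift of the high dword: gives the correct high 32 bits.
    std::int32_t hi = static_cast<std::int32_t>(static_cast<std::uint32_t>(x >> 32));
    std::uint32_t hi_sra = static_cast<std::uint32_t>(hi >> n);
    // Blend: high dword from the arithmetic shift, low dword from the logical one.
    return (static_cast<std::uint64_t>(hi_sra) << 32) | (logical & 0xFFFFFFFFu);
}

// Signed division by 4 with truncation toward zero, using only shifts and adds.
std::int64_t div4_emulated(std::int64_t v) {
    std::uint64_t x = static_cast<std::uint64_t>(v);
    // Top two bits of (x >>a 1) are both copies of the sign bit,
    // so this yields 3 for negative inputs and 0 otherwise.
    std::uint64_t bias = sra64_emulated(x, 1) >> 62;
    std::uint64_t biased = x + bias;  // wraps modulo 2^64, like paddq
    return static_cast<std::int64_t>(sra64_emulated(biased, 2));
}
```

That is two emulated shifts, each costing a logical shift, an arithmetic shift and a blend, plus one extra logical shift and an add per vector, which is why the profitability question comes up.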
Author: rguenth
Date: Wed Jul 18 12:57:15 2018
New Revision: 262854

URL: https://gcc.gnu.org/viewcvs?rev=262854&root=gcc&view=rev
Log:
2018-07-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/86557
	* tree-vect-patterns.c (vect_recog_divmod_pattern): Also handle
	EXACT_DIV_EXPR.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-patterns.c
The vectorizer part is fixed; the target part (arithmetic right shifts on 64-bit elements without AVX512F) remains.
Fixed with the patch which fixes PR 101611.