[Bug tree-optimization/18438] vectorizer failed for vector matrix multiplication
pinskia at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Dec 12 09:39:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438
--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #9)
> I've looked into another case where inability to handle stores with gaps
> generates sub-optimal code. I'm interested in spending some time on fixing
> this, provided some guidance in the vectorizer.
>
> Is it substantially more difficult to handle stores with gaps compared to
> loads with gaps?
>
> The following is [minimally] reduced from 462.libquantum:quantum_sigma_x(),
> which is #2 function in 462.libquantum profile. This cycle accounts for
> about 25% of total 462.libquantum time.
>
> ===
> struct node_struct
> {
>   float _Complex gap;
>   unsigned long long state;
> };
>
> struct reg_struct
> {
>   int size;
>   struct node_struct *node;
> };
>
> void
> func(int target, struct reg_struct *reg)
> {
>   int i;
>
>   for(i=0; i<reg->size; i++)
>     reg->node[i].state ^= ((unsigned long long) 1 << target);
> }
> ===
>
> This loop vectorizes into
> <bb 5>:
> # vectp.8_39 = PHI <vectp.8_40(6), vectp.9_38(4)>
> vect_array.10 = LOAD_LANES (MEM[(long long unsigned int *)vectp.8_39]);
> vect__5.11_41 = vect_array.10[0];
> vect__5.12_42 = vect_array.10[1];
> vect__7.13_44 = vect__5.11_41 ^ vect_cst__43;
> _48 = BIT_FIELD_REF <vect__7.13_44, 64, 0>;
> MEM[(long long unsigned int *)ivtmp_45] = _48;
> ivtmp_50 = ivtmp_45 + 16;
> _51 = BIT_FIELD_REF <vect__7.13_44, 64, 64>;
> MEM[(long long unsigned int *)ivtmp_50] = _51;
>
> which then becomes for aarch64:
> .L4:
> ld2 {v0.2d - v1.2d}, [x1]
> add w2, w2, 1
> cmp w2, w7
> eor v0.16b, v2.16b, v0.16b
> umov x4, v0.d[1]
> st1 {v0.d}[0], [x1]
> add x1, x1, 32
> str x4, [x1, -16]
> bcc .L4
What I did for ThunderX was create a vector cost model that keeps this loop
from being vectorized, which avoids the regression. Note that this might
actually be better code for some microarchitectures. I need to check with the
new processor we have in house, but that will be next week or so. I don't know
how much I can share next week, though.
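
For anyone who wants to experiment with this outside of 462.libquantum, here
is a minimal sketch in plain C of what the vectorized loop body quoted above
does: two state fields are picked up per iteration, XORed with the same mask,
and then written back with two separate scalar stores because the store has a
gap. This is only a scalar model under my own assumptions, not what the
vectorizer emits; the names func_scalar, func_two_lane and check_equivalent
are made up for illustration, and the scalar epilogue for an odd size is my
addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node_struct
{
  float _Complex gap;
  unsigned long long state;
};

struct reg_struct
{
  int size;
  struct node_struct *node;
};

/* Scalar reference: the loop from comment #9, unchanged in behavior.  */
static void
func_scalar (int target, struct reg_struct *reg)
{
  for (int i = 0; i < reg->size; i++)
    reg->node[i].state ^= (unsigned long long) 1 << target;
}

/* Hypothetical scalar model of the vectorized body, two nodes per trip.  */
static void
func_two_lane (int target, struct reg_struct *reg)
{
  const unsigned long long mask = (unsigned long long) 1 << target;
  int i = 0;

  for (; i + 1 < reg->size; i += 2)
    {
      /* Models LOAD_LANES / ld2: pick up the two state fields.  The other
         lane vector (vect_array.10[1] in the dump) is loaded but unused.  */
      unsigned long long lane0 = reg->node[i].state;
      unsigned long long lane1 = reg->node[i + 1].state;

      /* Models the single vector XOR (eor).  */
      lane0 ^= mask;
      lane1 ^= mask;

      /* Models the two BIT_FIELD_REF extracts followed by scalar stores,
         16 bytes apart, since the store with a gap is not vectorized.  */
      reg->node[i].state = lane0;
      reg->node[i + 1].state = lane1;
    }

  /* Scalar epilogue for an odd element count (assumption of this sketch).  */
  for (; i < reg->size; i++)
    reg->node[i].state ^= mask;
}

/* Returns 1 if both versions produce identical state fields for n nodes.  */
static int
check_equivalent (int n, int target)
{
  assert (n > 0 && target >= 0 && target < 64);

  struct node_struct *a = calloc ((size_t) n, sizeof *a);
  struct node_struct *b = calloc ((size_t) n, sizeof *b);
  int ok = (a != NULL && b != NULL);

  if (ok)
    {
      /* Fill both arrays with the same arbitrary test pattern.  */
      for (int i = 0; i < n; i++)
        a[i].state = b[i].state
          = (unsigned long long) i * 0x9E3779B97F4A7C15ULL;

      struct reg_struct ra = { n, a };
      struct reg_struct rb = { n, b };
      func_scalar (target, &ra);
      func_two_lane (target, &rb);

      for (int i = 0; i < n; i++)
        ok = ok && a[i].state == b[i].state;
    }

  free (a);
  free (b);
  return ok;
}
```

Calling check_equivalent (5, 3) from a small main exercises both the paired
body and the epilogue. Timing func_scalar against whatever the compiler
generates for it with and without vectorization is probably the simplest way
to see which form wins on a given core.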