This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/83008] [performance] Is it better to avoid extra instructions in data passing between loops?
- From: "sergey.shalnov at intel dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 15 Dec 2017 10:22:38 +0000
- Auto-submitted: auto-generated
- References: <bug-83008-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #14 from sergey.shalnov at intel dot com ---
"we have a basic-block vectorizer. Do you propose to remove it?"
Definitely not! The SLP vectorizer is very good to have!
“What's the rationale for not using vector registers”
I just tried the "-fno-tree-slp-vectorize" option and found a performance gain
for several different -march= options.
I see some misunderstanding here. Let me clarify the original question with
-march=znver1.
I use the "-Ofast -mfpmath=sse -funroll-loops -march=znver1" option set for my
experiments.
For the basic block we are discussing we have (in vect_analyze_slp_cost() in
tree-vect-slp.c:1897):
tmp[i_220][0] = _150;
tmp[i_220][2] = _147;
tmp[i_220][1] = _144;
tmp[i_220][3] = _141;
tmp[i_139][0] = _447;
tmp[i_139][2] = _450;
tmp[i_139][1] = _453;
tmp[i_139][3] = _456;
tmp[i_458][0] = _54;
tmp[i_458][2] = _56;
tmp[i_458][1] = _58;
tmp[i_458][3] = _60;
This is si->stmt as printed in the loop that does the "vect_prologue" calculation.
I see the following SLP statistics for this BB:
note: Cost model analysis:.
Vector inside of basic block cost: 64
Vector prologue cost: 32
Vector epilogue cost: 0
Scalar cost of basic block: 256
note: Basic block will be vectorized using SLP
I see 12 statements that are turned into 3 vector instructions with 4 data
elements each (4*int->xmm):
group_size = 12
ncopies_for_cost = 3
nunits = 4
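As a sanity check on these three numbers, here is a minimal sketch of the arithmetic. It is not GCC code: the helper name is hypothetical, and the rounding-up rule is my assumption for groups that do not divide evenly; GCC's own formula may differ.

```python
def ncopies_for_group(group_size: int, nunits: int) -> int:
    """Vector statements needed to cover group_size scalar statements.

    Hypothetical helper, not a GCC function. Assumes each vector holds
    nunits elements and that a partially filled vector counts as one.
    """
    if group_size <= 0 or nunits <= 0:
        raise ValueError("group_size and nunits must be positive")
    # Ceiling division: -(-a // b) rounds up for positive integers.
    return -(-group_size // nunits)


# 12 scalar int stores, 4 ints per xmm register.
print(ncopies_for_group(12, 4))
```

For the values in the dump (group_size = 12, nunits = 4) this gives 3, matching ncopies_for_cost.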
But I see that "count" is 1 in the cost vector for the prologue:
prologue_cost_vec = {m_vec = 0x3fc6e70 = {{count = 1, kind = vec_construct,
stmt = <gimple_assign 0x7f5b93b73370>, misalign = 0}}}
body_cost_vec = {m_vec = 0x3fc6f70 = {{count = 3, kind = vector_store, stmt =
<gimple_assign 0x7f5b93b73370>, misalign = 0}}}
Please correct me if I am wrong, but I think we should have count=3 in
prologue_cost_vec.
This could slightly change the "Vector prologue cost" and might
influence the vectorizer's decision.
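To put rough numbers on it, here is a small sketch that recomputes the totals from the dump above. It rests on two assumptions of mine: that the prologue cost scales linearly with "count" (so 32 is the cost of one vec_construct), and that the BB is vectorized when the summed vector cost is below the scalar cost. The function name is hypothetical and this is not the GCC implementation.

```python
def slp_totals(inside, prologue_per_count, prologue_count, epilogue, scalar):
    """Return (total vector cost, profitable?) for one basic block.

    Assumption: prologue cost = per-count cost * count, and the block
    is considered profitable when the vector total is below the scalar cost.
    """
    vector_total = inside + prologue_per_count * prologue_count + epilogue
    return vector_total, vector_total < scalar


# Costs taken from the dump: inside = 64, prologue = 32 (with count = 1),
# epilogue = 0, scalar = 256.
for count in (1, 3):
    total, profitable = slp_totals(64, 32, count, 0, 256)
    print(f"count={count}: vector total={total}, profitable={profitable}")
```

Under these assumptions count=3 raises the prologue cost from 32 to 96 and the vector total from 96 to 160, which is still below the scalar cost of 256. So for this particular BB the decision would not flip, but the margin shrinks noticeably.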
Sergey
PS
Richard,
I didn't catch your idea in the "but DOM isn't powerful enough" sentence.
Could you please clarify it a little?
Thank you.