This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/83008] [performance] Is it better to avoid extra instructions in data passing between loops?

From: "sergey.shalnov at intel dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Fri, 15 Dec 2017 10:22:38 +0000
Subject: [Bug target/83008] [performance] Is it better to avoid extra instructions in data passing between loops?
Auto-submitted: auto-generated
References: <bug-83008-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008

--- Comment #14 from sergey.shalnov at intel dot com ---
"we have a basic-block vectorizer.  Do you propose to remove it?"
Definitely not! The SLP vectorizer is very good to have!

"What's the rationale for not using vector registers"
I just tried the "-fno-tree-slp-vectorize" option and saw a performance gain
with several different -march= options.

I see some misunderstanding here. Let me clarify the original question with
-march=znver1.
I use the option set "-Ofast -mfpmath=sse -funroll-loops -march=znver1" for
the experiments.

For the basic block we are discussing we have (in vect_analyze_slp_cost() in
tree-vect-slp.c:1897):

tmp[i_220][0] = _150;
tmp[i_220][2] = _147;
tmp[i_220][1] = _144;
tmp[i_220][3] = _141;

tmp[i_139][0] = _447;
tmp[i_139][2] = _450;
tmp[i_139][1] = _453;
tmp[i_139][3] = _456;

tmp[i_458][0] = _54;
tmp[i_458][2] = _56;
tmp[i_458][1] = _58;
tmp[i_458][3] = _60;

This is si->stmt, printed in the loop that performs the "vect_prologue"
calculation.

I see the following SLP statistics for this BB:
note: Cost model analysis:. 
  Vector inside of basic block cost: 64 
  Vector prologue cost: 32 
  Vector epilogue cost: 0 
  Scalar cost of basic block: 256 
note: Basic block will be vectorized using SLP

I see 12 statements that are combined into 3 vector instructions holding 4
elements each (4 x int -> xmm):
group_size = 12
ncopies_for_cost = 3
nunits = 4
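
As a rough illustration of how these three numbers relate (a sketch, not GCC
code): assuming the copy count is derived as lcm(nunits, group_size) / nunits,
which matches the values printed above, 12 scalar stores with 4 lanes per
vector give 3 vector statements. The helper names below are hypothetical.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: number of vector statements needed to cover
   group_size scalar statements when each vector holds nunits elements.
   Assumption: the count is lcm(nunits, group_size) / nunits.  */
static unsigned ncopies_for_group(unsigned group_size, unsigned nunits)
{
    assert(group_size > 0 && nunits > 0);
    unsigned lcm = group_size / gcd_u(group_size, nunits) * nunits;
    return lcm / nunits;
}
```

With group_size = 12 and nunits = 4 this gives 3, the same as the
ncopies_for_cost value above.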

But I see that "count" is 1 in the cost vector for the prologue:
prologue_cost_vec = {m_vec = 0x3fc6e70 = {{count = 1, kind = vec_construct,
stmt = <gimple_assign 0x7f5b93b73370>, misalign = 0}}}
body_cost_vec = {m_vec = 0x3fc6f70 = {{count = 3, kind = vector_store, stmt =
<gimple_assign 0x7f5b93b73370>, misalign = 0}}}

Please correct me if I am wrong, but I think we should have count=3 in
prologue_cost_vec.
This would slightly change the "Vector prologue cost" and might influence
the vectorizer's decision.
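
To make the size of the effect concrete, here is a back-of-the-envelope
sketch using only the figures from the dump above. It rests on two
assumptions that the dump does not confirm: that the reported prologue cost
of 32 is the cost of a single vec_construct (count = 1), and that the
prologue cost scales linearly with count. The profitability test is
simplified to "vector total < scalar cost"; the function names are
hypothetical.

```c
#include <assert.h>

/* Figures quoted from the cost model dump in this comment. */
enum {
    BODY_COST     = 64,   /* Vector inside of basic block cost */
    PROLOGUE_COST = 32,   /* Vector prologue cost, assumed to be for count = 1 */
    EPILOGUE_COST = 0,    /* Vector epilogue cost */
    SCALAR_COST   = 256   /* Scalar cost of basic block */
};

/* Assumption: prologue cost = prologue_count * cost of one vec_construct. */
static int vector_total_cost(int prologue_count)
{
    assert(prologue_count >= 1);
    return BODY_COST + prologue_count * PROLOGUE_COST + EPILOGUE_COST;
}

/* Simplified check: nonzero if the vector estimate beats the scalar one. */
static int slp_is_profitable(int prologue_count)
{
    return vector_total_cost(prologue_count) < SCALAR_COST;
}
```

Under these assumptions the vector total is 96 for count = 1 and 160 for
count = 3. Both stay below the scalar cost of 256, but the margin shrinks
from 160 to 96.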

Sergey
PS
Richard,
I didn't understand what you meant by "but DOM isn't powerful enough".
Could you please clarify it?
Thank you.
