This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/83008] [performance] Is it better to avoid extra instructions in data passing between loops?


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008

--- Comment #12 from sergey.shalnov at intel dot com ---
Richard,
Your last proposal changed the generated code a bit.
Currently it shows:
test_bugzilla1.c:6:5: note: Cost model analysis:. 
  Vector inside of loop cost: 62576 
  Vector prologue cost: 0 
  Vector epilogue cost: 0 
  Scalar iteration cost: 328 
  Scalar outside cost: 0 
  Vector outside cost: 0 
  prologue iterations: 0 
  epilogue iterations: 0 
test_bugzilla1.c:6:5: note: cost model: the vector iteration cost = 62576
divided by the scalar iteration cost = 328 is greater or equal to the
vectorization factor = 4.
test_bugzilla1.c:6:5: note: not vectorized: vectorization not profitable.
test_bugzilla1.c:6:5: note: not vectorized: vector version will never be
profitable.
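
To make the arithmetic behind that message concrete, here is a minimal sketch of the ratio test the dump describes. The helper name vector_never_profitable and its parameters are hypothetical and only mirror the wording of the note; this is not taken from GCC's sources.

```c
#include <stdio.h>

/* Hypothetical helper (not GCC code): returns 1 when
   vec_iter_cost / scalar_iter_cost >= vf, i.e. when the dump would say
   "vector version will never be profitable".
   The division is rewritten as a multiplication, which gives the same
   answer for positive integers and avoids dividing by zero. */
static int vector_never_profitable(unsigned vec_iter_cost,
                                   unsigned scalar_iter_cost,
                                   unsigned vf)
{
    unsigned long long scalar_total =
        (unsigned long long)scalar_iter_cost * vf;

    return (unsigned long long)vec_iter_cost >= scalar_total;
}

/* Prints the comparison in a form similar to the dump note. */
static void report(unsigned vec_iter_cost, unsigned scalar_iter_cost,
                   unsigned vf)
{
    printf("vector cost %u vs scalar cost %u * VF %u: %s\n",
           vec_iter_cost, scalar_iter_cost, vf,
           vector_never_profitable(vec_iter_cost, scalar_iter_cost, vf)
               ? "not profitable" : "possibly profitable");
}
```

With the numbers from the dump, vector_never_profitable(62576, 328, 4) returns 1, because 328 * 4 = 1312 is far below 62576.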

And it uses xmm registers plus vpbroadcastd to spill tmp[] to the stack:
...
 1e7:   62 d2 7d 08 7c c9       vpbroadcastd %r9d,%xmm1
 1ed:   c4 c1 79 7e c9          vmovd  %xmm1,%r9d
 1f2:   62 f1 fd 08 7f 8c 24    vmovdqa64 %xmm1,-0x38(%rsp)
 1f9:   c8 ff ff ff 
 1fd:   62 f2 7d 08 7c d7       vpbroadcastd %edi,%xmm2
 203:   c5 f9 7e d7             vmovd  %xmm2,%edi
 207:   62 f1 fd 08 7f 94 24    vmovdqa64 %xmm2,-0x28(%rsp)
 20e:   d8 ff ff ff 
 212:   62 f2 7d 08 7c db       vpbroadcastd %ebx,%xmm3
 218:   c5 f9 7e de             vmovd  %xmm3,%esi
 21c:   62 f1 fd 08 7f 9c 24    vmovdqa64 %xmm3,-0x18(%rsp)
 223:   e8 ff ff ff 
 227:   01 fe                   add    %edi,%esi
 229:   45 01 c8                add    %r9d,%r8d
 22c:   41 01 f0                add    %esi,%r8d
 22f:   8b 5c 24 dc             mov    -0x24(%rsp),%ebx
 233:   03 5c 24 ec             add    -0x14(%rsp),%ebx
 237:   8b 6c 24 bc             mov    -0x44(%rsp),%ebp
 23b:   03 6c 24 cc             add    -0x34(%rsp),%ebp
...

I think this is better from a performance perspective, but, as I said
before, not using vector registers here at all is the best option if no
loops are vectorized.

In the case of a static loop increment (the first test case), the first loop
is vectorized as before.

Sergey
