This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/70976] New: Useless vectorization leads to degradation of performance
- From: "b7.10110111 at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 06 May 2016 11:47:01 +0000
- Subject: [Bug rtl-optimization/70976] New: Useless vectorization leads to degradation of performance
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70976
Bug ID: 70976
Summary: Useless vectorization leads to degradation of performance
Product: gcc
Version: 6.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: b7.10110111 at gmail dot com
Target Milestone: ---
See the following code:
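#include <stdio.h>

int main()
{
    unsigned long u = 13;
    /* 2^30 iterations; u += 23442*u is the same as u *= 23443 */
    for (unsigned long i = 0; i < 1UL << 30; i++)
        u += 23442 * u;
    /* Never true (u stays odd), but it keeps u live so the loop is not removed */
    if (u == 0) printf("0\n");
}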
Compiling it on an AMD64 system with -O2, I get the expected scalar loop (Intel syntax):
.L2:
        imul    rdx, rdx, 23443
        sub     rax, 1
        jne     .L2
But with -O3, the loop is vectorized into SSE2 code:
.L2:
        movdqa  xmm3, xmm1
        add     eax, 1
        movdqa  xmm0, xmm1
        pmuludq xmm1, xmm4
        cmp     eax, 536870912
        pmuludq xmm3, xmm2
        psrlq   xmm0, 32
        pmuludq xmm0, xmm2
        paddq   xmm0, xmm1
        movdqa  xmm1, xmm3
        psllq   xmm0, 32
        paddq   xmm1, xmm0
        jne     .L2
Not only is it longer (13 instructions instead of 3), it also needlessly performs
the calculation on pairs of identical numbers. On my CPU (Intel(R) Xeon(R) CPU
E3-1226 v3 @ 3.30GHz), the -O2 version is almost twice as fast as the -O3 one.
This happens with GCC 4.7.3 and newer, but not with 4.6.4 and older.