This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/63271] New: Should commute arithmetic with vector load
- From: "zackw at panix dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 15 Sep 2014 18:53:30 +0000
- Subject: [Bug tree-optimization/63271] New: Should commute arithmetic with vector load
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271
Bug ID: 63271
Summary: Should commute arithmetic with vector load
Product: gcc
Version: 4.9.1
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zackw at panix dot com
Consider:

    #include <emmintrin.h>

    __m128i foo(char C)
    {
        return _mm_set_epi8(   0,    C,  2*C,  3*C,
                             4*C,  5*C,  6*C,  7*C,
                             8*C,  9*C, 10*C, 11*C,
                            12*C, 13*C, 14*C, 15*C);
    }

    __m128i bar(char C)
    {
        __m128i v = _mm_set_epi8( 0,  1,  2,  3,  4,  5,  6,  7,
                                  8,  9, 10, 11, 12, 13, 14, 15);
        v *= C;
        return v;
    }

I *believe* these functions compute the same value and should therefore
generate identical code, but with GCC 4.9.1 foo() generates considerably
larger and slower code than bar().
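
One caveat is worth checking before comparing the generated code. In GCC's <emmintrin.h>, __m128i is declared as a vector of two long long, so `v *= C` in bar() multiplies two 64-bit lanes rather than sixteen bytes. A carry out of one byte then spills into the next, and the result matches foo() only when no byte product overflows. The sketch below compares foo() and bar() as written against a byte-wise variant. The names `v16qu`, `bar_bytes`, `same` and `count_matches` are hypothetical helpers introduced only for this illustration, and the byte-wise variant is an assumption about what the test case intends.

```c
#include <emmintrin.h>
#include <string.h>

/* Hypothetical helper type (not in the report): 16 unsigned 8-bit lanes. */
typedef unsigned char v16qu __attribute__((vector_size(16)));

/* foo() from the report: every byte is computed as a separate product. */
static __m128i foo(char C)
{
    return _mm_set_epi8(   0,    C,  2*C,  3*C,
                         4*C,  5*C,  6*C,  7*C,
                         8*C,  9*C, 10*C, 11*C,
                        12*C, 13*C, 14*C, 15*C);
}

/* bar() as written in the report: the multiply acts on two 64-bit lanes. */
static __m128i bar(char C)
{
    __m128i v = _mm_set_epi8( 0,  1,  2,  3,  4,  5,  6,  7,
                              8,  9, 10, 11, 12, 13, 14, 15);
    v *= C;
    return v;
}

/* Hypothetical byte-wise variant: reinterpret both operands as 16 unsigned
   bytes so that each lane is multiplied modulo 256 independently. */
static __m128i bar_bytes(char C)
{
    v16qu v = (v16qu)_mm_set_epi8( 0,  1,  2,  3,  4,  5,  6,  7,
                                   8,  9, 10, 11, 12, 13, 14, 15);
    v16qu s = (v16qu)_mm_set1_epi8(C);   /* C broadcast to every byte */
    v *= s;
    return (__m128i)v;
}

/* Bitwise comparison of two 128-bit values. */
static int same(__m128i a, __m128i b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Count how many of the 256 possible inputs make f agree with foo. */
static int count_matches(__m128i (*f)(char))
{
    int n = 0;
    for (int c = -128; c <= 127; c++)
        n += same(foo((char)c), f((char)c));
    return n;
}
```

Calling `count_matches(bar_bytes)` should report agreement on all 256 inputs, while `count_matches(bar)` reports fewer. For example, C = 100 already differs in the second byte, because 15*100 carries into it. The missed-optimization question is therefore best asked about foo() versus the byte-wise form.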
The test case is expressed in terms of x86 <emmintrin.h>, but I have no reason
to think the problem is x86-specific; it looks like a generic missed
optimization in the tree-level vectorizer.
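
To make the target-independence claim concrete, the same pair of functions can be restated with GCC's generic vector extensions and no x86 headers. This is only a sketch: the names `v16qu`, `foo_generic`, `bar_generic` and `generic_variants_agree` are hypothetical, unsigned bytes are used so that lane arithmetic wraps modulo 256, and the lanes are listed in index order rather than in the high-to-low order of _mm_set_epi8 (which does not affect the comparison).

```c
#include <string.h>

/* Hypothetical type name: 16 unsigned 8-bit lanes via the vector_size attribute. */
typedef unsigned char v16qu __attribute__((vector_size(16)));

/* Analogue of foo(): each lane is written as a separate scalar product. */
v16qu foo_generic(unsigned char c)
{
#define LANE(k) ((unsigned char)((k) * c))
    v16qu v = { LANE(0),  LANE(1),  LANE(2),  LANE(3),
                LANE(4),  LANE(5),  LANE(6),  LANE(7),
                LANE(8),  LANE(9),  LANE(10), LANE(11),
                LANE(12), LANE(13), LANE(14), LANE(15) };
#undef LANE
    return v;
}

/* Analogue of bar(): a constant vector times a broadcast scalar. */
v16qu bar_generic(unsigned char c)
{
    v16qu v = { 0, 1, 2,  3,  4,  5,  6,  7,
                8, 9, 10, 11, 12, 13, 14, 15 };
    return v * c;   /* the scalar c is expanded to all 16 lanes */
}

/* Returns 1 if both forms agree for every possible input byte, else 0. */
int generic_variants_agree(void)
{
    for (int c = 0; c < 256; c++) {
        v16qu a = foo_generic((unsigned char)c);
        v16qu b = bar_generic((unsigned char)c);
        if (memcmp(&a, &b, sizeof a) != 0)
            return 0;
    }
    return 1;
}
```

Comparing the assembly for foo_generic() and bar_generic() on a non-x86 target would show whether the difference in code quality is really independent of <emmintrin.h>.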