This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: recent troubles with float vectors & bitwise ops


tbp wrote:
On 8/23/07, Tim Prince <tprince@computer.org> wrote:
Note that icc9 has a strong bias toward the Pentium 4, which had no stall
penalty for mistyped fp vectors (on Intel, that penalty arrived with the
Pentium M line), so you see a pxor even when generating code for the Core 2.
# cat autoicc.cc
float foo(const float *a, int n) {
    float sum = 0.f;
    for (int i = 0; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}
int main() { return 0; }
# /opt/intel/cce/9.1.051/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3) : (col. 2) remark: LOOP WAS VECTORIZED.
4007a9: pxor %xmm4,%xmm4
4007ad: cmpltps %xmm3,%xmm4
4007b1: andps %xmm3,%xmm4
# /opt/intel/cce/10.0.023/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3): (col. 2) remark: LOOP WAS VECTORIZED.
400b50: xorps %xmm3,%xmm3
400b53: cmpltps %xmm4,%xmm3
400b57: andps %xmm3,%xmm4
For what little it's worth, I found no measurable difference between these choices on Core 2 Duo. People I know prefer to use the Intel -xW option, or the gcc default, in the absence of clear evidence that another option could improve performance without reducing the range of supported targets.
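
For readers following the disassembly, here is a rough scalar sketch of what the three-instruction sequence computes per lane: zero a register (pxor or xorps, which differ only in the integer versus fp domain of the instruction), build an all-ones mask where 0 < a[i] (cmpltps), and AND that mask with the raw bits of a[i] (andps) so that failing lanes contribute +0.0f. The helper names cmplt_mask, and_bits and foo_masked are made up for illustration and appear in neither compiler's output; this models the idiom and is not the code icc actually generates.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of one lane of cmpltps:
// all-ones if lhs < rhs, all-zeros otherwise (including NaN operands).
static std::uint32_t cmplt_mask(float lhs, float rhs) {
    return lhs < rhs ? 0xFFFFFFFFu : 0u;
}

// Hypothetical scalar model of one lane of andps:
// AND the raw bit pattern of a float with a mask.
static float and_bits(float value, std::uint32_t mask) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits &= mask;
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Branch-free equivalent of foo(): elements that fail the
// a[i] > 0 test are masked to +0.0f before being accumulated.
float foo_masked(const float *a, int n) {
    float sum = 0.f;
    for (int i = 0; i < n; ++i) {
        const std::uint32_t mask = cmplt_mask(0.f, a[i]); // 0 < a[i]
        sum += and_bits(a[i], mask);
    }
    return sum;
}
```

Because the mask is applied with a bitwise AND on float data, the register holding it is consumed by fp instructions, which is why the choice between pxor and xorps for the zeroing step matters on cores that penalize crossing domains.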


