This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
PowerPC suboptimal "add with carry" optimization
- From: Joakim Tjernlund <joakim dot tjernlund at transmode dot se>
- To: gcc at gcc dot gnu dot org
- Date: Sun, 25 Apr 2010 14:57:08 +0200
- Subject: PowerPC suboptimal "add with carry" optimization
Noticed that gcc 4.3.4 doesn't optimize "add with carry" properly:
static u32
add32carry(u32 sum, u32 x)
{
	u32 z = sum + x;
	if (sum + x < x)
		z++;
	return z;
}
Becomes:
add32carry:
	add 3,3,4
	subfc 0,4,3
	subfe 0,0,0
	subfc 0,0,3
	mr 3,0
Instead of:
	addc 3,3,4
	addze 3,3
This slows down the Internet checksum significantly.
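
For reference, here is a minimal C sketch of the same end-around carry written two ways. The first reuses z instead of recomputing sum + x; the second, add32carry_wide, is a hypothetical name for a variant that widens to 64 bits and folds the carry back in. The typedef for u32 is an assumption (the original does not show it), and no claim is made about what code any particular gcc version emits for either form:

```c
#include <stdint.h>

typedef uint32_t u32;	/* assumed definition; not shown in the original */

/* Same logic as the original, reusing z instead of recomputing sum + x. */
static u32
add32carry(u32 sum, u32 x)
{
	u32 z = sum + x;	/* wraps modulo 2^32 */
	if (z < x)		/* wrapped, so a carry came out of bit 31 */
		z++;
	return z;
}

/* Hypothetical alternative: widen to 64 bits, then add the carry bit back. */
static u32
add32carry_wide(u32 sum, u32 x)
{
	uint64_t t = (uint64_t)sum + x;	/* bit 32 holds the carry */
	return (u32)t + (u32)(t >> 32);
}
```

Both forms should return the same value for every input, so comparing the generated assembly for the two is one way to see whether the compiler recognizes the carry idiom.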
Also, doing this in a loop can be further optimized:
	for (; len; --len)
		sum = add32carry(sum, *++buf);
could become:
	addic 3, 3, 0	/* clear carry */
.L31:
	lwzu 0,4(9)
	adde 3, 3, 0	/* add with carry */
	bdnz .L31
	addze 3, 3	/* add in final carry */
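
The same idea of deferring the carries to the end of the loop can be sketched in portable C with a 64-bit accumulator. This is only an illustration, not the code gcc generates: sum_words is a hypothetical helper, the u32 typedef is assumed, it reads from buf[0] rather than pre-incrementing the pointer, and it assumes len is below 2^32 so the accumulator cannot overflow:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;	/* assumed definition; not shown in the original */

/* Hypothetical helper: end-around-carry sum of len 32-bit words. */
static u32
sum_words(u32 sum, const u32 *buf, size_t len)
{
	uint64_t acc = sum;	/* upper 32 bits collect the carries */

	for (; len; --len)
		acc += *buf++;

	/* Fold the carries back in; the first fold can itself carry once. */
	acc = (acc & 0xFFFFFFFFu) + (acc >> 32);
	acc = (acc & 0xFFFFFFFFu) + (acc >> 32);
	return (u32)acc;
}
```

The loop body is then a single add per word, with the carry handling moved out of the loop, much like the adde/addze sequence above.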