This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
- From: Paulo Matos <pmatos at broadcom dot com>
- To: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Thu, 30 Jan 2014 14:00:26 +0000
- Subject: Regression [v850,mep...]: sign_extend in loop breaks zero-overhead loop generation
Hello,
I am tracking a performance and code-size regression relative to GCC 4.5.4 that is still present in trunk.
Consider the following function:
==
extern short delayLength;
typedef int Sample;
extern Sample *temp_ptr;
extern Sample x;
void
foo (short blockSize)
{
  short i;
  unsigned short loopCount;
  loopCount = (unsigned short) (blockSize + delayLength) % 8;
  for (i = 0; i < loopCount; i++)
    *temp_ptr++ = x ^ *temp_ptr++;
}
==
For v850, before the commit
commit e0ae2fe2a0bebe9de31e3d8eb4feace4909ef009
Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Fri May 20 19:32:30 2011 +0000
2011-05-20 Tom de Vries <tom@codesourcery.com>
PR target/45098
* tree-ssa-loop-ivopts.c: Include expmed.h.
(get_shiftadd_cost): New function.
(force_expr_to_var_cost): Declare forward. Use get_shiftadd_cost.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@173976 138bc75d-0d04-0410-961f-82ee72b054a4
gcc generated for -O2:
_foo:
movhi hi(_delayLength),r0,r10
ld.h lo(_delayLength)[r10],r10
add r10,r6
andi 7,r6,r10
be .L1
movhi hi(_temp_ptr),r0,r16
ld.w lo(_temp_ptr)[r16],r15
mov r10,r17
shl 2,r17
mov r15,r14
movhi hi(_x),r0,r13
mov r15,r10
add r17,r14
movea lo(_x),r13,r13
.L3:
ld.w 0[r10],r11
ld.w 0[r13],r12
xor r12,r11
st.w r11,0[r10]
add 4,r10
cmp r14,r10
bne .L3
mov r15,r10
add r17,r10
st.w r10,lo(_temp_ptr)[r16]
.L1:
jmp [r31]
After the commit it generates:
_foo:
movhi hi(_delayLength),r0,r10
ld.h lo(_delayLength)[r10],r16
add r16,r6
andi 7,r6,r16
be .L1
movhi hi(_temp_ptr),r0,r17
ld.w lo(_temp_ptr)[r17],r18
movhi hi(_x),r0,r14
mov r18,r11
mov r16,r15
mov 0,r10
movea lo(_x),r14,r14
.L3:
ld.w 0[r11],r12
ld.w 0[r14],r13
add 1,r10
xor r13,r12
shl 16,r10
st.w r12,0[r11]
sar 16,r10
add 4,r11
cmp r15,r10
bne .L3
shl 2,r16
add r18,r16
st.w r16,lo(_temp_ptr)[r17]
.L1:
jmp [r31]
The problem is inside the loop:
shl 16,r10
st.w r12,0[r11]
sar 16,r10
add 4,r11
cmp r15,r10
The shl followed by sar sign-extends r10. Previous GCC versions did not do this, and it is unnecessary.
At the time of that commit, v850 had neither e3v5 support nor zero-overhead loops. It has both now, and the sign extension blocks zero-overhead loop generation. (With trunk and -mv850e3v5, GCC generates an sxh instruction instead of the shift pair, but the point is the same.)
For mep the situation repeats. mep generates:
foo:
# frame: 8 8 regs
lh $10, %sdaoff(delayLength)($gp)
add $sp, -8
add3 $1, $1, $10
and3 $10, $1, 0x7
beqz $10, .L1
lw $2, %sdaoff(temp_ptr)($gp)
mov $1, 0
add3 $11, $gp, %sdaoff(x)
bra .L5
.L3:
mov $2, $9
.L5:
lw $0, ($11)
lw $3, 4($2)
add $1, 1
exth $1
xor $3, $0
slt3 $0, $1, $10
add3 $9, $2, 8
sw $3, ($2)
bnez $0, .L3
sw $9, %sdaoff(temp_ptr)($gp)
.L1:
add $sp, 8
ret
Again, exth sign-extends $1 and blocks generation of a zero-overhead loop, because the loop is no longer simple. Unfortunately, I cannot test mep before the patch, because mep was not in mainline at the time.
Does anyone understand why the mentioned patch forces the generation of the sign extension inside the loop? Is this just a problem with cost calculation in the backends, or is some issue lurking in tree-ssa-loop-ivopts?
Thanks,
Paulo Matos