This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug middle-end/37678] New: Failure to generate post-increment addressing
- From: "TabonyEE at austin dot rr dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 30 Sep 2008 01:01:18 -0000
- Subject: [Bug middle-end/37678] New: Failure to generate post-increment addressing
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
This is an ia64 performance and code size regression. Consider the following
function:
void sump(int *a, int *b, int *c, int len){
int i;
for(i = 0; i < len; i++){
*a++ = *b++ + *c++;
}
}
GCC 3.4.6 generated the following code. Note that the memory accesses use the
post-increment addressing mode.
sump:
.prologue
.mii
nop 0
.save ar.lc, r2
mov r2 = ar.lc
.body
sxt4 r14 = r35
.mfb
cmp4.ge p6, p7 = 0, r35
nop 0
(p6) br.cond.dpnt .L16
;;
.mmi
adds r14 = -1, r14
;;
nop 0
mov ar.lc = r14
.L17:
.mmb
ld4 r14 = [r33], 4
ld4 r15 = [r34], 4
nop 0
;;
.mii
nop 0
add r14 = r14, r15
;;
nop 0
.mfb
st4 [r32] = r14, 4
nop 0
br.cloop.sptk.few .L17
;;
.L16:
.mib
nop 0
mov ar.lc = r2
br.ret.sptk.many b0
Mainline (revision 140763) generated the following code. Instead of using
post-increment addressing, the original pointers are kept in r34, r33, and r32,
while an offset, held in r14, is incremented by four each iteration. The
offset is added to each of the three bases each iteration. This pattern of
code is ideal for machines that have a base-plus-index addressing mode and not
a post-increment addressing mode, such as x86. However, ia64 has a
post-increment addressing mode and not a base-plus-index addressing mode.
sump:
.prologue
.body
.mmi
adds r18 = -1, r35
nop 0
mov r14 = r0
.mmb
cmp4.ge p6, p7 = 0, r35
nop 0
(p6) br.ret.dpnt.many rp
;;
.mmi
nop 0
addp4 r18 = r18, r0
nop 0
;;
.mii
adds r18 = 1, r18
nop 0
;;
shladd r18 = r18, 2, r0
.L10:
.mmi
add r17 = r34, r14
add r16 = r33, r14
add r15 = r32, r14
.mmi
adds r14 = 4, r14
;;
ld4 r17 = [r17]
nop 0
.mii
ld4 r16 = [r16]
cmp.ne p6, p7 = r18, r14
;;
add r16 = r17, r16
;;
.mib
st4 [r15] = r16
nop 0
(p6) br.cond.dptk .L10
.mib
nop 0
nop 0
br.ret.sptk.many rp
Here is the intermediate representation just before the ivopts pass (just after
the cunroll pass). Let's call this "IR1". Note that the three pointers are
incremented each iteration. This pattern, were it preserved, would later be
transformed into post-increment addressing by the auto-inc-dec pass.
;; Function sump (sump)
sump (int * a, int * b, int * c, int len)
{
int i;
int D.1279;
int D.1278;
int D.1277;
<bb 2>:
if (len_9(D) > 0)
goto <bb 3>;
else
goto <bb 6>;
<bb 3>:
<bb 4>:
# i_25 = PHI <i_16(5), 0(3)>
# a_26 = PHI <a_13(5), a_6(D)(3)>
# b_27 = PHI <b_14(5), b_7(D)(3)>
# c_28 = PHI <c_15(5), c_8(D)(3)>
D.1277_10 = *b_27;
D.1278_11 = *c_28;
D.1279_12 = D.1278_11 + D.1277_10;
*a_26 = D.1279_12;
a_13 = a_26 + 4;
b_14 = b_27 + 4;
c_15 = c_28 + 4;
i_16 = i_25 + 1;
if (len_9(D) > i_16)
goto <bb 5>;
else
goto <bb 6>;
<bb 5>:
goto <bb 4>;
<bb 6>:
return;
}
Here is the intermediate representation after the ivopts pass. Let's call this
"IR2". The code has been transformed into the pattern we see in the final
assembly.
;; Function sump (sump)
sump (int * a, int * b, int * c, int len)
{
unsigned int D.1361;
unsigned int D.1362;
long unsigned int D.1363;
long unsigned int D.1364;
long unsigned int D.1365;
int * D.1366;
int * D.1360;
long unsigned int D.1359;
int * D.1358;
long unsigned int D.1357;
int * D.1356;
long unsigned int D.1355;
int * ivtmp?59;
int i;
int D.1279;
int D.1278;
int D.1277;
<bb 2>:
if (len_9(D) > 0)
goto <bb 3>;
else
goto <bb 6>;
<bb 3>:
<bb 4>:
# ivtmp?59_1 = PHI <ivtmp?59_24(5), 0B(3)>
D.1355_20 = (long unsigned int) ivtmp?59_1;
D.1356_22 = b_7(D) + D.1355_20;
D.1277_10 = MEM[base: D.1356_22];
D.1357_23 = (long unsigned int) ivtmp?59_1;
D.1358_21 = c_8(D) + D.1357_23;
D.1278_11 = MEM[base: D.1358_21];
D.1279_12 = D.1278_11 + D.1277_10;
D.1359_30 = (long unsigned int) ivtmp?59_1;
D.1360_31 = a_6(D) + D.1359_30;
MEM[base: D.1360_31] = D.1279_12;
ivtmp?59_24 = ivtmp?59_1 + 4;
D.1361_32 = (unsigned int) len_9(D);
D.1362_33 = D.1361_32 + 4294967295;
D.1363_34 = (long unsigned int) D.1362_33;
D.1364_35 = D.1363_34 + 1;
D.1365_36 = D.1364_35 * 4;
D.1366_37 = (int *) D.1365_36;
if (ivtmp?59_24 != D.1366_37)
goto <bb 5>;
else
goto <bb 6>;
<bb 5>:
goto <bb 4>;
<bb 6>:
return;
}
Here is the RTL representation (with extraneous information removed) going into
the auto-inc-dec pass (coming out of the regclass pass). auto-inc-dec does not
recognize that this code can be transformed to use post-increment addressing.
The only thing auto-inc-dec does to this code is add some death notes.
(set (reg:DI 341 [ a ]) (reg:DI in0))
(set (reg:DI 342 [ b ]) (reg:DI in1))
(set (reg:DI 343 [ c ]) (reg:DI in2))
(set (reg:SI 344 [ len ]) (reg:SI in3))
(set (reg:BI 345)
(le:BI (reg:SI 344 [ len ])
(const_int 0)))
(set (pc)
(if_then_else (ne (reg:BI 345)
(const_int 0))
(label_ref 35)
(pc)))
(set (reg:DI 340 [ ivtmp?59 ]) (const_int 0))
(set (reg:SI 357)
(plus:SI (reg:SI 344 [ len ])
(const_int -1)))
(set (reg:DI 358)
(zero_extend:DI (reg:SI 357)))
(set (reg:DI 359)
(plus:DI (reg:DI 358)
(const_int 1)))
(set (reg:DI 360)
(ashift:DI (reg:DI 359)
(const_int 2)))
(code_label 23)
(set (reg:DI 346)
(plus:DI (reg:DI 341 [ a ])
(reg:DI 340 [ ivtmp?59 ])))
(set (reg:DI 347)
(plus:DI (reg:DI 343 [ c ])
(reg:DI 340 [ ivtmp?59 ])))
(set (reg:DI 348)
(plus:DI (reg:DI 342 [ b ])
(reg:DI 340 [ ivtmp?59 ])))
(set (reg:SI 349)
(mem:SI (reg:DI 347)))
(set (reg:SI 350)
(mem:SI (reg:DI 348)))
(set (reg:SI 351)
(plus:SI (reg:SI 349)
(reg:SI 350)))
(set (mem:SI (reg:DI 346))
(reg:SI 351))
(set (reg:DI 340 [ ivtmp?59 ])
(plus:DI (reg/f:DI 340 [ ivtmp?59 ])
(const_int 4)))
(set (reg:BI 356)
(ne:BI (reg:DI 340 [ ivtmp?59 ])
(reg:DI 360)))
(set (pc)
(if_then_else (ne (reg:BI 356)
(const_int 0))
(label_ref 23)
(pc)))
(code_label 35)
Interestingly, bfin, which also has a post-increment addressing mode and not a
base-plus-index addressing mode, generates post-increment addressing for this
same function. The IR looks like IR1 both before and after the ivopts pass.
Why does ivopts decide to transform the code when compiling for ia64 and not
when compiling for bfin?
Another interesting data point, consider the following function:
void suma(int a[], int b[], int c[], int len){
int i;
for(i = 0; i < len; i++){
a[i] = b[i] + c[i];
}
}
Both GCC 3.4.6 and mainline generate the base-plus-index-like code for both
ia64 and bfin. However, in GCC 3.4.6, there is an option named
"-freduce-all-givs". Using that option causes GCC 3.4.6 to generate
post-increment addressing for the suma function for both ia64 and bfin. The
transformation occurs in the RTL loop pass. The incoming code has three base
pointers and one offset. The output code has several copies of each pointer
that are each incremented, and one of each of those copies is used in the mem
expressions. The other copies of the incremented pointers and the original
index register are dead and are later removed.
>From the 3.4.6 code in loop.c:strength_reduce, it appears the -freduce-all-givs
option overrides a cost decision. Could that be the difference between ia64
and bfin in mainline? Is ivopts deciding that IR2 is cheaper on ia64 while IR1
is cheaper on bfin? If so, how can the ia64 cost model be changed to cause
ivopts to generate IR1?
Another interesting question is, is it possible for ivopts to transform the IR
for the suma function from IR2 into IR1? Is this also a problem with costs, or
does ivopts simply lack the ability to make that transformation?
Another possibility is that ivopts is doing what people believe it should, and
the problem should be addressed in auto-inc-dec, or elsewhere. Is that the
case? Would it be possible to teach auto-inc-dec to recognize that this
base-plus-offset-like RTL can be transformed into auto-increments? Is there
anywhere else this problem could be fixed?
--
Summary: Failure to generate post-increment addressing
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: TabonyEE at austin dot rr dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: ia64-unknown-elf
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37678