31849 – [4.3/4.4/4.5/4.6 Regression] Code size increased with PR 31360 (IV-opts not understanding autoincrement)

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.3.0

Importance:	P2 normal
Target Milestone:	4.3.6
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Duplicates (1):	36135
Depends on:
Blocks:	16996 31360 39201

Reported:	2007-05-07 01:05 UTC by Richard Earnshaw
Modified:	2018-03-10 02:37 UTC
CC List:	17 users

See Also:	31241
Host:
Target:	arm-none-eabi
Build:
Known to work:
Known to fail:	4.3.0
Last reconfirmed:	2007-11-24 21:56:11

Attachments
source code showing regression (1.36 KB, text/plain) 2007-05-07 01:06 UTC, Richard Earnshaw
Patch to make ivopts take autoincrement addressing modes into account (4.47 KB, patch) 2007-11-26 00:48 UTC, Zdenek Dvorak
Zoltan's test case (2.19 KB, text/x-csrc) 2008-07-15 19:33 UTC, Joel Sherrill
patch to take POST_DEC and POST_MODIFY into account (5.98 KB, patch) 2009-03-06 15:54 UTC, Jorn Wolfgang Rennecke

Description Richard Earnshaw 2007-05-07 01:05:01 UTC

The fix to PR31360 has caused significant code size regressions on ARM-EABI.  An example from zlib (adler32.c) is attached; compile with -Os -mcpu=arm7tdmi -fno-short-enums -w

The new code:
1) Hoists a register containing 0 out of the loop
2) Uses that *only* as a copy into another register (mov reg, #0 costs exactly the same as mov reg, reg)
3) By doing the above, somehow prevents the post-increment sequence from being found.
4) Forces another register to be saved by the function.

Comment 1 Richard Earnshaw 2007-05-07 01:06:36 UTC

Created attachment 13520
source code showing regression

Compiled code for this file regresses by approximately 3%.

Comment 2 Andrew Pinski 2007-05-07 02:36:11 UTC

> 1) Hoists a register containing 0 out of the loop
The correct thing to do.
>2) Uses that *only* as a copy into another register (mov reg, #0 costs exactly
the same as mov reg, reg)
Interesting, but this is also true on ppc where I reported the original problem.

> 3) By doing the above, somehow prevents the post-increment sequence from being
found.
This might be the true issue.  But is this a regression from 4.1 where loop.c would actually move this invariant?

Comment 3 Steven Bosscher 2007-05-07 04:14:13 UTC

Yup.  So the heuristic to just move everything out of the loop needs another tweak.  This is sort-of the reverse of the problem for Bug 31360.

Adding Mark to put this bug on his radar.

Comment 4 Richard Earnshaw 2007-05-07 09:34:48 UTC

(In reply to comment #2)
> > 1) Hoists a register containing 0 out of the loop
> The correct thing to do.
>
Not necessarily.  Hoisting literal constants means that opportunities to simplify insns based on that constant are potentially lost.

Comment 5 rakdver@kam.mff.cuni.cz 2007-05-07 09:38:50 UTC

Subject: Re:  [4.2/4.3 Regression] Code size regression caused by fix to PR 31360

> (In reply to comment #2)
> > > 1) Hoists a register containing 0 out of the loop
> > The correct thing to do.
> >
> Not necessarily.  Hoisting literal constants means that opportunities to simplify
> insns based on that constant are potentially lost.

This could be prevented if invariant motion added equivalence notes; nevertheless,
one could argue that such simplifications should have been performed
before loop invariant motion (in cse).

Comment 6 stevenb.gcc@gmail.com 2007-05-07 09:46:34 UTC

Subject: Re:  [4.2/4.3 Regression] Code size regression caused by fix to PR 31360

Constant / copy simplifications should be done in at least CSE,
fwprop, and the gcse CPROP passes (we run CPROP three times!).

Comment 7 Richard Earnshaw 2007-05-07 10:43:21 UTC

Here's another example of code that is now significantly worse (~20% larger).  Rather than incrementing the base pointers on each iteration of the loop, we now maintain both base pointers and an offset.  This costs two extra registers for no benefit at all.

char * strncat(char *dest, const char *src, unsigned count)
{
        char *tmp = dest;

        if (count) {
                while (*dest)
                        dest++;
                while ((*dest++ = *src++)) {
                        if (--count == 0) {
                                *dest = '\0';
                                break;
                        }
                }
        }

        return tmp;
}

Comment 8 Zdenek Dvorak 2007-05-07 11:22:34 UTC

(In reply to comment #7)
> Here's another example of code that is now significantly worse (~20% larger). 
> Rather than incrementing the base pointers on each iteration of the loop, we
> now maintain both base pointers and an offset.  This costs two extra registers
> for no benefit at all.

Actually, this should save one addition (only the index is incremented; the additions of the index to the bases are done in the addressing mode).  Since we do not run out of registers, this seems to be the correct thing to do.

However, for some reason we do not currently eliminate the use of the final value of dest (which should be replaced by base_of_dest + count).

Comment 9 Richard Earnshaw 2007-05-07 13:50:07 UTC

(In reply to comment #8)
>
> actually, this should save one addition (only the index is incremented, the
> additions of index to bases are done in the addressing mode).  

When a machine has a post-increment instruction (as ARM does) the addition is free.  So really there should be no additions in this code (though the previous version had one, versus the current two).

Comment 10 Andrew Pinski 2007-05-07 13:57:43 UTC

So this comes down to the order of passes?  At least that is what is being said as far as I can tell (though maybe flow is just too stupid to pull back the increment and have it as being free).

Comment 11 Zdenek Dvorak 2007-05-07 14:25:29 UTC

(In reply to comment #10)
> So this comes down to the orders of passes?  At least that is what is being
> said as far as I can tell (though maybe flow is just too stupid to pull back
> the increment and have it as being free).

Which testcase are you talking about?  As for the first one, I really have no idea yet what is going on there (everything written here so far is just hypothesis).

As for the second one, the problem is that ivopts does not know about autoincrements, so it transforms

  something (*src++);
  something (*dest++);
(which is better on autoinc-having architecture)

to

  something (src[i]);
  something (dest[i]);
  i++;

(which would be better if autoinc is not available).

At the moment, I have no intention to teach ivopts about autoinc (since ivopts need to be rewritten, anyway).
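
To make the two shapes concrete, here is a hand-written C sketch of the strncat testcase from comment 7 in both forms. The function names strncat_autoinc and strncat_indexed are made up for illustration, and the indexed version is only an approximation of what ivopts produces, not actual compiler output.

```c
/* Pointer-bumping form, as written in the testcase from comment 7.
   On a target with post-increment addressing, each *p++ can be a
   single load or store with the pointer update folded in. */
char *strncat_autoinc(char *dest, const char *src, unsigned count)
{
    char *tmp = dest;

    if (count) {
        while (*dest)
            dest++;
        while ((*dest++ = *src++)) {
            if (--count == 0) {
                *dest = '\0';
                break;
            }
        }
    }
    return tmp;
}

/* Base + index form (hypothetical hand translation).  Two base
   pointers and one shared index stay live across the copy loop. */
char *strncat_indexed(char *dest, const char *src, unsigned count)
{
    char *base = dest;
    unsigned i = 0;

    if (count) {
        while (*base)
            base++;
        for (;;) {
            char c = src[i];
            base[i] = c;
            i++;
            if (c == '\0')
                break;
            if (i == count) {
                /* The final value of dest is base + count here. */
                base[i] = '\0';
                break;
            }
        }
    }
    return dest;
}
```

Both functions should produce identical buffers for the same inputs; the difference that matters for this PR is only in which addressing modes and how many registers the generated code needs.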

Comment 12 Mark Mitchell 2007-05-07 16:21:17 UTC

I am confused that this is marked as a 4.2 regression.  I thought the fix for PR 31360 had not gone on the 4.2 branch?

Thanks,

-- Mark

Comment 13 Mark Mitchell 2007-05-07 16:28:52 UTC

I don't understand why Andrew thinks hoisting the load of zero out of the loop is the correct thing to do (in Comment #2).  It doesn't make sense to hoist a load of a constant if loading the constant directly is as cheap as loading the constant from a register.  The PR31360 case is different in that the instruction which loaded the constant was redundant; the register into which it was being loaded was loop invariant.  So, moving the instruction out of the loop just saves an instruction per loop iteration.

Can we tell the difference between the two cases when considering whether or not to hoist the load?

Comment 14 rakdver@kam.mff.cuni.cz 2007-05-07 18:02:32 UTC

Subject: Re:  [4.2/4.3 Regression] Code size regression caused by fix to PR 31360

> I don't understand why Andrew thinks hoisting the load of zero out of the loop
> is the correct thing to do (in Comment #2).  It doesn't make sense to hoist a
> load of a constant if loading the constant directly is as cheap as loading the
> constant from a register.

It should be possible to use the register directly, not copy its value to
another register.  So we need to find out why it does not happen, and
possibly avoid hoisting the load out of the loop.

Comment 15 Richard Earnshaw 2007-05-07 21:58:59 UTC

Yet another case that's regressed.  This time the compiler seems to have started doing loop peeling at -Os!
 
char *airStrtok();
void printf(char *, ...);
char *strtok();
#define NULL 0

int
main(int argc, char *argv[]) {
  char *s, *ct, *last, *ret;

  if (3 != argc) {
    /*             0   1    2    (3) */
    printf("usage: %s <s> <ct>\n", argv[0]);
    exit(1);
  }
  s = argv[1];
  ct = argv[2];
  printf("s = |%s|\n", s);
  printf("ct = |%s|\n", ct);
  ret = airStrtok(s, ct, &last);
  while (ret) {
    printf("|%s|\n", ret);
    ret = airStrtok(NULL, ct, &last);
  }
  printf("--------------\n");
  printf("hey, why doesn't this work?!?!?\n");
  printf("s = |%s|\n", s);
  ret = strtok(s, ct);
  while (ret) {
    printf("ret = |%s|\n", ret);
    ret = strtok(NULL, ct);
  }
  exit(0);
}

Comment 16 Andrew Pinski 2007-05-07 22:00:30 UTC

This is only a 4.3 regression as PR 31360 has not been applied there yet.

Comment 17 Richard Earnshaw 2007-05-07 22:04:26 UTC

Hmm, no, it's not as simple as that.  In the original code the compiler replaces the first call to airStrtok by jumping into the middle of the while loop.  In the new variant of the code this optimization is no longer being found.

Comment 18 Zdenek Dvorak 2007-07-04 13:32:32 UTC

(In reply to comment #0)
> The fix to PR31360 has caused significant code size regressions on ARM-EABI. 
> An example of this is from zlib (adler32.c) and is attached, compile with -Os
> -mcpu=arm7tdmi -fno-short-enums -w 
> 
> The new code:
> 1) Hoists a register containing 0 out of the loop
> 2) Uses that *only* as a copy into another register (mov reg, #0 costs exactly
> the same as mov reg, reg)

Actually, rtx_cost claims that mov reg, #0 costs 20, while mov reg, reg costs 16.  Fixing this (assuming that is indeed wrong) will not fix the problem (without further changes to the invariant motion), but it is the first step.

Comment 19 Eric Botcazou 2007-11-24 18:43:10 UTC

The pessimization actually happens at the tree level: -fivopts turns

<bb 9>:
  s1 = (long unsigned int) *buf + s1;
  buf = buf + 1;
  s2 = s1 + s2;
  k = k + -1;
  if (k != 0)
    goto <bb 9>;
  else
    goto <bb 10>;

into

<bb 10>:
  s1 = s1 + (long unsigned int) MEM[base: buf.91, index: D.1322];
  s2 = s1 + s2;
  D.1322 = D.1322 + 1;
  if (D.1322 != (unsigned int) k)
    goto <bb 10>;
  else
    goto <bb 11>;

in final_cleanup.


Couldn't ivopts be taught to recognize "dec and branch on zero" patterns

  k_114 = k_15 + -1;
  if (k_114 != 0)
    goto <bb 10>;
  else
    goto <bb 11>;

and take into account their breakage for its cost estimates somehow?
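
For readers who prefer source to GIMPLE dumps, the two loop shapes above correspond roughly to the following C. This is a sketch under assumptions: the function names and the s1 = 1, s2 = 0 starting values are chosen for illustration, and k must be nonzero, as in the original loop.

```c
/* Down-counting form, mirroring the GIMPLE before ivopts:
   the pointer is bumped and the counter is compared against zero. */
unsigned long sums_countdown(const unsigned char *buf, unsigned k,
                             unsigned long *s2_out)
{
    unsigned long s1 = 1, s2 = 0;

    do {
        s1 += *buf++;
        s2 += s1;
    } while (--k != 0);

    *s2_out = s2;
    return s1;
}

/* Up-counting form, mirroring the GIMPLE after ivopts:
   base + index addressing, and the exit test compares the
   index against k, so k must stay live for the whole loop. */
unsigned long sums_indexed(const unsigned char *buf, unsigned k,
                           unsigned long *s2_out)
{
    unsigned long s1 = 1, s2 = 0;
    unsigned i = 0;

    do {
        s1 += buf[i];
        s2 += s1;
        i++;
    } while (i != k);

    *s2_out = s2;
    return s1;
}
```

The results are the same; the second form needs buf, i and k in registers at once, whereas the first needs only buf and k and can test the decremented counter against zero directly.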

Comment 20 Zdenek Dvorak 2007-11-24 19:08:29 UTC

 
> Couldn't ivopts be taught to recognize "dec and branch on zero" patterns
> 
>   k_114 = k_15 + -1;
>   if (k_114 != 0)
>     goto <bb 10>;
>   else
>     goto <bb 11>;
> 
> and take into account their breakage for its cost estimates somehow?

Why?  If using dec and branch is profitable, doloop pass should do it; the transformation that ivopts do does not prevent that or make it any more difficult.

Comment 21 Eric Botcazou 2007-11-24 19:19:38 UTC

> Why?  If using dec and branch is profitable, doloop pass should do it; the
> transformation that ivopts do does not prevent that or make it any more
> difficult.

I'm not sure that blindly doing transformations in the hope that subsequent
ones will repair the potential damages is always the way to go.  Why not
avoid causing the damages in some cases?  Is that really too "old-style"?

Comment 22 rakdver@kam.mff.cuni.cz 2007-11-24 21:52:55 UTC

Subject: Re:  [4.3 Regression] Code size regression caused by fix to PR 31360

> > Why?  If using dec and branch is profitable, doloop pass should do it; the
> > transformation that ivopts do does not prevent that or make it any more
> > difficult.
> 
> I'm not sure that blindly doing transformations in the hope that subsequent
> ones will repair the potential damages is always the way to go.  Why not
> avoid causing the damages in some cases?  Is that really too "old-style"?

Ivopts cannot create decrease-and-branch instruction by itself, as it is
run on trees.  So we have to rely on doloop to do so.  For doloop, the
exact shape of the exit test does not matter, so whether ivopts will do
the transformation or not is completely irrelevant.

So, while altering the cost function of ivopts to take decrease-and-branch
patterns into account and preserving the decreasing induction variable
would not be too hard (although it is a bit dubious whether it would
also be wise), I do not see how that would help.

Comment 23 Zdenek Dvorak 2007-11-24 21:56:11 UTC

Let me have a look at what's going on here.

Comment 24 Zdenek Dvorak 2007-11-24 22:28:46 UTC

(In reply to comment #20)
> > Couldn't ivopts be taught to recognize "dec and branch on zero" patterns
> > 
> >   k_114 = k_15 + -1;
> >   if (k_114 != 0)
> >     goto <bb 10>;
> >   else
> >     goto <bb 11>;
> > 
> > and take into account their breakage for its cost estimates somehow?
> 
> Why?  If using dec and branch is profitable, doloop pass should do it; the
> transformation that ivopts do does not prevent that or make it any more
> difficult.

Sorry, I misunderstood what you were asking for.  Yes, ivopts would have to solve this; however, it would have to know not only about the possibility of comparing with zero and jumping in one instruction, but also about autoincrement addressing modes, to get this one right.  That looks like a rather complicated way to solve the issue; I need to think about it more.

Comment 25 Zdenek Dvorak 2007-11-26 00:48:35 UTC

Created attachment 14637
Patch to make ivopts take autoincrement addressing modes into account

Ivopts take autoincrement addressing modes into account with this patch (in a fairly simplistic form, sufficient to improve the code for the testcases in this PR).  However, I do not have access to any architecture where I could test it (I somewhat suspect it might cause performance regressions -- using autoincrement addressing modes is not necessarily always profitable).

Comment 26 Mark Mitchell 2007-11-26 00:53:05 UTC

Zdenek, do you have the ability to get code-size measurements on ARM?  (You don't need to actually run the code to find out if this improves code density.)  If you don't, I'll ask someone at CodeSourcery to measure that.

Thanks,

-- Mark

Comment 27 rakdver@kam.mff.cuni.cz 2007-11-26 01:12:42 UTC

Subject: Re:  [4.3 Regression] Code size regression caused by fix to PR 31360

> Zdenek, do you have the ability to get code-size measurements on ARM?  (You
> don't need to actually run the code to find out if this improves code density.)
>  If you don't, I'll ask someone at CodeSourcery to measure that.

I can check the code size; however, I am somewhat concerned about
possible performance regressions on other architectures,
in particular, ia64, where the dependences created by introducing
autoincrements might conflict with scheduling.  I would appreciate
if someone could run some benchmark on ia64 with the patch.

Comment 28 Andrew Pinski 2007-11-26 01:24:46 UTC

A couple of comments about the patch.
+ #define CP_AUTOINC_OFFSET(CP) ((int) (size_t) (CP)->value)

I don't like this idea at all.

The patch should support pre increment also (this shows up on PPC) and pre/post decrement.

Comment 29 rakdver@kam.mff.cuni.cz 2007-11-26 01:42:55 UTC

Subject: Re:  [4.3 Regression] Code size regression caused by fix to PR 31360

> ------- Comment #28 from pinskia at gcc dot gnu dot org  2007-11-26 01:24 -------
> A couple of comments about the patch.
> + #define CP_AUTOINC_OFFSET(CP) ((int) (size_t) (CP)->value)
> 
> I don't like this idea at all.

I may change cost_pair to union, if this bothers you too much.

> The patch should support pre increment also (this shows up on PPC) and pre/post
> decrement.

post-decrement is already supported.  Pre-(inc/dec)rements are more
complicated, as supporting them would require us to create induction
variables incremented at the start of the loop body (instead of at the
end, as we do now).
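
The distinction can be illustrated at the source level. The following sketch (function names are hypothetical) sums elements 1..n-1 of an array twice: once with the pointer stepped after each access, once with the pointer stepped before each access, which is the shape a pre-increment addressing mode wants.

```c
/* Post-increment shape: the pointer already addresses the element
   to be read and is bumped after the access (end-of-body step). */
long sum_tail_postinc(const int *a, unsigned n)
{
    long sum = 0;
    const int *p;
    unsigned k;

    if (n < 2)
        return 0;
    p = a + 1;
    k = n - 1;
    do {
        sum += *p++;
    } while (--k != 0);
    return sum;
}

/* Pre-increment shape: the pointer starts one element earlier and
   is bumped before the access (start-of-body step). */
long sum_tail_preinc(const int *a, unsigned n)
{
    long sum = 0;
    const int *p;
    unsigned k;

    if (n < 2)
        return 0;
    p = a;
    k = n - 1;
    do {
        sum += *++p;
    } while (--k != 0);
    return sum;
}
```

The pre-increment variant needs an induction variable whose initial value is one step behind the first address used, which is why it does not fit a scheme that always places the increment at the end of the loop body.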

Comment 30 Zdenek Dvorak 2007-11-26 05:08:47 UTC

The patch improves the size of adler32 (and several other files in zlib) by about 2%.  However, over the whole of CSiBE it is neutral (the sum of the sizes of all the files increases by 0.02%); the changes in file size range from a 7.5% improvement to a 7% regression.  I have not investigated the cause of the regressions yet.

Comment 31 Steven Bosscher 2007-11-26 09:05:40 UTC

I experimented with autoinc/-dec addressing modes for ARM earlier this year and also saw smaller code size reductions (just over 2% overall on CSiBE).  Looks like an area worth working on :-)

Apparently this bug has to do with code size, so let's make it block bug 16996.

Comment 32 Steven Bosscher 2008-01-27 15:35:57 UTC

I have re-tested Zdenek's patch on arm-unknown-elf.

128 files are smaller with the patch, and 126 files are larger.  The total size increase with the patch is 324 bytes on 3601910 bytes total size (or <0.01%) with r131735.  Neglecting all size increases, the total win with the patch is -2552 bytes, which is still less than one tenth of a percent.

The biggest win is for lib/zlib_inflate/inffast	from the package linux-2.4.23-pre3-testplatform.  The size decrease is 112 bytes with the patch, or -9.84%.

The biggest loser in relative terms is jidctred from jpeg-6b, with 100 bytes or +7.08%; in absolute terms the biggest loser is src/nrrd/axis from teem-1.6.0-src, with 184 bytes or +4.05%.

It would be interesting to investigate the inffast improvement.  But given the negligible overall gain or loss, this patch does not seem worth pursuing much further.
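
The totals quoted above can be checked with a few lines of C. This is only a convenience sketch; percent_of is a hypothetical helper, and the inputs are the byte counts given in this comment.

```c
/* Express a byte delta as a percentage of a total size. */
double percent_of(long delta_bytes, long total_bytes)
{
    return 100.0 * (double)delta_bytes / (double)total_bytes;
}
```

With the numbers from this comment, percent_of(324, 3601910) is about 0.009 (the net increase, below 0.01%), and percent_of(2552, 3601910) is about 0.071 (the win when all increases are ignored, below one tenth of a percent).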

Comment 33 Alexandre Pereira Nunes 2008-02-12 00:32:33 UTC

I compiled gcc 4.3 for arm-unknown-elf (today's trunk, not sure about the rev). All three firmware images I built with it are larger at -Os; at -O2, gcc 4.3 produces smaller code than 4.2.3:

# name gcc   flags code size

# Increased about 3.1%
img1   4.2.3 -Os   4786
img1   4.3.- -Os   4936

# Increased about 1.3%
img2   4.2.3 -Os   3372
img2   4.3.- -Os   3416

# Decreased (!) about 3.3%
img3   4.2.3 -O2   13892
img3   4.3.- -O2   13436

# Increased about 4.4%
img3   4.2.3 -Os   12348
img3   4.3.0 -Os   12892

Comment 34 Richard Biener 2008-03-14 16:47:23 UTC

Adjusting target milestone.

Comment 35 Hans-Peter Nilsson 2008-05-23 23:36:26 UTC

Looks like this is the new-generation/df equivalent of PR20211!

Comment 36 Richard Biener 2008-06-06 14:56:53 UTC

4.3.1 is being released, adjusting target milestone.

Comment 37 Andrew Pinski 2008-06-13 14:34:11 UTC

*** Bug 36135 has been marked as a duplicate of this bug. ***

Comment 38 Joel Sherrill 2008-07-15 19:33:00 UTC

Created attachment 15914
Zoltan's test case

Comment 39 Joel Sherrill 2008-07-15 19:36:41 UTC

I compiled Zoltan's example test.c for the various arm-rtems gcc versions I had lying around.  The GCC version string is modified to reflect the RTEMS specific patch revision and newlib version used in the binary.  I used -Os, -O1, and -O2.  Here is the report.

arm-rtems-gcc (GCC) 3.2.3 (OAR Corporation gcc-3.2.3-20040420/newlib-1.11.0-20030605-4)
===> -Os 1020 0 0 1020 3fc test.o
===> -O1 1024 0 0 1024 400 test.o
===> -O2 1024 0 0 1024 400 test.o
arm-rtems4.7-gcc (GCC) 4.1.1 (RTEMS gcc-4.1.1/newlib-1.15.0-12.fc8)
===> -Os 936 0 0 936 3a8 test.o
===> -O1 840 0 0 840 348 test.o
===> -O2 892 0 0 892 37c test.o
arm-rtems4.8-gcc (GCC) 4.2.4 (RTEMS gcc-4.2.4/newlib-1.15.0-30.fc8)
===> -Os 928 0 0 928 3a0 test.o
===> -O1 852 0 0 852 354 test.o
===> -O2 908 0 0 908 38c test.o
arm-rtems4.9-gcc (GCC) 4.3.1
===> -Os 1740 0 0 1740 6cc test.o
===> -O1 868 0 0 868 364 test.o
===> -O2 1944 0 0 1944 798 test.o

Comment 40 Joseph S. Myers 2008-08-27 22:01:37 UTC

4.3.2 is released, changing milestones to 4.3.3.

Comment 41 Jorn Wolfgang Rennecke 2008-10-22 13:16:30 UTC

(In reply to comment #0)
 
> 1) Hoists a register containing 0 out of the loop

Does the TARGET_RTX_COSTS set the cost to zero for the constant zero?

Comment 42 Jorn Wolfgang Rennecke 2008-12-10 04:29:46 UTC

(In reply to comment #25)
> Created an attachment (id=14637) [edit]
> Patch to make ivopts take autoincrement addressing modes into account
> 
> Ivopts take autoincrement addressing modes into account with this patch (in a
> fairly simplistic form, sufficient to improve the code for the testcases in
> this PR).  However, I do not have access to any architecture where I could test
> it (I somewhat suspect it might cause performance regressions -- using
> autoincrement addressing modes is not necessarily always profitable).
> 

AFAICS this does not take any AUTO_INC modes but POST_INC into account.
When you have PRE_MODIFY and POST_MODIFY, most or all iv increments become
free if there is a memory access using them.

Also, if you are not sure yet of the merit of the patch, it would be best to make its effect an option, so that it can be easily benchmarked.
And if it turns out to be good for some targets but bad for others, we can keep
it an option which gets defaulted appropriately by OPTIMIZATION_OPTIONS.

Comment 43 Jorn Wolfgang Rennecke 2008-12-10 05:29:08 UTC

(In reply to comment #25)
> Created an attachment (id=14637) [edit]
> Patch to make ivopts take autoincrement addressing modes into account
> 
> Ivopts take autoincrement addressing modes into account with this patch (in a
> fairly simplistic form, sufficient to improve the code for the testcases in
> this PR).  However, I do not have access to any architecture where I could test
> it (I somewhat suspect it might cause performance regressions -- using
> autoincrement addressing modes is not necessarily always profitable).
> 

Your patch suffered from some bitrot.  After fixing the merge issues, I can
confirm that it fixes the bcmp regression problem for ARC (see PR38440).

Comment 44 stevenb.gcc@gmail.com 2008-12-10 22:30:33 UTC

Subject: Re:  [4.3/4.4 Regression] Code size increased with PR 31360 (IV-opts not understanding autoincrement)

Joern, can you attach the updated patch?

Comment 45 Jorn Wolfgang Rennecke 2008-12-11 02:07:47 UTC

(In reply to comment #44)
> Subject: Re:  [4.3/4.4 Regression] Code size increased with PR 31360 (IV-opts
> not understanding autoincrement)
> 
> Joern, can you attach the updated patch?

I am still waiting for confirmation from the FSF that our Copyright assignment has
been filed.

In the meantime, I'll send you the patch by personal email.

Comment 46 Richard Biener 2009-01-24 10:19:34 UTC

GCC 4.3.3 is being released, adjusting target milestone.

Comment 47 Alexandre Pereira Nunes 2009-03-06 15:29:55 UTC

Any news on the subject?

Comment 48 Jorn Wolfgang Rennecke 2009-03-06 15:54:49 UTC

Created attachment 17408
patch to take POST_DEC and POST_MODIFY into account

The Copyright assignment issue has been resolved now.

This is the patch I've sent to Steven Bosscher in December.
It would be good if we had some benchmarks on different architectures
to check the merit of having the f{,no-}ivopts-post-inc and
f{,no-}ivopts-post-modify options.

Comment 49 Richard Biener 2009-08-04 12:28:05 UTC

GCC 4.3.4 is being released, adjusting target milestone.

Comment 50 Steven Bosscher 2010-02-04 16:04:00 UTC

Bernd Schmidt has worked on this, see here:

http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01788.html
http://gcc.gnu.org/ml/gcc-cvs/2009-08/msg00268.html

It is hard to tell whether this has actually addressed the issues raised in this bug report, because it is unclear what code the OP is expecting from the compiler.

Comment 51 Steven Bosscher 2010-02-10 10:42:34 UTC

Could the OP be so kind as to check whether this is still a problem?  And, if it is still a problem with an unpatched compiler, whether the problem goes away if arm_arm_address_cost() returns 1 unconditionally (so that this bug can be reclassified as a target problem rather than a tree-optimization issue)?

Comment 52 Richard Biener 2010-05-22 18:11:26 UTC

GCC 4.3.5 is being released, adjusting target milestone.

Comment 53 Steven Bosscher 2010-07-20 22:12:26 UTC

Could the OP be so kind as to check whether this is still a problem?

Comment 54 Steven Bosscher 2011-02-20 15:14:14 UTC

No response for more than a year.