Bug 9663 - [arm] gcc-20030127 misses an optimization opportunity
Summary: [arm] gcc-20030127 misses an optimization opportunity
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 3.3
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization, patch
Depends on:
Blocks:
 
Reported: 2003-02-11 17:06 UTC by gertom
Modified: 2019-06-06 15:15 UTC (History)
5 users (show)

See Also:
Host:
Target: arm-elf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2004-11-06 17:43:44


Attachments
patch for extending move and compare parallelization (630 bytes, application/x-compressed)
2003-06-03 13:38 UTC, Gábor Lóki
Details
Testcase from CSiBE teem sources (10.07 KB, application/zip)
2019-06-06 14:50 UTC, Fredrik Hederstierna
Details
range ran through preprocessor using -E -P (7.58 KB, text/plain)
2019-06-06 15:15 UTC, Fredrik Hederstierna
Details

Note You need to log in before you can comment on or make changes to this bug.
Description gertom 2003-02-11 17:06:00 UTC
The following two lines:

cmp r0, #0
mov lr, r0

should be replaced with:

subs lr, r0, #0

in the asm output.

20020503-1.i:
---
# 1 "20020503-1.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "20020503-1.c"



 
void abort (void);
static char *
inttostr (long i, char buf[128])
{
  unsigned long ui = i;
  char *p = buf + 127;
  *p = '\0';
  if (i < 0)
    ui = -ui;
  do
    *--p = '0' + ui % 10;
  while ((ui /= 10) != 0);
  if (i < 0)
    *--p = '-';
  return p;
}

int
main ()
{
  char buf[128], *p;

  p = inttostr (-1, buf);
  if (*p != '-')
    abort ();
  return 0;
}

Release:
gcc version 3.3 20030127 (prerelease)

Environment:
Built on: Linux 2.4.20 i686 unknown
Configured with: /home/gertom/gcc/src/gcc-20030127/configure --target=arm-elf --prefix=/home/gertom/gcc/build/install-20030127-arm-elf-orgn --enable-target-optspace --with-newlib --with-headers --disable-nls --disable-threads --disable-shared --disable-libgcj --disable-multilib --with-gnu-as --with-gnu-ld --enable-languages=c,c++
Thread model: single

How-To-Repeat:
just compile with -O2
Comment 1 gertom 2003-02-11 17:06:00 UTC
Fix:
See "PATCH: new peephole2 in arm.md" at http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00204.html
Comment 2 Dara Hazeghi 2003-05-26 20:55:46 UTC
Hello,

this problem is still present on gcc 3.3 branch and mainline (20030509). Could you resend the 
patch to gcc-patches, and cc: one of the arm-maintainers? Also, can you attach the patch to this 
PR, so it doesn't get lost? Thanks,

Dara
Comment 3 Andrew Pinski 2003-05-26 20:57:05 UTC
See Dara's question.
Comment 4 Dara Hazeghi 2003-06-02 05:30:23 UTC
Shouldn't be in waiting. Confirmed.
Comment 5 Gábor Lóki 2003-06-03 13:38:24 UTC
Created attachment 4155 [details]
patch for extending move and compare parallelization

>Also, can you attach the patch to this PR, so it doesn't get lost?
I'am done with it.

Regards,
  Gabor Loki
Comment 6 Dara Hazeghi 2003-12-10 00:12:19 UTC
Confirmed still not present on mainline. Richard, would you mind looking at the patch included 
here? Thanks.
Comment 7 Andrew Pinski 2004-05-21 03:54:16 UTC
Note you have to do with -fno-inline now on the mainline as the function is inlined at -O2.
Comment 8 Ramana Radhakrishnan 2009-02-08 05:17:30 UTC
(In reply to comment #7)
> Note you have to do with -fno-inline now on the mainline as the function is
> inlined at -O2.
> 

It looks as though this is fixed in 4.3 and mainline today. I checked with 4.1 and saw that the problem existed in 4.1 


Looking at the assembly generated for the function, I no longer see a cmp and mov as reported in the bug report. I see similar code generated in 4.1 but no longer in 4.3 or 4.4. I see subs generated for 4.3 and 4.4 in the loop kernel as of version r143940 for 4.3 and 144002 for mainline. 

Here is the snippet of code from 4.1, 4.3 and 4.4 as given below.


4.1 

.L8:
        ldr     r3, .L12
        umull   r1, r2, r3, ip
        mov     r2, r2, lsr #3
        mov     r3, r2, asl #1
        mov     r1, r2, asl #3
        add     r3, r3, r1
        rsb     r3, r3, ip
        add     r3, r3, #48
        cmp     r2, #0            ---- Insns from original bug report.
        mov     ip, r2            ----
        strb    r3, [r0, #-1]!
        bne     .L8



4.3
.L5:
	umull	r2, r3, r5, ip
	mov	r3, r3, lsr #3
	mov	r2, r3, asl #1
	mov	r1, r3, asl #3
	add	r2, r2, r1
	rsb	r2, r2, ip
	add	r2, r2, #48
	subs	ip, r3, #0
	strb	r2, [r0, #-1]!
	bne	.L5





4.4 

  .L5:
	umull	r1, r2, r4, ip
	mov	r2, r2, lsr #3
	add	r1, r2, r2, asl #2
	sub	ip, ip, r1, asl #1
	add	r1, ip, #48
	subs	ip, r2, #0
	strb	r1, [r0, #-1]!
	bne	.L5







Comment 9 Richard Earnshaw 2009-03-13 09:38:15 UTC
Now believed fixed.
Comment 10 Fredrik Hederstierna 2019-06-06 14:50:07 UTC
Created attachment 46457 [details]
Testcase from CSiBE teem sources

Testcase from CSiBE teem sources
Tested with gcc-9.1.0 for ARM 32bit targets.

Without peephole2

00000000 <nrrdRangeSet>:
   0:	e92d407f 	push	{r0, r1, r2, r3, r4, r5, r6, lr}
   4:	e2504000 	subs	r4, r0, #0
   8:	0a00003f 	beq	10c <nrrdRangeSet+0x10c>
   c:	e3510000 	cmp	r1, #0
  10:	e1a05001 	mov	r5, r1

With peephole2

00000000 <nrrdRangeSet>:
   0:	e92d407f 	push	{r0, r1, r2, r3, r4, r5, r6, lr}
   4:	e2504000 	subs	r4, r0, #0
   8:	0a00003e 	beq	108 <nrrdRangeSet+0x108>
   c:	e2515000 	subs	r5, r1, #0

/Fredrik
Comment 11 Richard Earnshaw 2019-06-06 15:00:26 UTC
(In reply to Fredrik Hederstierna from comment #10)
> Created attachment 46457 [details]
> Testcase from CSiBE teem sources
> 
> Testcase from CSiBE teem sources
> Tested with gcc-9.1.0 for ARM 32bit targets.
> 
> Without peephole2
> 
> 00000000 <nrrdRangeSet>:
>    0:	e92d407f 	push	{r0, r1, r2, r3, r4, r5, r6, lr}
>    4:	e2504000 	subs	r4, r0, #0
>    8:	0a00003f 	beq	10c <nrrdRangeSet+0x10c>
>    c:	e3510000 	cmp	r1, #0
>   10:	e1a05001 	mov	r5, r1
> 
> With peephole2
> 
> 00000000 <nrrdRangeSet>:
>    0:	e92d407f 	push	{r0, r1, r2, r3, r4, r5, r6, lr}
>    4:	e2504000 	subs	r4, r0, #0
>    8:	0a00003e 	beq	108 <nrrdRangeSet+0x108>
>    c:	e2515000 	subs	r5, r1, #0
> 
> /Fredrik

Can you run this through your preprocessor to remove the dependencies on external headers?
Comment 12 Fredrik Hederstierna 2019-06-06 15:15:15 UTC
Created attachment 46458 [details]
range ran through preprocessor using -E -P

I'm not sure if this is what you wanted, but here is file stand-alone compile-able, without headers. Compiled with -E -P.
/Fredrik