Bug 46393

Summary: [11/12/13/14 Regression] m68k code size regression
Product: gcc
Reporter: Anders Montonen <Anders.Montonen>
Component: target
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW
Severity: normal
CC: jeffreyalaw
Priority: P4
Version: 4.5.1
Target Milestone: 11.5
Host:
Target: m68k-elf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-01-17 00:00:00

Description Anders Montonen 2010-11-09 13:56:05 UTC
The following code snippet compiles to 112 bytes using GCC 4.4.0 and 154 bytes using GCC 4.5.1.

Host is OS X 10.6, target is m68k-elf bare-metal toolchain using newlib.
The code was compiled with -m68000 -Os -c.

unpack.c:
#include <stdint.h>

uint8_t ChannelMap[32];

typedef struct
{
    uint8_t note;
    uint8_t instrument;
    uint8_t volume;
    uint8_t command;
    uint8_t info;
} Channeldata_t;

Channeldata_t RowBuffer[16];

uint8_t UnpackRow(const uint8_t *pRowData)
{
    uint8_t count = 0, byte, channel, mappedchannel;
    Channeldata_t *pChannelData;
    
    while ((byte = *pRowData++) != 0)
    {
        ++count;
        channel = byte & 0x1f;
        mappedchannel = ChannelMap[channel];
        pChannelData = &RowBuffer[mappedchannel];
        
        if (byte & 0x20)
        {
            pChannelData->note = *pRowData++;
            pChannelData->instrument = *pRowData++;
            ++count;
        }
        if (byte & 0x40)
        {
            pChannelData->volume = *pRowData++;
            ++count;
        }
        if (byte & 0x80)
        {
            pChannelData->command = *pRowData++;
            pChannelData->info = *pRowData++;
            count += 2;
        }
    }
    
    return count;
}

GCC 4.4.0-produced code:

00000000 <UnpackRow>:
   0:	4e56 0000      	linkw %fp,#0
   4:	48e7 3820      	moveml %d2-%d4/%a2,%sp@-
   8:	206e 0008      	moveal %fp@(8),%a0
   c:	4200           	clrb %d0
   e:	45f9 0000 0000 	lea 0 <UnpackRow>,%a2
  14:	604e           	bras 64 <UnpackRow+0x64>
  16:	5288           	addql #1,%a0
  18:	5200           	addqb #1,%d0
  1a:	7400           	moveq #0,%d2
  1c:	1401           	moveb %d1,%d2
  1e:	761f           	moveq #31,%d3
  20:	c682           	andl %d2,%d3
  22:	1632 3800      	moveb %a2@(0000000000000000,%d3:l),%d3
  26:	0283 0000 00ff 	andil #255,%d3
  2c:	2803           	movel %d3,%d4
  2e:	d883           	addl %d3,%d4
  30:	d884           	addl %d4,%d4
  32:	2244           	moveal %d4,%a1
  34:	d3c3           	addal %d3,%a1
  36:	d3fc 0000 0000 	addal #0,%a1
  3c:	0802 0005      	btst #5,%d2
  40:	6708           	beqs 4a <UnpackRow+0x4a>
  42:	1298           	moveb %a0@+,%a1@
  44:	1358 0001      	moveb %a0@+,%a1@(1)
  48:	5200           	addqb #1,%d0
  4a:	0802 0006      	btst #6,%d2
  4e:	6706           	beqs 56 <UnpackRow+0x56>
  50:	1358 0002      	moveb %a0@+,%a1@(2)
  54:	5200           	addqb #1,%d0
  56:	4a01           	tstb %d1
  58:	6c0a           	bges 64 <UnpackRow+0x64>
  5a:	1358 0003      	moveb %a0@+,%a1@(3)
  5e:	1358 0004      	moveb %a0@+,%a1@(4)
  62:	5400           	addqb #2,%d0
  64:	1210           	moveb %a0@,%d1
  66:	66ae           	bnes 16 <UnpackRow+0x16>
  68:	4cdf 041c      	moveml %sp@+,%d2-%d4/%a2
  6c:	4e5e           	unlk %fp
  6e:	4e75           	rts

GCC 4.5.1-produced code:

00000000 <UnpackRow>:
   0:	4e56 0000      	linkw %fp,#0
   4:	48e7 383c      	moveml %d2-%d4/%a2-%a5,%sp@-
   8:	206e 0008      	moveal %fp@(8),%a0
   c:	4200           	clrb %d0
   e:	49f9 0000 0000 	lea 0 <UnpackRow>,%a4
  14:	47f9 0000 0000 	lea 0 <UnpackRow>,%a3
  1a:	45f9 0000 0000 	lea 0 <UnpackRow>,%a2
  20:	606c           	bras 8e <UnpackRow+0x8e>
  22:	5288           	addql #1,%a0
  24:	5200           	addqb #1,%d0
  26:	721f           	moveq #31,%d1
  28:	c282           	andl %d2,%d1
  2a:	1234 1800      	moveb %a4@(0000000000000000,%d1:l),%d1
  2e:	0281 0000 00ff 	andil #255,%d1
  34:	7600           	moveq #0,%d3
  36:	1602           	moveb %d2,%d3
  38:	0803 0005      	btst #5,%d3
  3c:	6714           	beqs 52 <UnpackRow+0x52>
  3e:	2801           	movel %d1,%d4
  40:	d881           	addl %d1,%d4
  42:	d884           	addl %d4,%d4
  44:	2244           	moveal %d4,%a1
  46:	d3c1           	addal %d1,%a1
  48:	1398 b800      	moveb %a0@+,%a1@(0000000000000000,%a3:l)
  4c:	1398 a800      	moveb %a0@+,%a1@(0000000000000000,%a2:l)
  50:	5200           	addqb #1,%d0
  52:	0803 0006      	btst #6,%d3
  56:	6714           	beqs 6c <UnpackRow+0x6c>
  58:	2601           	movel %d1,%d3
  5a:	d681           	addl %d1,%d3
  5c:	d683           	addl %d3,%d3
  5e:	2243           	moveal %d3,%a1
  60:	d3c1           	addal %d1,%a1
  62:	d3fc 0000 0000 	addal #0,%a1
  68:	1298           	moveb %a0@+,%a1@
  6a:	5200           	addqb #1,%d0
  6c:	4a02           	tstb %d2
  6e:	6c1e           	bges 8e <UnpackRow+0x8e>
  70:	2401           	movel %d1,%d2
  72:	d481           	addl %d1,%d2
  74:	d482           	addl %d2,%d2
  76:	2242           	moveal %d2,%a1
  78:	d3c1           	addal %d1,%a1
  7a:	2a49           	moveal %a1,%a5
  7c:	dbfc 0000 0000 	addal #0,%a5
  82:	1a98           	moveb %a0@+,%a5@
  84:	d3fc 0000 0000 	addal #0,%a1
  8a:	1298           	moveb %a0@+,%a1@
  8c:	5400           	addqb #2,%d0
  8e:	1410           	moveb %a0@,%d2
  90:	6690           	bnes 22 <UnpackRow+0x22>
  92:	4cdf 3c1c      	moveml %sp@+,%d2-%d4/%a2-%a5
  96:	4e5e           	unlk %fp
  98:	4e75           	rts
Comment 1 Jeffrey A. Law 2016-01-20 17:49:18 UTC
It appears the problem starts with forwprop turning the pointer accesses into array/structure memory accesses.  This is generally a good thing.

However, in this instance it makes it awfully hard to recover the CSE opportunities that are needed to get good compact code.

We have 3 instances of:

  30 003e D3C2                  add.l %d2,%a1
  31 0040 D3C9                  add.l %a1,%a1
  32 0042 D3C2                  add.l %d2,%a1
 
That's 12 wasted bytes.
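
For reference, a minimal C sketch of what that add chain computes. Assuming %a1 starts out holding a copy of the index in %d2, the three adds give 5 * index, i.e. the byte offset of RowBuffer[index], since Channeldata_t is five uint8_t members. The same pattern is visible in the 4.5.1 listing as movel/addl/addl/moveal/addal. The function names below are hypothetical and only illustrate the arithmetic; they are not GCC output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t note, instrument, volume, command, info;
} Channeldata_t;

/* The element size the add chain relies on (assumed: no padding). */
static_assert(sizeof(Channeldata_t) == 5, "Channeldata_t is expected to be 5 bytes");

/* Mirrors the add chain: start from index, then add, double, add. */
static uint32_t offset_by_adds(uint32_t index)
{
    uint32_t t = index;
    t += index;   /* 2 * index */
    t += t;       /* 4 * index */
    t += index;   /* 5 * index */
    return t;
}

/* What the source expresses: index scaled by the element size. */
static uint32_t offset_by_multiply(uint32_t index)
{
    return index * (uint32_t)sizeof(Channeldata_t);
}

/* Returns 1 if both computations agree for every possible uint8_t index. */
static int offsets_agree(void)
{
    for (uint32_t i = 0; i < 256; ++i) {
        if (offset_by_adds(i) != offset_by_multiply(i))
            return 0;
    }
    return 1;
}
```

Each copy of the chain is three 2-byte instructions, so two redundant copies account for the 12 bytes mentioned above.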

And we have two instances of:

  24 0032 5200                  addq.b #1,%d0
  25 0034 5288                  addq.l #1,%a0


Another two wasted bytes.

Also related: we end up selecting poor addressing modes, which probably costs another 10-16 bytes.

But at the core, AFAICT, is the recovery of array/structure accesses from what were pointer accesses. In theory PRE ought to come along and pull out the redundant address arithmetic, but it doesn't (not even with -O2).
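
To make the transformation concrete, here is a rough source-level sketch of the two forms. The pointer form is the loop from unpack.c; the array form is only an illustration of what the accesses look like once each store names RowBuffer[m].field directly, so every store carries its own index-times-5 address computation unless something commons it up. This is not actual forwprop output, and the function names and the forms_agree helper are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t note, instrument, volume, command, info;
} Channeldata_t;

static uint8_t ChannelMap[32];
static Channeldata_t RowBuffer[16];

/* Pointer form, as in unpack.c: the element address is computed once per row byte. */
static uint8_t unpack_pointer_form(const uint8_t *p)
{
    uint8_t count = 0, byte;
    while ((byte = *p++) != 0) {
        Channeldata_t *cd = &RowBuffer[ChannelMap[byte & 0x1f]];
        ++count;
        if (byte & 0x20) { cd->note = *p++; cd->instrument = *p++; ++count; }
        if (byte & 0x40) { cd->volume = *p++; ++count; }
        if (byte & 0x80) { cd->command = *p++; cd->info = *p++; count += 2; }
    }
    return count;
}

/* Array form (illustrative): every store re-states the indexed access. */
static uint8_t unpack_array_form(const uint8_t *p)
{
    uint8_t count = 0, byte;
    while ((byte = *p++) != 0) {
        uint8_t m = ChannelMap[byte & 0x1f];
        ++count;
        if (byte & 0x20) { RowBuffer[m].note = *p++; RowBuffer[m].instrument = *p++; ++count; }
        if (byte & 0x40) { RowBuffer[m].volume = *p++; ++count; }
        if (byte & 0x80) { RowBuffer[m].command = *p++; RowBuffer[m].info = *p++; count += 2; }
    }
    return count;
}

/* Returns 1 when both forms give the same count and the same RowBuffer contents. */
static int forms_agree(const uint8_t *row)
{
    Channeldata_t snapshot[16];
    uint8_t a, b;

    assert(row != NULL);
    memset(RowBuffer, 0, sizeof RowBuffer);
    a = unpack_pointer_form(row);
    memcpy(snapshot, RowBuffer, sizeof RowBuffer);
    memset(RowBuffer, 0, sizeof RowBuffer);
    b = unpack_array_form(row);
    return a == b && memcmp(snapshot, RowBuffer, sizeof RowBuffer) == 0;
}
```

The two functions are semantically identical; the size difference comes purely from whether the compiler manages to share the address computation across the three branches.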

It's not clear how prevalent this is across other architectures, so I'm keeping it at P4 for now.  If someone can show this causing problems on non-dead targets, then we might consider bumping this up to a P2 priority.
Comment 2 Richard Biener 2016-08-03 10:48:58 UTC
GCC 4.9 branch is being closed
Comment 3 Jakub Jelinek 2018-10-26 10:25:40 UTC
GCC 6 branch is being closed
Comment 4 Richard Biener 2019-11-14 07:53:51 UTC
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
Comment 5 Jakub Jelinek 2020-03-04 09:47:29 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 6 Jakub Jelinek 2021-05-14 09:46:15 UTC
GCC 8 branch is being closed.
Comment 7 Richard Biener 2021-06-01 08:05:01 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 8 Richard Biener 2022-05-27 09:34:14 UTC
GCC 9 branch is being closed
Comment 9 Jakub Jelinek 2022-06-28 10:29:51 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 10 Richard Biener 2023-07-07 10:29:20 UTC
GCC 10 branch is being closed.