Bug 46393 - [7/8/9/10 Regression] m68k code size regression
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.5.1
Importance: P4 normal
Target Milestone: 7.5
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-09 13:56 UTC by Anders Montonen
Modified: 2018-12-06 10:09 UTC
CC List: 1 user

See Also:
Host:
Target: m68k-elf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-01-17 00:00:00


Description Anders Montonen 2010-11-09 13:56:05 UTC
The following code snippet compiles to 112 bytes using GCC 4.4.0 and 154 bytes using GCC 4.5.1.

Host is OS X 10.6, target is m68k-elf bare-metal toolchain using newlib.
The code was compiled with -m68000 -Os -c.

unpack.c:
#include <stdint.h>

uint8_t ChannelMap[32];

typedef struct
{
    uint8_t note;
    uint8_t instrument;
    uint8_t volume;
    uint8_t command;
    uint8_t info;
} Channeldata_t;

Channeldata_t RowBuffer[16];

uint8_t UnpackRow(const uint8_t *pRowData)
{
    uint8_t count = 0, byte, channel, mappedchannel;
    Channeldata_t *pChannelData;
    
    while ((byte = *pRowData++) != 0)
    {
        ++count;
        channel = byte & 0x1f;
        mappedchannel = ChannelMap[channel];
        pChannelData = &RowBuffer[mappedchannel];
        
        if (byte & 0x20)
        {
            pChannelData->note = *pRowData++;
            pChannelData->instrument = *pRowData++;
            ++count;
        }
        if (byte & 0x40)
        {
            pChannelData->volume = *pRowData++;
            ++count;
        }
        if (byte & 0x80)
        {
            pChannelData->command = *pRowData++;
            pChannelData->info = *pRowData++;
            count += 2;
        }
    }
    
    return count;
}

GCC 4.4.0-produced code:

00000000 <UnpackRow>:
   0:	4e56 0000      	linkw %fp,#0
   4:	48e7 3820      	moveml %d2-%d4/%a2,%sp@-
   8:	206e 0008      	moveal %fp@(8),%a0
   c:	4200           	clrb %d0
   e:	45f9 0000 0000 	lea 0 <UnpackRow>,%a2
  14:	604e           	bras 64 <UnpackRow+0x64>
  16:	5288           	addql #1,%a0
  18:	5200           	addqb #1,%d0
  1a:	7400           	moveq #0,%d2
  1c:	1401           	moveb %d1,%d2
  1e:	761f           	moveq #31,%d3
  20:	c682           	andl %d2,%d3
  22:	1632 3800      	moveb %a2@(0000000000000000,%d3:l),%d3
  26:	0283 0000 00ff 	andil #255,%d3
  2c:	2803           	movel %d3,%d4
  2e:	d883           	addl %d3,%d4
  30:	d884           	addl %d4,%d4
  32:	2244           	moveal %d4,%a1
  34:	d3c3           	addal %d3,%a1
  36:	d3fc 0000 0000 	addal #0,%a1
  3c:	0802 0005      	btst #5,%d2
  40:	6708           	beqs 4a <UnpackRow+0x4a>
  42:	1298           	moveb %a0@+,%a1@
  44:	1358 0001      	moveb %a0@+,%a1@(1)
  48:	5200           	addqb #1,%d0
  4a:	0802 0006      	btst #6,%d2
  4e:	6706           	beqs 56 <UnpackRow+0x56>
  50:	1358 0002      	moveb %a0@+,%a1@(2)
  54:	5200           	addqb #1,%d0
  56:	4a01           	tstb %d1
  58:	6c0a           	bges 64 <UnpackRow+0x64>
  5a:	1358 0003      	moveb %a0@+,%a1@(3)
  5e:	1358 0004      	moveb %a0@+,%a1@(4)
  62:	5400           	addqb #2,%d0
  64:	1210           	moveb %a0@,%d1
  66:	66ae           	bnes 16 <UnpackRow+0x16>
  68:	4cdf 041c      	moveml %sp@+,%d2-%d4/%a2
  6c:	4e5e           	unlk %fp
  6e:	4e75           	rts

GCC 4.5.1-produced code:

00000000 <UnpackRow>:
   0:	4e56 0000      	linkw %fp,#0
   4:	48e7 383c      	moveml %d2-%d4/%a2-%a5,%sp@-
   8:	206e 0008      	moveal %fp@(8),%a0
   c:	4200           	clrb %d0
   e:	49f9 0000 0000 	lea 0 <UnpackRow>,%a4
  14:	47f9 0000 0000 	lea 0 <UnpackRow>,%a3
  1a:	45f9 0000 0000 	lea 0 <UnpackRow>,%a2
  20:	606c           	bras 8e <UnpackRow+0x8e>
  22:	5288           	addql #1,%a0
  24:	5200           	addqb #1,%d0
  26:	721f           	moveq #31,%d1
  28:	c282           	andl %d2,%d1
  2a:	1234 1800      	moveb %a4@(0000000000000000,%d1:l),%d1
  2e:	0281 0000 00ff 	andil #255,%d1
  34:	7600           	moveq #0,%d3
  36:	1602           	moveb %d2,%d3
  38:	0803 0005      	btst #5,%d3
  3c:	6714           	beqs 52 <UnpackRow+0x52>
  3e:	2801           	movel %d1,%d4
  40:	d881           	addl %d1,%d4
  42:	d884           	addl %d4,%d4
  44:	2244           	moveal %d4,%a1
  46:	d3c1           	addal %d1,%a1
  48:	1398 b800      	moveb %a0@+,%a1@(0000000000000000,%a3:l)
  4c:	1398 a800      	moveb %a0@+,%a1@(0000000000000000,%a2:l)
  50:	5200           	addqb #1,%d0
  52:	0803 0006      	btst #6,%d3
  56:	6714           	beqs 6c <UnpackRow+0x6c>
  58:	2601           	movel %d1,%d3
  5a:	d681           	addl %d1,%d3
  5c:	d683           	addl %d3,%d3
  5e:	2243           	moveal %d3,%a1
  60:	d3c1           	addal %d1,%a1
  62:	d3fc 0000 0000 	addal #0,%a1
  68:	1298           	moveb %a0@+,%a1@
  6a:	5200           	addqb #1,%d0
  6c:	4a02           	tstb %d2
  6e:	6c1e           	bges 8e <UnpackRow+0x8e>
  70:	2401           	movel %d1,%d2
  72:	d481           	addl %d1,%d2
  74:	d482           	addl %d2,%d2
  76:	2242           	moveal %d2,%a1
  78:	d3c1           	addal %d1,%a1
  7a:	2a49           	moveal %a1,%a5
  7c:	dbfc 0000 0000 	addal #0,%a5
  82:	1a98           	moveb %a0@+,%a5@
  84:	d3fc 0000 0000 	addal #0,%a1
  8a:	1298           	moveb %a0@+,%a1@
  8c:	5400           	addqb #2,%d0
  8e:	1410           	moveb %a0@,%d2
  90:	6690           	bnes 22 <UnpackRow+0x22>
  92:	4cdf 3c1c      	moveml %sp@+,%d2-%d4/%a2-%a5
  96:	4e5e           	unlk %fp
  98:	4e75           	rts
Comment 1 Jeffrey A. Law 2016-01-20 17:49:18 UTC
It appears the problem starts with forwprop turning the pointer accesses into array/structure memory accesses.  This is generally a good thing.

However, in this instance it makes it awfully hard to recover the CSE opportunities that are needed to get good, compact code.

We have 3 instances of:

  30 003e D3C2                  add.l %d2,%a1
  31 0040 D3C9                  add.l %a1,%a1
  32 0042 D3C2                  add.l %d2,%a1
 
That's 12 wasted bytes.
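For reference, the repeated add sequence appears to scale the mapped channel index by 5, the size of Channeldata_t (five uint8_t fields), before the base address of RowBuffer is added. A minimal C sketch of that arithmetic, assuming the register starts out holding a copy of the index; the helper name scale_by_five is hypothetical and not part of the test case:

```c
#include <stdint.h>

typedef struct
{
    uint8_t note;
    uint8_t instrument;
    uint8_t volume;
    uint8_t command;
    uint8_t info;
} Channeldata_t;

/* Hypothetical helper mirroring the add/add/add sequence.
   Assumption: the register is first loaded with the index. */
static uint32_t scale_by_five(uint32_t index)
{
    uint32_t r = index; /* assumed initial copy of the index */
    r += index;         /* r = 2 * index */
    r += r;             /* r = 4 * index */
    r += index;         /* r = 5 * index */
    return r;
}
```

Since the index does not change within one loop iteration, the result only needs to be computed once per iteration; every further copy of the sequence is redundant.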

And we have two instances of:

  24 0032 5200                  addq.b #1,%d0
  25 0034 5288                  addq.l #1,%a0


Another two wasted bytes.

Relatedly, we end up selecting poor addressing modes, which probably costs another 10-16 bytes.

But at the core, AFAICT, is the recovery of array/structure accesses from what were pointer accesses. In theory, PRE ought to come along and pull out the redundant address arithmetic, but it doesn't (not even with -O2).
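A rough source-level illustration of the two forms, as a sketch only: the real transformation happens on GCC's internal representation, and the function names below are hypothetical. Both functions store the same bytes; the difference is whether the element address is named once or re-derived in every access.

```c
#include <stdint.h>

typedef struct
{
    uint8_t note;
    uint8_t instrument;
    uint8_t volume;
    uint8_t command;
    uint8_t info;
} Channeldata_t;

static Channeldata_t RowBuffer[16];

/* Pointer form, as in the reported test case: the element address
   is computed once and reused for every store. */
static void store_pointer_form(uint8_t idx, const uint8_t *src)
{
    Channeldata_t *p = &RowBuffer[idx];
    p->note = src[0];
    p->instrument = src[1];
    p->volume = src[2];
    p->command = src[3];
    p->info = src[4];
}

/* Array form, approximating the accesses after forward propagation:
   each store names RowBuffer[idx].field, so each one implies its own
   idx * 5 + offset computation unless a later pass commons them. */
static void store_array_form(uint8_t idx, const uint8_t *src)
{
    RowBuffer[idx].note = src[0];
    RowBuffer[idx].instrument = src[1];
    RowBuffer[idx].volume = src[2];
    RowBuffer[idx].command = src[3];
    RowBuffer[idx].info = src[4];
}
```

In the test case the stores sit in separate conditional blocks, so no single block dominates the others; that is presumably why the shared address arithmetic would have to be hoisted by PRE rather than picked up by block-local CSE.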

It's not clear how prevalent this is across other architectures, so I'm keeping it at P4 for now.  If someone can show this causing problems on non-dead targets, then we might consider bumping this up to a P2 priority.
Comment 2 Richard Biener 2016-08-03 10:48:58 UTC
GCC 4.9 branch is being closed
Comment 3 Jakub Jelinek 2018-10-26 10:25:40 UTC
GCC 6 branch is being closed