Bug 23539 - C & C++ compiler generating misaligned references regardless of compiler flags
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: target
Version: 4.0.1
Importance: P2 normal
Target Milestone: 3.4.5
Assigned To: David Edelsohn
Depends on:
Blocks:
Reported: 2005-08-23 21:21 UTC by Eric McVicker
Modified: 2005-08-31 15:07 UTC (History)

See Also:
Host: i686-pc-linux-gnu
Target: powerpc-eabi
Build: i686-pc-linux-gnu
Known to work: 3.4.5 4.0.2
Known to fail: 3.4.4 4.0.1
Last reconfirmed: 2005-08-26 23:55:57


Attachments
Maintain alignment (1.28 KB, patch)
2005-08-26 23:56 UTC, David Edelsohn

Description Eric McVicker 2005-08-23 21:21:48 UTC
This issue was uncovered while porting our existing software to the GNU
tool-chain.  We have a number of structures that contain 3 individual bytes of
data.  When the GNU tool-chain compiles the source code, it creates a
load/store byte instruction followed by a load/store half-word instruction with
an odd (1, 3, 5, 7, 9, 11, etc.) memory offset.  This causes a data alignment
exception to occur.

We have tried all combinations of the compiler flags for structure packing, 
alignment (natural, power), and anything else that we have been able to uncover 
in the GCC documentation.

This behavior exists at optimization levels 0, 1, and 2.  We have not yet tried
any other levels.

There should be a way to tell the compiler not to generate mis-aligned address
references.  Mis-aligned accesses can be handled in software, if the authors of
the OS or application handle the resulting exception.  The authors of this
component did not, however, and implementing that handling would cause far too
much of a runtime hit.  The code sample below re-creates the problem.

----  File test.cc -----

struct foo {
   char bar1;
   char bar2;
   char bar3;
};

int foobarStruct(foo fubarStruct) {
   if(fubarStruct.bar1 == 'A' &&
      fubarStruct.bar2 == 'B' &&
      fubarStruct.bar3 == 'C')
   {
      return 1;
   }
   else {
      return 0;
   }
}

int main(int argc, char **argv) {

   int rVal1;
   int rVal2;

   foo barStruct;

   barStruct.bar1 = 'A';
   barStruct.bar2 = 'B';
   barStruct.bar3 = 'C';

   rVal1 = foobarStruct(barStruct);

   barStruct.bar1 = 'A';
   barStruct.bar2 = 'C';
   barStruct.bar3 = 'B';

   rVal1 = foobarStruct(barStruct);

   return (rVal1 || rVal2);
}

------------- End of file ---------------------

Again, using any combination of the compiler flags -malign-natural,
-malign-power, -fpack-struct=2, -fno-pack-struct, etc. has not given us the
desired behavior.

Here's the assembly output from the command(s):


(1)> g++ test.cc -o test
(2)> ppcobjdump -C -S test.o

test.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <foobarStruct(foo)>:
   0:   94 21 ff e8     stwu    r1,-24(r1)
   4:   93 e1 00 14     stw     r31,20(r1)
   8:   7c 3f 0b 78     mr      r31,r1
   c:   90 7f 00 0c     stw     r3,12(r31)
  10:   81 3f 00 0c     lwz     r9,12(r31)
  14:   88 09 00 00     lbz     r0,0(r9)
  18:   54 00 06 3e     clrlwi  r0,r0,24
  1c:   2f 80 00 41     cmpwi   cr7,r0,65
  20:   40 9e 00 38     bne-    cr7,58 <foobarStruct(foo)+0x58>
  24:   81 3f 00 0c     lwz     r9,12(r31)
  28:   88 09 00 01     lbz     r0,1(r9)
  2c:   54 00 06 3e     clrlwi  r0,r0,24
  30:   2f 80 00 42     cmpwi   cr7,r0,66
  34:   40 9e 00 24     bne-    cr7,58 <foobarStruct(foo)+0x58>
  38:   81 3f 00 0c     lwz     r9,12(r31)
  3c:   88 09 00 02     lbz     r0,2(r9)
  40:   54 00 06 3e     clrlwi  r0,r0,24
  44:   2f 80 00 43     cmpwi   cr7,r0,67
  48:   40 9e 00 10     bne-    cr7,58 <foobarStruct(foo)+0x58>
  4c:   38 00 00 01     li      r0,1
  50:   90 1f 00 08     stw     r0,8(r31)
  54:   48 00 00 0c     b       60 <foobarStruct(foo)+0x60>
  58:   39 20 00 00     li      r9,0
  5c:   91 3f 00 08     stw     r9,8(r31)
  60:   80 1f 00 08     lwz     r0,8(r31)
  64:   7c 03 03 78     mr      r3,r0
  68:   81 61 00 00     lwz     r11,0(r1)
  6c:   83 eb ff fc     lwz     r31,-4(r11)
  70:   7d 61 5b 78     mr      r1,r11
  74:   4e 80 00 20     blr

00000078 <main>:
  78:   94 21 ff a8     stwu    r1,-88(r1)
  7c:   7c 08 02 a6     mflr    r0
  80:   93 e1 00 54     stw     r31,84(r1)
  84:   90 01 00 5c     stw     r0,92(r1)
  88:   7c 3f 0b 78     mr      r31,r1
  8c:   90 7f 00 28     stw     r3,40(r31)
  90:   90 9f 00 2c     stw     r4,44(r31)
  94:   48 00 00 01     bl      94 <main+0x1c>
  98:   38 00 00 41     li      r0,65
  9c:   98 1f 00 16     stb     r0,22(r31)
  a0:   38 00 00 42     li      r0,66
  a4:   98 1f 00 17     stb     r0,23(r31)
  a8:   38 00 00 43     li      r0,67
  ac:   98 1f 00 18     stb     r0,24(r31)
  b0:   88 1f 00 16     lbz     r0,22(r31)
  b4:   a1 3f 00 17     lhz     r9,23(r31)    <-- Notice the odd offset
  b8:   98 1f 00 13     stb     r0,19(r31)
  bc:   b1 3f 00 14     sth     r9,20(r31)
  c0:   88 1f 00 13     lbz     r0,19(r31)
  c4:   a1 3f 00 14     lhz     r9,20(r31)
  c8:   98 1f 00 30     stb     r0,48(r31)
  cc:   b1 3f 00 31     sth     r9,49(r31)    <-- Notice the odd offset
  d0:   38 1f 00 30     addi    r0,r31,48
  d4:   7c 03 03 78     mr      r3,r0
  d8:   48 00 00 01     bl      d8 <main+0x60>
  dc:   7c 60 1b 78     mr      r0,r3
  e0:   90 1f 00 0c     stw     r0,12(r31)
  e4:   38 00 00 41     li      r0,65
  e8:   98 1f 00 16     stb     r0,22(r31)
  ec:   38 00 00 43     li      r0,67
  f0:   98 1f 00 17     stb     r0,23(r31)
  f4:   38 00 00 42     li      r0,66
  f8:   98 1f 00 18     stb     r0,24(r31)
  fc:   88 1f 00 16     lbz     r0,22(r31)
 100:   a1 3f 00 17     lhz     r9,23(r31)    <-- Again odd offsets
 104:   98 1f 00 10     stb     r0,16(r31)
 108:   b1 3f 00 11     sth     r9,17(r31)    <-- Again odd offsets
 10c:   88 1f 00 10     lbz     r0,16(r31)
 110:   a1 3f 00 11     lhz     r9,17(r31)
 114:   98 1f 00 30     stb     r0,48(r31)
 118:   b1 3f 00 31     sth     r9,49(r31)
 11c:   38 1f 00 30     addi    r0,r31,48
 120:   7c 03 03 78     mr      r3,r0
 124:   48 00 00 01     bl      124 <main+0xac>
 128:   7c 60 1b 78     mr      r0,r3
 12c:   90 1f 00 0c     stw     r0,12(r31)
 130:   80 1f 00 0c     lwz     r0,12(r31)
 134:   2f 80 00 00     cmpwi   cr7,r0,0
 138:   40 9e 00 10     bne-    cr7,148 <main+0xd0>
 13c:   80 1f 00 08     lwz     r0,8(r31)
 140:   2f 80 00 00     cmpwi   cr7,r0,0
 144:   41 9e 00 10     beq-    cr7,154 <main+0xdc>
 148:   38 00 00 01     li      r0,1
 14c:   90 1f 00 40     stw     r0,64(r31)
 150:   48 00 00 0c     b       15c <main+0xe4>
 154:   38 00 00 00     li      r0,0
 158:   90 1f 00 40     stw     r0,64(r31)
 15c:   80 1f 00 40     lwz     r0,64(r31)
 160:   7c 03 03 78     mr      r3,r0
 164:   81 61 00 00     lwz     r11,0(r1)
 168:   80 0b 00 04     lwz     r0,4(r11)
 16c:   7c 08 03 a6     mtlr    r0
 170:   83 eb ff fc     lwz     r31,-4(r11)
 174:   7d 61 5b 78     mr      r1,r11
 178:   4e 80 00 20     blr

From Section 3.3.1 Alignment and Misaligned Accesses

The operand of a single-register memory access instruction has a natural 
alignment boundary equal to the operand length.  The "natural" address of an 
operand is an integral multiple of the operand length, ......

I can understand what the compiler is trying to achieve here in the sense of 
doing two loads/stores versus three, however it is performing misaligned 
loads/stores as a result.  This optimization actually becomes a performance hit 
if the underlying system is forced to perform the exception handling and piece 
the parts together.
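
As a rough illustration of the rule quoted above, the check below treats an access as naturally aligned when its offset is a multiple of the operand length. The helper name `is_naturally_aligned` is hypothetical, and the displacements are taken from the disassembly of main() above. They are relative to r31, so the sketch assumes (the report does not state it) that r31 itself holds an address aligned to at least 2 bytes.

```cpp
#include <cstddef>

// True when `offset` is an integral multiple of the operand length `size`
// (both in bytes), i.e. the "natural alignment" rule quoted above.
// Hypothetical helper, written only for this illustration.
constexpr bool is_naturally_aligned(std::size_t offset, std::size_t size) {
    return size != 0 && offset % size == 0;
}

// Displacements copied from the disassembly of main(). Assumption: the base
// register r31 is itself aligned to at least 2 bytes.
static_assert(!is_naturally_aligned(23, 2), "lhz r9,23(r31): odd offset, misaligned");
static_assert( is_naturally_aligned(20, 2), "sth r9,20(r31): even offset, aligned");
static_assert(!is_naturally_aligned(49, 2), "sth r9,49(r31): odd offset, misaligned");
static_assert( is_naturally_aligned(22, 1), "lbz r0,22(r31): byte accesses always pass");
```

Under that assumption, every half-word access flagged in the listing fails the check, while the byte accesses around it pass.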

Perhaps it's not a "real" bug, however not being able to override this behavior 
probably is.

This behavior has been observed in 4.0.1, 4.0.0, 3.4.4 and 3.3.1
Comment 1 Eric McVicker 2005-08-23 21:44:53 UTC
I should also mention that the target processor for this is the 603, not 603e 
or otherwise.  Even compiling with the -mtune=603 and -mcpu=603 gives the same 
output.  And those old processors do not handle mis-aligned short references in 
hardware.
Comment 2 Eric McVicker 2005-08-23 22:24:14 UTC
My earlier description of this as a data alignment exception was incorrect.
The software developer has updated the status of this issue and states that it
causes a Machine Check exception to occur on our current hardware.

The observation that padding the structure with an extra byte clears up the
issue still holds.  We will have to discuss with our hardware engineers why
this occurs; however, the issue is the same, since this hardware cannot be
modified.  The software is for space-based satellites
which have already been launched.
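
For reference, a minimal sketch of the padding workaround mentioned above. The type name `foo_padded` and the member name `pad` are hypothetical, chosen only for illustration, and the sizes asserted here are what the usual ABIs give rather than something the language guarantees. The report does not establish why the extra byte avoids the odd half-word accesses.

```cpp
#include <cstddef>

// The three-byte structure from the test case.
struct foo {
    char bar1;
    char bar2;
    char bar3;
};

// Hypothetical padded variant: one explicit filler byte brings the size to 4.
struct foo_padded {
    char bar1;
    char bar2;
    char bar3;
    char pad;   // filler only, never read
};

// These hold on the usual ABIs; they are not guaranteed by the language.
static_assert(sizeof(foo) == 3, "three chars, no implicit padding");
static_assert(sizeof(foo_padded) == 4, "explicit pad byte rounds the size up to 4");
static_assert(alignof(foo_padded) == 1, "padding changes the size, not the alignment");
```

Note that the padded structure still has an alignment requirement of 1, so this is a workaround observed in practice rather than a guarantee about the generated code.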
Comment 3 Andrew Pinski 2005-08-23 22:30:01 UTC
GCC assumes you have unaligned access.

Use -mstrict-align if you want to assume unaligned access does not work.
Comment 4 Steven Bosscher 2005-08-23 23:35:07 UTC
Re. comment #3, you can find a whole load of options to control various things 
about gcc's powerpc backend in the manual.  For example in the manual for GCC 
4.0.1, you can give this page a look: 
http://gcc.gnu.org/onlinedocs/gcc-4.0.1/gcc/RS_002f6000-and-PowerPC-Options.html#RS_002f6000-and-PowerPC-Options 
 
-mstrict-align is there too: 
 
-mno-strict-align 
-mstrict-align 
On System V.4 and embedded PowerPC systems do not (do) assume that unaligned 
memory references will be handled by the system.  
 
Comment 5 Eric McVicker 2005-08-23 23:47:31 UTC
Our hardware engineers came back to us with an explanation of why this _may_ be
an issue.  The hardware has a memory bus arbiter ASIC that does not handle
mis-aligned references for 2-byte accesses ending on 1 or 5.  So this construct
will fail indeterminately.

I passed on the information given to us about -mstrict-align, and it appears to
have cleared up the issue with odd alignment offsets.

Thanks Much!
Comment 6 Eric McVicker 2005-08-24 00:20:36 UTC
Unfortunately, this still appears to be some sort of bug.  The -mstrict-align
solution worked for the test case, but it still fails in our specific case.
Included below is the link command and the resulting objdump output.  The
compiler invocations for all of the files that make up the libraries and
object files include -mstrict-align.

See the text below...


(1)> ppcg++ -gstabs+ -O1 -fno-guess-branch-probability -mcpu=603e -mtune=603e \
-mstrict-align -nostdlib -DGH_COMPAT -DAVOID_EMPTY_CLASSES -DSVRT_TRACE \
-fno-use-cxa-atexit -fcheck-new -fno-rtti -fno-exceptions -Wabi -Wextra \
/export/GNU_port/MM.1/lib/sac/begin.o /export/GNU_port/MM.1/lib/sac/sysinit.o \
/export/GNU_port/MM.1/lib/sac/prepccfg.o /export/GNU_port/MM.1/lib/sac/psoscfg.o \
/export/GNU_port/MM.1/lib/sac/drv_conf.o  --no-undefined -Bstatic \
--warn-section-align --demangle --warn-once --warn-common --sort-common \
-T sac.cmd -L /export/GNU_port/MM.1/lib -L /export/GNU_port/MM.1/lbcs/lib \
-o /export/GNU_port/MM.1/bins/sac.x

-------------------------------------------------------------------

(2)> ppcobjdump -C -S sac.x | grep sth | less

  494108:       b3 43 00 18     sth     r26,24(r3)
  49410c:       b3 23 00 1a     sth     r25,26(r3)
  494110:       b3 03 00 1c     sth     r24,28(r3)
  494c0c:       b0 01 00 21     sth     r0,33(r1)       <-- Still ODD alignment
  4954a0:       b1 3a 30 ec     sth     r9,12524(r26)
  4956ac:       b1 23 00 04     sth     r9,4(r3)
  4957d4:       b0 03 00 a4     sth     r0,164(r3)
  4959a8:       b1 23 00 00     sth     r9,0(r3)
  495bac:       b0 03 00 a4     sth     r0,164(r3)
  495ed0:       b1 2a 30 e0     sth     r9,12512(r10)

(3)>
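
Scanning a large dump by eye is error-prone, so a small filter along the following lines could flag the suspicious lines automatically. This is only a sketch: the function name `has_odd_halfword_offset` is hypothetical, it recognizes only the D-form half-word mnemonics listed in the pattern, and an odd displacement indicates a misaligned access only under the assumption that the base register itself is at least 2-byte aligned.

```cpp
#include <regex>
#include <string>

// Returns true when `line` (one line of objdump disassembly) contains a
// half-word load or store whose displacement is odd.
// Assumption: the base register is at least 2-byte aligned, so an odd
// displacement implies a misaligned access.
bool has_odd_halfword_offset(const std::string& line) {
    // mnemonic, data register, signed decimal displacement, base register
    static const std::regex pattern(
        R"(\b(lhzu|lhau|sthu|lhz|lha|sth)\s+r\d+,(-?\d+)\(r\d+\))");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) {
        return false;  // not a half-word access in the expected form
    }
    return std::stol(m[2].str()) % 2 != 0;
}
```

Applied to the lines above, this would flag only `sth r0,33(r1)`; the even displacements such as `sth r26,24(r3)` are not reported.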
Comment 7 Andrew Pinski 2005-08-24 00:27:44 UTC
Of course not having the source does not help.
Comment 8 Eric McVicker 2005-08-24 02:57:06 UTC
I understand this frustration.  The source code is proprietary material so I 
cannot post it.  However we are working on developing a sample case to 
demonstrate what is happening.
Comment 9 Eric McVicker 2005-08-24 15:44:30 UTC
Here is a short program that duplicates the problem.

------------------ test.cc ------------------------
struct foo {
   char bar1;
   char bar2;
   char bar3;
};


class bar2 {
private:
   static foo   myFubarStruct;

public:

   static void fbs(foo fubarStruct);
};

foo bar2::myFubarStruct;

class bar1 {
private:
   int rVal1;
   int rVal2;

public:
   void doFoo(void);
};

void bar1::doFoo(void) {

   foo barStruct;

   barStruct.bar1 = 'A';
   barStruct.bar2 = 'B';
   barStruct.bar3 = 'C';

   bar2::fbs(barStruct);
}

void bar2::fbs(foo fubarStruct) {
   myFubarStruct = fubarStruct;
}


int main(int argc, char **argv) {

   bar1   baar;

   baar.doFoo();

   return (0);
}

------------------ end of test.cc ------------------------------

The test app was just compiled to object code as shown below with the
-mstrict-align option.

(2)> ppcg++ -mstrict-align -c test.cc

Then an objdump was done to dump the assembly (shown below)

(3)> ppcobjdump -C -S test.o

test.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <bar2::fbs(foo)>:
   0:   94 21 ff f0     stwu    r1,-16(r1)
   4:   93 e1 00 0c     stw     r31,12(r1)
   8:   7c 3f 0b 78     mr      r31,r1
   c:   7c 6b 1b 78     mr      r11,r3
  10:   3d 20 00 00     lis     r9,0
  14:   39 29 00 00     addi    r9,r9,0
  18:   88 0b 00 00     lbz     r0,0(r11)
  1c:   89 4b 00 01     lbz     r10,1(r11)
  20:   89 6b 00 02     lbz     r11,2(r11)
  24:   98 09 00 00     stb     r0,0(r9)
  28:   99 49 00 01     stb     r10,1(r9)
  2c:   99 69 00 02     stb     r11,2(r9)
  30:   81 61 00 00     lwz     r11,0(r1)
  34:   83 eb ff fc     lwz     r31,-4(r11)
  38:   7d 61 5b 78     mr      r1,r11
  3c:   4e 80 00 20     blr

00000040 <bar1::doFoo()>:
  40:   94 21 ff c8     stwu    r1,-56(r1)
  44:   7c 08 02 a6     mflr    r0
  48:   93 e1 00 34     stw     r31,52(r1)
  4c:   90 01 00 3c     stw     r0,60(r1)
  50:   7c 3f 0b 78     mr      r31,r1
  54:   90 7f 00 18     stw     r3,24(r31)
  58:   38 00 00 41     li      r0,65
  5c:   98 1f 00 0b     stb     r0,11(r31)
  60:   38 00 00 42     li      r0,66
  64:   98 1f 00 0c     stb     r0,12(r31)
  68:   38 00 00 43     li      r0,67
  6c:   98 1f 00 0d     stb     r0,13(r31)
  70:   88 1f 00 0b     lbz     r0,11(r31)
  74:   89 3f 00 0c     lbz     r9,12(r31)
  78:   89 7f 00 0d     lbz     r11,13(r31)
  7c:   98 1f 00 08     stb     r0,8(r31)
  80:   99 3f 00 09     stb     r9,9(r31)
  84:   99 7f 00 0a     stb     r11,10(r31)
  88:   88 1f 00 08     lbz     r0,8(r31)
  8c:   a1 3f 00 09     lhz     r9,9(r31)       <--- Odd alignment
  90:   98 1f 00 20     stb     r0,32(r31)
  94:   b1 3f 00 21     sth     r9,33(r31)      <--- Odd alignment
  98:   38 1f 00 20     addi    r0,r31,32
  9c:   7c 03 03 78     mr      r3,r0
  a0:   48 00 00 01     bl      a0 <bar1::doFoo()+0x60>
  a4:   81 61 00 00     lwz     r11,0(r1)
  a8:   80 0b 00 04     lwz     r0,4(r11)
  ac:   7c 08 03 a6     mtlr    r0
  b0:   83 eb ff fc     lwz     r31,-4(r11)
  b4:   7d 61 5b 78     mr      r1,r11
  b8:   4e 80 00 20     blr

000000bc <main>:
  bc:   94 21 ff d8     stwu    r1,-40(r1)
  c0:   7c 08 02 a6     mflr    r0
  c4:   93 e1 00 24     stw     r31,36(r1)
  c8:   90 01 00 2c     stw     r0,44(r1)
  cc:   7c 3f 0b 78     mr      r31,r1
  d0:   90 7f 00 18     stw     r3,24(r31)
  d4:   90 9f 00 1c     stw     r4,28(r31)
  d8:   48 00 00 01     bl      d8 <main+0x1c>
  dc:   38 7f 00 08     addi    r3,r31,8
  e0:   48 00 00 01     bl      e0 <main+0x24>
  e4:   38 00 00 00     li      r0,0
  e8:   7c 03 03 78     mr      r3,r0
  ec:   81 61 00 00     lwz     r11,0(r1)
  f0:   80 0b 00 04     lwz     r0,4(r11)
  f4:   7c 08 03 a6     mtlr    r0
  f8:   83 eb ff fc     lwz     r31,-4(r11)
  fc:   7d 61 5b 78     mr      r1,r11
 100:   4e 80 00 20     blr
(4)>
Comment 10 David Edelsohn 2005-08-26 23:55:03 UTC
Confirmed.
Comment 11 David Edelsohn 2005-08-26 23:55:57 UTC
rs6000.c:expand_block_move() is losing the alignment because of a typo/thinko in
the decision tree.
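
To illustrate how a decision tree of this kind can lose the alignment, here is a simplified model. It is not the code in rs6000.c: the function `plan_block_move`, its parameters, and the branch structure are assumptions made for this sketch. Only the difference between testing `bytes == 2` and `bytes >= 2` for the half-word case is taken from the ChangeLog entry recorded with the fix.

```cpp
#include <vector>

struct Move {
    int offset;  // byte offset of the access within the block
    int size;    // bytes moved by the access: 4, 2 or 1
};

// Simplified, hypothetical model of a block-move decision tree.
//   bytes      - total bytes to copy
//   align_bits - known alignment of the start of the block, in bits
//   strict     - true models -mstrict-align
//   fixed      - false tests `bytes == 2`, true tests `bytes >= 2`
std::vector<Move> plan_block_move(int bytes, int align_bits, bool strict, bool fixed) {
    std::vector<Move> plan;
    int offset = 0;
    while (bytes > 0) {
        int size;
        if (bytes >= 4 && (align_bits >= 32 || !strict)) {
            size = 4;  // word access
        } else if ((fixed ? bytes >= 2 : bytes == 2) &&
                   (align_bits >= 16 || !strict)) {
            size = 2;  // half-word access
        } else {
            size = 1;  // byte access
        }
        plan.push_back({offset, size});
        offset += size;
        bytes -= size;
    }
    return plan;
}
```

In this model, a 3-byte block with 16-bit alignment and strict alignment enabled is copied as a byte at offset 0 followed by a half-word at offset 1 when the test is `bytes == 2`, which matches the lbz/lhz pattern in the dumps above. With `bytes >= 2` the half-word is taken first at offset 0 and the trailing byte at offset 2, so the half-word access stays aligned.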
Comment 12 David Edelsohn 2005-08-26 23:56:49 UTC
Created attachment 9595 [details]
Maintain alignment
Comment 13 CVS Commits 2005-08-27 15:44:33 UTC
Subject: Bug 23539

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	dje@gcc.gnu.org	2005-08-27 15:44:28

Modified files:
	gcc            : ChangeLog 
	gcc/config/rs6000: rs6000.c 

Log message:
	PR target/23539
	* config/rs6000/rs6000.c (expand_block_clear): Use HImode when
	bytes >= 2 not bytes == 2.
	(expand_block_move): Same.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.9838&r2=2.9839
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/rs6000/rs6000.c.diff?cvsroot=gcc&r1=1.862&r2=1.863

Comment 14 CVS Commits 2005-08-27 15:46:53 UTC
Subject: Bug 23539

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-4_0-branch
Changes by:	dje@gcc.gnu.org	2005-08-27 15:46:45

Modified files:
	gcc            : ChangeLog 
	gcc/config/rs6000: rs6000.c 

Log message:
	PR target/23539
	* config/rs6000/rs6000.c (expand_block_clear): Use HImode when
	bytes >= 2 not bytes == 2.
	(expand_block_move): Same.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=2.7592.2.396&r2=2.7592.2.397
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/rs6000/rs6000.c.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.788.2.9&r2=1.788.2.10

Comment 15 CVS Commits 2005-08-31 14:29:09 UTC
Subject: Bug 23539

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	dje@gcc.gnu.org	2005-08-31 14:28:46

Modified files:
	gcc            : ChangeLog 
	gcc/config/rs6000: rs6000.c 

Log message:
	PR target/23539
	Backport from mainline:
	
	2005-08-27  David Edelsohn  <edelsohn@gnu.org>
	* config/rs6000/rs6000.c (expand_block_move): Use HImode when
	bytes >= 2 not bytes == 2.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.909&r2=2.2326.2.910
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/rs6000/rs6000.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.576.2.45&r2=1.576.2.46

Comment 16 David Edelsohn 2005-08-31 15:07:58 UTC
Patch applied to mainline.  Backported to 3.4 and 4.0 branches.