Hi,

After looking at the RTL dumps here and there, I've noticed that GCC generates unaligned loads in the following manner:

foo = ([reg:X & -8] << (64 - (((reg:FP+reg:Y) & 0x7) << 3))) >> 56

Obviously, we're losing every eighth byte here (consider FP-0, FP-8). Since FP is aligned, GCC optimizes this to foo = 0.

The right way to do this is:

foo = ([reg:X & -8] << (56 - (((reg:FP+reg:Y-1) & 0x7) << 3))) >> 56

To reproduce, try this out:

unsigned long foo = 0x010203040a0b0c0d;
printf("%02x", *((char *)&foo + 7));

With -O and higher, the result is zero. The same happens at every &X+15, &X+23, etc., depending on the offset from the frame pointer.

I've attached a diff against trunk.

Martynas.
Created attachment 29845
gcc-unalign-alpha.diff
> Attached the diff against trunk.

Please post patches to the gcc-patches mailing list, as described in [1].

[1] http://gcc.gnu.org/contribute.html
(In reply to comment #0)

> To reproduce, try this out:
>
> unsigned long foo = 0x010203040a0b0c0d;
> printf("%02x", *((char *)&foo + 7));
>
> With -O (and onwards) it will turn out to be zero; at every &X+15,
> &X+23, etc. Depending on the offset from the frame pointer.

Please create a self-sufficient executable testcase, following the instructions at [1]. I was not able to confirm the problem from the lines you posted.

[1] http://gcc.gnu.org/bugs/
Hi,

(In reply to comment #3)

> Please create a self-sufficient executable testcase, following the instructions
> at [1]. I was not able to confirm the problem from the lines you posted.

Thanks for the feedback, Uros. Did you try it together with the frame growing downwards diff posted in #56898? If so, the locals are actually at negative offsets, and unaligned loads like foo%8-5 will expose this instead of foo%8-1.

I'm attaching ab-pre.tgz (before the diff) and ab-post.tgz (after the diff), which exercise both frame growing upwards and downwards; RTL dumps are included. Feel free to turn it into a testcase (I can't do that at the moment).

I'm 100% busy at work this week, so I would appreciate it if you took care of this (and #56898). Otherwise, I'll follow up according to the official contribution guidelines over the weekend.

Martynas.
Created attachment 29878
ab-pre.tgz

> gcc a.c; ./a.out
0d0a 0d0a0401 0d0a0401
> gcc -O a.c; ./a.out
0d0a 0d0a0400 0d0a0400
> cp a.c b.c
> gcc -S -dall a.c
> gcc -O -S -dall b.c
Created attachment 29879
ab-post.tgz

> gcc a.c; ./a.out
0d0a 0d0a0401 0d0a0401
> gcc -O a.c; ./a.out
0d0a 0d0a0401 0d0a0401
> cp a.c b.c
> gcc -S -dall a.c
> gcc -O -S -dall b.c
(In reply to comment #4)

> Hi,
>
> (In reply to comment #3)
> > Please create a self-sufficient executable testcase, following the instructions
> > at [1]. I was not able to confirm the problem from the lines you posted.
>
> Thanks for the feedback, Uros. Did you try it together with the frame
> growing downwards diff posted in #56898? If so, the locals are actually
> at the negative offsets and unaligned loads like foo%8-5 will expose this,
> instead of foo%8-1.

No, I am using an unpatched compiler. Compiling your ab-pre.tgz test, I got:

~/gcc-build-47/gcc/xgcc -B ~/gcc-build-47/gcc -O a.c
uros@monolith ~/test $ ./a.out
0d0a 0d0a0401 0d0a0401

~/gcc-build-47/gcc/xgcc -B ~/gcc-build-47/gcc -O -mcpu=ev4 a.c
uros@monolith ~/test $ ./a.out
0d0a 0d0a0401 0d0a0401

with:

GNU C (GCC) version 4.7.3 20130228 (prerelease) [gcc-4_7-branch revision 196343] (alphaev68-unknown-linux-gnu)

and the same result (with the same flags) with:

GNU C (GCC) version 4.9.0 20130407 (experimental) [trunk revision 197551] (alphaev68-unknown-linux-gnu)

The compilers imply -mcpu=ev67 when invoked without an -mcpu option.