This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Stack Alignment


>Unfortunately sometimes the data wasn't 8-byte aligned and performance
>suffered so I did the following:
>{
>    double array1[SIZE];
>    double *array;
>    if((val = ((unsigned long)array1 & 0x7)) != 0)
>        array = (double *)((unsigned long)array1 + 8-val);
>    else
>        array = array1;
>    compute(array);
>}
>
>Surprisingly this gave me uniformly bad performance, although it
>seems the data was properly aligned.  Also with egcs-2.91 using the
>-malign-double flag seemed to make the code slower.

That can't be the code you used, because `val' is undeclared.  Further,
while `compute' might indeed have received aligned pointers, it might
have been getting an array one element shorter than `SIZE', so who knows
what "performance" problems might have resulted from that bug.

So, in a case like this, where you want the pointer to land on either the
first or the second half of an element of the allocated array to get the
alignment you want, you need to allocate *one* more element than you'd
normally need (so the second-half case still leaves room for `SIZE'
elements).  I.e. use `double array1[SIZE+1]'.
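
As a rough sketch of that fix (not the original poster's code), the
following declares `val', over-allocates by one element, and checks that
`SIZE' elements still fit after the adjustment.  The names
`align8_with_test' and `sum_aligned' and the value of `SIZE' are made up
for illustration, and `uintptr_t' assumes a C99-style <stdint.h>:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE 1024   /* hypothetical element count */

/* Return the first 8-byte-aligned address at or after `buf'.
   Assumes `buf' has room for one element more than the caller needs.  */
static double *
align8_with_test (double *buf)
{
  uintptr_t addr = (uintptr_t) buf;
  uintptr_t val = addr & 0x7;

  if (val != 0)
    addr += 8 - val;
  return (double *) addr;
}

/* Hypothetical stand-in for the poster's block that calls compute().  */
static double
sum_aligned (void)
{
  double array1[SIZE + 1];          /* one spare element for the shift */
  double *array = align8_with_test (array1);
  double total = 0.0;
  size_t i;

  assert (((uintptr_t) array & 0x7) == 0);      /* really aligned */
  assert (array + SIZE <= array1 + SIZE + 1);   /* SIZE elements fit */
  for (i = 0; i < SIZE; i++)
    array[i] = 1.0;
  for (i = 0; i < SIZE; i++)
    total += array[i];
  return total;
}
```

With only `double array1[SIZE]', the second assertion is the one that
would fail whenever the pointer had to be moved.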

Also, it's usually faster to do the integer add and bitwise AND
unconditionally than to test first in the hope of skipping them.  Though
that probably wouldn't explain the performance problems you were seeing,
I've normally used code like:

  array = (double *) ((((offset_t) array1) + 0x4) & ~(offset_t) 0x7);

(Maybe `offset_t' isn't the right type to use...though portability is
less of a concern for code like this, make sure you don't cause the
code to break anywhere that a straight `array = array1;' would work.
Also, I prefer parenthesizing to expecting the reader to remember whether
a cast has higher precedence than `+', etc.  Don't trust my code,
just use it to get an idea of how to avoid the `if'.)
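
To make the arithmetic concrete, here is a small sketch of the branch-free
rounding on plain integers.  The helper names are invented for this
example, and `uintptr_t' (from a C99-style <stdint.h>) is just one
plausible choice of integer type.  Adding 0x4 is enough only when the
starting address is already 4-byte aligned; adding 0x7 rounds up from any
address:

```c
#include <assert.h>
#include <stdint.h>

/* Round up to the next multiple of 8, for any starting address.  */
static uintptr_t
round_up8 (uintptr_t addr)
{
  return (addr + 0x7) & ~(uintptr_t) 0x7;
}

/* The +0x4 variant: correct only if `addr' is a multiple of 4.  */
static uintptr_t
round_up8_from4 (uintptr_t addr)
{
  assert ((addr & 0x3) == 0);   /* precondition: 4-byte aligned */
  return (addr + 0x4) & ~(uintptr_t) 0x7;
}

/* Apply the general form to a pointer; no `if' needed.  */
static double *
align8 (double *p)
{
  return (double *) round_up8 ((uintptr_t) p);
}
```

For a 4-byte-aligned address the two helpers agree: 8 stays 8, and 12
becomes 16.  For an address such as 9, only the 0x7 form gives the
intended 16.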

Finally, though, it might be the case that 2.95 is giving you better
than 64-bit alignment and that's needed for the processor you're using.
It really should be the compiler/linker/loader/OS that ensure reasonable
default alignment for data in inner loops, rather than the user's code,
given the proliferation of chips with different performance characteristics.
Of course, reality is different from the ideal....

        tq vm, (burley)

