This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

sparse overlapping structs for vectorization


I had a problem that got solved in an ugly way. I think gcc ought
to provide a few ways to make a nicer solution.

There was an array of structs roughly like so:

struct{int w;float x;char y[4];short z[2];}foo[512][4];

The types within the struct are 4 bytes each; I don't actually
remember anything else and it doesn't matter except that they
are distinct. I think it was bitfields actually, neatly grouped
into groups of 32 bits. In other words, like 4 4-byte values
but with more-or-less incompatible types.

Note that 4 of the structs neatly fill a 64-byte cache line.
An alignment attribute was used to ensure 64-byte alignment.

The most common operation needed on this array is to compare
the first struct member of 4 of the structs against a given
value, looking to see if there is a match. SSE would be good.
This would then be followed by using the matching entry if
there is one, else picking one of the 4 to recycle and thus use.

First bad solution:

One could load up 4 SSE registers, shuffle things around... NO.

Second bad solution:

One could simply have 4 distinct arrays. This is bad because
there are different cache lines for w, x, y, and z.

Third bad solution:

The array can be viewed as "int foo[512][4][4]" instead, with
the struct forming the third array index. Note that the last two
array indexes are both 4, so you can kind of swap them around.
This groups 4 fields of each type together, allowing SSE. The
problem here is loss of type safety; one must use array indexes
instead of struct field names. Like so: foo[idx][WHERE_W_IS][i]

Fourth bad solution:

We lay things out as in the third solution, but we cast pointers
to effectively lay sparse structs over each other like shingles.
{
int w;
int pad_wx[3];
float x;
int pad_xy[3];
char y[4];
int pad_yz[3];
short z[2];
}
Performance is hurt by the need for __may_alias__ and of course
the result is painful to look at. We went with this anyway, using
SSE intrinsics, and performance was great. Maintainability... not
so much.

BTW, an array of 512 structs containing 4-entry arrays was not used
because we wanted to have a simple normal pointer to indicate the
item being operated on. We didn't want to need a pointer,index pair.

Can something be done to help out here? The first thing that pops
into mind is the ability to tell gcc that the struct-to-struct
byte offset for array indexing is a user-specified value instead
of simply the struct size.

It's possible we could have safely ignored the warning about aliasing.
I don't know. Perhaps that would give even better performance, but
the casting would still be very ugly.

Solutions that that be defined away for non-gcc compilers are better.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]