This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: sparse overlapping structs for vectorization


On Wed, Feb 12, 2014 at 7:21 AM, Albert Cahalan <acahalan@gmail.com> wrote:
> I had a problem that got solved in an ugly way. I think gcc ought
> to provide a few ways to make a nicer solution.
>
> There was an array of structs roughly like so:
>
> struct{int w;float x;char y[4];short z[2];}foo[512][4];
>
> The types within the struct are 4 bytes each; I don't actually
> remember anything else and it doesn't matter except that they
> are distinct. I think it was bitfields actually, neatly grouped
> into groups of 32 bits. In other words, like 4 4-byte values
> but with more-or-less incompatible types.
>
> Note that 4 of the structs neatly fill a 64-byte cache line.
> An alignment attribute was used to ensure 64-byte alignment.
>
> The most common operation needed on this array is to compare
> the first struct member of 4 of the structs against a given
> value, looking to see if there is a match. SSE would be good.
> This would then be followed by using the matching entry if
> there is one, else picking one of the 4 to recycle and thus use.
>
> First bad solution:
>
> One could load up 4 SSE registers, shuffle things around... NO.
>
> Second bad solution:
>
> One could simply have 4 distinct arrays. This is bad because
> there are different cache lines for w, x, y, and z.
>
> Third bad solution:
>
> The array can be viewed as "int foo[512][4][4]" instead, with
> the struct forming the third array index. Note that the last two
> array indexes are both 4, so you can kind of swap them around.
> This groups 4 fields of each type together, allowing SSE. The
> problem here is loss of type safety; one must use array indexes
> instead of struct field names. Like so: foo[idx][WHERE_W_IS][i]
>
> Fourth bad solution:
>
> We lay things out as in the third solution, but we cast pointers
> to effectively lay sparse structs over each other like shingles.
> {
> int w;
> int pad_wx[3];
> float x;
> int pad_xy[3];
> char y[4];
> int pad_yz[3];
> short z[2];
> }
> Performance is hurt by the need for __may_alias__ and of course
> the result is painful to look at. We went with this anyway, using
> SSE intrinsics, and performance was great. Maintainability... not
> so much.
>
> BTW, an array of 512 structs containing 4-entry arrays was not used
> because we wanted to have a simple normal pointer to indicate the
> item being operated on. We didn't want to need a pointer,index pair.
>
> Can something be done to help out here? The first thing that pops
> into mind is the ability to tell gcc that the struct-to-struct
> byte offset for array indexing is a user-specified value instead
> of simply the struct size.
>
> It's possible we could have safely ignored the warning about aliasing.
> I don't know. Perhaps that would give even better performance, but
> the casting would still be very ugly.
>
> Solutions that that be defined away for non-gcc compilers are better.

Do the overlay but use an overlay of type char[<large enough>] and
load from that.  Should be more maintainable than using may_alias
and also work with other compilers.

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]