[PATCH][store merging][RFA] Re-implement merging code
Kyrill Tkachov
kyrylo.tkachov@foss.arm.com
Mon Oct 10 11:06:00 GMT 2016
On 10/10/16 11:22, Richard Biener wrote:
> On Mon, 10 Oct 2016, Kyrill Tkachov wrote:
>
>> Hi Richard,
>>
>> As I mentioned, here is the patch applying to the main store merging patch to
>> re-implement encode_tree_to_bitpos
>> to operate on the bytes directly.
>>
>> This works fine on little-endian but breaks on big-endian, even for merging
>> bitfields within a single byte.
>> Consider the code snippet from gcc.dg/store_merging_6.c:
>>
>> struct bar {
>> int a : 3;
>> unsigned char b : 4;
>> unsigned char c : 1;
>> char d;
>> char e;
>> char f;
>> char g;
>> };
>>
>> void
>> foo1 (struct bar *p)
>> {
>> p->b = 3;
>> p->a = 2;
>> p->c = 1;
>> p->d = 4;
>> p->e = 5;
>> }
>>
>> The correct GIMPLE for these merged stores on big-endian is:
>> MEM[(voidD.49 *)p_2(D)] = 18180;
>> MEM[(charD.8 *)p_2(D) + 2B] = 5;
>>
>> whereas with this patch we emit:
>> MEM[(voidD.49 *)p_2(D)] = 39428;
>> MEM[(charD.8 *)p_2(D) + 2B] = 5;
>>
>> The dump for merging the individual stores without this patch (using the
>> correct but costly wide_int approach in the base patch) is:
>> After writing 3 of size 4 at position 3 the merged region contains:
>> 6 0 0 0 0 0
>> After writing 2 of size 3 at position 0 the merged region contains:
>> 46 0 0 0 0 0
>> After writing 1 of size 1 at position 7 the merged region contains:
>> 47 0 0 0 0 0
>> After writing 4 of size 8 at position 8 the merged region contains:
>> 47 4 0 0 0 0
>> After writing 5 of size 8 at position 16 the merged region contains:
>> 47 4 5 0 0 0
>>
>>
>> And with this patch it is:
>> After writing 3 of size 4 at position 3 the merged region contains:
>> 18 0 0 0 0 0
>> After writing 2 of size 3 at position 0 the merged region contains:
>> 1a 0 0 0 0 0
>> After writing 1 of size 1 at position 7 the merged region contains:
>> 9a 0 0 0 0 0
>> After writing 4 of size 8 at position 8 the merged region contains:
>> 9a 4 0 0 0 0
>> After writing 5 of size 8 at position 16 the merged region contains:
>> 9a 4 5 0 0 0
>>
>> (Note the dump just dumps the byte array from index 0 to <len> so the first
>> thing printed is the lowest numbered byte.
>> Also, each byte is dumped in hex.)
>>
>> The code as included here doesn't do any byte swapping for big-endian but as
>> seen from the dump even writing a sub-byte
>> bitfield goes wrong so it would be nice to resolve that before going forward.
>> Any help with debugging this is hugely appreciated. I've included an ASCII
>> diagram of the steps in the algorithm
>> in the patch itself.
> Ah, I think you need to account for BITS_BIG_ENDIAN in
> shift_bytes_in_array. You have to shift towards MSB which means changing
> left to right shifts for BITS_BIG_ENDIAN.
Thanks, I'll try it out. But this is on aarch64 where
BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1
so there's something else bad here.
> You also seem to miss to account for amnt / BITS_PER_UNIT != 0.
> Independently of BYTES_BIG_ENDIAN it would be
>
> ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt;
> ...
doh, yes. I'll fix that.
> (so best use a single load / store and operate on a temporary).
Thanks,
Kyrill
> Richard.
>
>> Thanks,
>> Kyrill
>>
More information about the Gcc-patches
mailing list