[PATCH][store merging][RFA] Re-implement merging code

Mon Oct 10 11:06:00 GMT 2016

On 10/10/16 11:22, Richard Biener wrote:
> On Mon, 10 Oct 2016, Kyrill Tkachov wrote:
>
>> Hi Richard,
>>
>> As I mentioned, here is the patch applying to the main store merging patch to
>> re-implement encode_tree_to_bitpos
>> to operate on the bytes directly.
>>
>> This works fine on little-endian but breaks on big-endian, even for merging
>> bitfields within a single byte.
>> Consider the code snippet from gcc.dg/store_merging_6.c:
>>
>> struct bar {
>>    int a : 3;
>>    unsigned char b : 4;
>>    unsigned char c : 1;
>>    char d;
>>    char e;
>>    char f;
>>    char g;
>> };
>>
>> void
>> foo1 (struct bar *p)
>> {
>>    p->b = 3;
>>    p->a = 2;
>>    p->c = 1;
>>    p->d = 4;
>>    p->e = 5;
>> }
>>
>> The correct GIMPLE for these merged stores on big-endian is:
>>    MEM[(voidD.49 *)p_2(D)] = 18180;
>>    MEM[(charD.8 *)p_2(D) + 2B] = 5;
>>
>> whereas with this patch we emit:
>>    MEM[(voidD.49 *)p_2(D)] = 39428;
>>    MEM[(charD.8 *)p_2(D) + 2B] = 5;
>>
>> The dump for merging the individual stores without this patch (using the
>> correct but costly wide_int approach in the base patch) is:
>> After writing 3 of size 4 at position 3 the merged region contains:
>> 6 0 0 0 0 0
>> After writing 2 of size 3 at position 0 the merged region contains:
>> 46 0 0 0 0 0
>> After writing 1 of size 1 at position 7 the merged region contains:
>> 47 0 0 0 0 0
>> After writing 4 of size 8 at position 8 the merged region contains:
>> 47 4 0 0 0 0
>> After writing 5 of size 8 at position 16 the merged region contains:
>> 47 4 5 0 0 0
>>
>>
>> And with this patch it is:
>> After writing 3 of size 4 at position 3 the merged region contains:
>> 18 0 0 0 0 0
>> After writing 2 of size 3 at position 0 the merged region contains:
>> 1a 0 0 0 0 0
>> After writing 1 of size 1 at position 7 the merged region contains:
>> 9a 0 0 0 0 0
>> After writing 4 of size 8 at position 8 the merged region contains:
>> 9a 4 0 0 0 0
>> After writing 5 of size 8 at position 16 the merged region contains:
>> 9a 4 5 0 0 0
>>
>> (Note the dump just dumps the byte array from index 0 to <len> so the first
>> thing printed is the lowest numbered byte.
>> Also, each byte is dumped in hex.)
>>
>> The code as included here doesn't do any byte swapping for big-endian but as
>> seen from the dump even writing a sub-byte
>> bitfield goes wrong so it would be nice to resolve that before going forward.
>> Any help with debugging this is hugely appreciated. I've included an ASCII
>> diagram of the steps in the algorithm
>> in the patch itself.
> Ah, I think you need to account for BITS_BIG_ENDIAN in
> shift_bytes_in_array.  You have to shift towards MSB which means changing
> left to right shifts for BITS_BIG_ENDIAN.

Thanks, I'll try it out. But this is on aarch64 where
BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1
so there's something else bad here.

> You also seem to miss to account for amnt / BITS_PER_UNIT != 0.
> Independently of BYTES_BIG_ENDIAN it would be
>
>    ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt;
> ...

doh, yes. I'll fix that.

> (so best use a single load / store and operate on a temporary).

Thanks,
Kyrill

> Richard.
>
>> Thanks,
>> Kyrill
>>