[PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access

Vineet Gupta vineetg@rivosinc.com
Tue Nov 2 22:04:24 GMT 2021

On 11/2/21 2:18 PM, Christoph Müllner wrote:
> On Tue, Nov 2, 2021 at 9:15 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>> On 11/2/21 1:09 PM, Christoph Müllner wrote:
>>>>>> Without overlap_op_by_pieces we get:
>>>>>>      8e:   00053023                sd      zero,0(a0)
>>>>>>      92:   00052423                sw      zero,8(a0)
>>>>>>      96:   00051623                sh      zero,12(a0)
>>>>>>      9a:   00050723                sb      zero,14(a0)
>>>> To generate even the non-optimized code above with gcc 11 [1][2], what
>>>> do I need to do? Despite -mno-strict-align and trying -mtune={rocket,
>>>> sifive-7-series}, I only get the fully unrolled version.
>>> You need a tuning struct with slow_unaligned_access == false.
>>> Both Rocket and SiFive 7 have slow_unaligned_access set to true.
>>> In mainline you have thead-c906, which would work.
>> But doesn't -mno-strict-align imply that ?
> Opposite direction.

Took me a while to unpack :-)

> With `-mno-strict-align` emitted code might contain unaligned accesses
> if `slow_unaligned_access == false`.
> If `slow_unaligned_access == false`, then `-mstrict-align` will
> prevent unaligned accesses.
> Usually, there is a good reason why `slow_unaligned_access` is set to
> `true` (e.g. a significant penalty
> in case of unaligned accesses). It wouldn't make sense to overrule this.

Sure, it makes sense since this is uarch fundamental.
Because of the following snippet, unaligned access codegen can only be made
more restrictive and not less (and really requires a compiler rebuild to 

   riscv_slow_unaligned_access_p = (cpu->tune_param->slow_unaligned_access
                                    || TARGET_STRICT_ALIGN);
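
To spell out what that OR implies, here is a minimal standalone sketch in
C. It is not GCC code: `struct tune_model`,
`effective_slow_unaligned_access` and `print_truth_table` are hypothetical
names made up for illustration, and only the one tuning field discussed in
this thread is modeled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU tuning struct; only the field
   discussed in this thread is modeled.  */
struct tune_model
{
  bool slow_unaligned_access;
};

/* Mirrors the OR in the snippet above: the result is true if either the
   tuning struct or -mstrict-align asks for it.  */
bool
effective_slow_unaligned_access (const struct tune_model *tune,
                                 bool strict_align)
{
  return tune->slow_unaligned_access || strict_align;
}

/* Prints all four combinations of the two inputs.  */
void
print_truth_table (void)
{
  for (int t = 0; t <= 1; t++)
    for (int s = 0; s <= 1; s++)
      {
        struct tune_model tune = { .slow_unaligned_access = (t != 0) };
        printf ("tune slow=%d  strict-align=%d  ->  effective slow=%d\n",
                t, s, effective_slow_unaligned_access (&tune, s != 0));
      }
}
```

Calling `print_truth_table ()` from a `main` shows that only one of the
four rows gives an effective value of false: tuning says fast and
-mstrict-align is not in effect. A tuning struct with
slow_unaligned_access == true cannot be overridden by -mno-strict-align.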

