This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Redundant loads for bitfield accesses
- From: Michael Clark <michaeljclark at mac dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Thu, 17 Aug 2017 10:52:53 +1200
- Subject: Re: Redundant loads for bitfield accesses
- Authentication-results: sourceware.org; auth=none
- References: <B90C4159-A1D6-4957-901D-F9AD286AB50E@mac.com> <CA+=Sn1m+S2LmZmwk9iaJiTm_FVY2QBd1RqKaE+2CTFjmb3xn_w@mail.gmail.com>
> On 17 Aug 2017, at 10:41 AM, Andrew Pinski <email@example.com> wrote:
> On Wed, Aug 16, 2017 at 3:29 PM, Michael Clark <firstname.lastname@example.org> wrote:
>> Is there any reason for 3 loads being issued for these bit-field accesses, given that two of the loads are bytes and one is a half, and that the compiler appears to know the structure is aligned on a half-word boundary? Secondly, the riscv code uses a mixture of 32-bit and 64-bit adds and shifts. Thirdly, with -Os the riscv code size is the same, but the schedule is less than optimal, i.e. the 3rd load is issued much later.
> Well, one thing is that most likely SLOW_BYTE_ACCESS is set to 0. This
> forces byte access for bit-field accesses. The macro is misnamed now,
> as it only controls bit-field accesses right now (and one thing in
> dojump dealing with comparisons with AND and a constant, but that might
> be dead code). This should allow you to get the code in hand-written
> form.
> I suspect SLOW_BYTE_ACCESS support should be removed and be assumed to
> be 1, but I have no time to look into each backend to see whether that
> is correct to do or not. Maybe it is wrong for AVR.
Thanks, that’s interesting.
So should I try compiling the riscv backend with SLOW_BYTE_ACCESS = 1? That seems less risky than making a change to x86.
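For context, the setting in question is a target macro defined in each backend's header (for riscv, that would be somewhere like gcc/config/riscv/riscv.h; the exact location and surrounding definitions vary per backend). A sketch of the change being discussed:

```c
/* Sketch: setting the value to 1 tells the middle end that byte
   (and other narrow) accesses are no cheaper than word accesses,
   so bit-field reads may be widened into a single word-sized load
   plus shifts and masks, rather than one narrow load per field.  */
#define SLOW_BYTE_ACCESS 1
```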
This is clearly distinct from slow unaligned access. It seems odd that -O3 doesn't coalesce the loads even when byte access is cheap: one would expect the cost of the additional loads to outweigh the fact that byte accesses are not slow, unless something weird is happening with the costs of loads of different widths.
x86 could be helped here too. I guess subsequent loads will be served from L1, but that's not really an excuse for this codegen when the element is 32-bit aligned (unsigned int).