This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb



On 27/05/14 15:47, Ramana Radhakrishnan wrote:
On Tue, May 27, 2014 at 3:31 PM, Maciej W. Rozycki
<macro@codesourcery.com> wrote:
On Tue, 13 Aug 2013, Kyrylo Tkachov wrote:

On 08/09/13 11:01, Julian Brown wrote:
On Thu, 8 Aug 2013 15:44:17 +0100
Kyrylo Tkachov <kyrylo.tkachov@arm.com> wrote:

Hi all,

The recently added gcc.target/arm/pr58041.c test exposed a bug in the
backend. When compiling for NEON and with -mno-unaligned-access we
end up generating the vld1.64 and vst1.64 instructions instead of
doing the accesses one byte at a time like -mno-unaligned-access
expects. This patch fixes that by enabling the NEON expander and
insns that produce these instructions only when unaligned accesses
are allowed.

Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu.

Ok for trunk and 4.8?
I'm not sure if this is right, FWIW -- do the instructions in question
trap if the CPU is set to disallow unaligned accesses? I thought that
control bit only affected ARM core loads & stores, not NEON ones.
Thinking again - the ARM-ARM says - the alignment check is for element
size, so an alternative might be to use vld1.8 instead to allow for this
at which point we might as well do something else with the test. I note
that these patterns are not allowed for BYTES_BIG_ENDIAN so that might
be a better alternative than completely disabling it.
Looking at the section on unaligned accesses, it seems that the
ldrb/strb-class instructions are the only ones that are unaffected by the
SCTLR.A bit and do not produce alignment faults in any case.
The NEON load/store instructions, including vld1.8 can still cause an
alignment fault when SCTLR.A is set. So it seems we can only use the byte-wise
core memory instructions for unaligned data.
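
As an illustration of the byte-wise access pattern discussed above, here is a minimal C sketch. The helper names load_u32_bytewise and copy_bytewise are hypothetical and do not come from the patch or the testsuite. It only shows the kind of source-level access that can be done entirely with single-byte loads and stores, and it assumes little-endian byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value one byte at a
   time, so no access wider than a byte is ever made through p.  */
static uint32_t load_u32_bytewise(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: copy n bytes with single-byte loads and stores.
   Neither pointer needs any particular alignment.  */
static void copy_bytewise(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Calling load_u32_bytewise(buf + 1) on a byte buffer is well defined whatever the alignment of buf. Whether the compiler keeps the byte accesses or merges them depends on the target options in use.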
  This change however has regressed gcc.dg/vect/vect-72.c on the
arm-linux-gnueabi target, -march=armv5te, in particular in 4.8.
And what are all the configure flags you are using, in case someone
has to reproduce this issue?

Second that. My recently built 4.8 (gcc version 4.8.2 20130531), compiling vect-72 with the options -O2 -ftree-vectorize -march=armv5te -mfpu=neon -mfloat-abi=hard -fno-vect-cost-model -fno-common,

gives the same code as your original:
.L14:
        sub     r1, r3, #16
        add     r3, r3, #16
        vld1.8  {q8}, [r1]
        cmp     r3, r0
        vst1.64 {d16-d17}, [r2:64]!
        bne     .L14
        ldr     r3, .L22+12
        add     ip, r3, #128
        add     r2, r3, #129


Kyrill

  Beforehand the code fragment in question produced was:

.L14:
         sub     r1, r3, #16
         add     r3, r3, #16
         vld1.8  {q8}, [r1]
vld1 allows a misaligned load.

         cmp     r3, r0
         vst1.64 {d16-d17}, [r2:64]!
         bne     .L14

Afterwards it is:

.L14:
         vldr    d16, [r3, #-16]
         vldr    d17, [r3, #-8]
         add     r3, r3, #16
         cmp     r3, r1
         vst1.64 {d16-d17}, [r2:64]!
         bne     .L14

and the second VLDR instruction traps with SIGILL (the value in R3 is
0x10b29, odd as you'd expect, pointing into `ib').  I don't know why and
especially why only the second of the two (regrettably I've been unable to
track down an instruction reference that'd be detailed enough to specify
what exceptions VLDR can produce and under what conditions).
vldr will cause an unaligned access fault if the address is
misaligned. The question is why is the address misaligned in this
case.
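
For reference, the address arithmetic for the two VLDR instructions can be checked with a small C sketch. The helper names is_aligned and vldr_address are hypothetical, and the 4-byte alignment figure is an assumption about the VLDR requirement; the thread itself only says that a misaligned address faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align.
   align must be a non-zero power of two.  */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: effective address of "vldr dN, [base, #offset]".  */
static uintptr_t vldr_address(uintptr_t base, int32_t offset)
{
    return base + (uintptr_t)(intptr_t)offset;
}
```

With R3 = 0x10b29 as reported, vldr_address(0x10b29, -16) is 0x10b19 and vldr_address(0x10b29, -8) is 0x10b21. Both are odd, so is_aligned returns false for either one with an alignment of 4.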





  Is there a fix that needs backporting to 4.8 or is this an issue that was
unknown so far?
I haven't seen an issue with this so far.

Ramana

  Hardware and Linux used:

$ cat /proc/cpuinfo
Processor       : ARMv7 Processor rev 2 (v7l)
processor       : 0
BogoMIPS        : 2013.49

processor       : 1
BogoMIPS        : 1963.08

Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0xc09
CPU revision    : 2

Hardware        : OMAP4430 Panda Board
Revision        : 0020
Serial          : 0000000000000000
$ uname -a
Linux panda2 2.6.35-903-omap4 #14-Ubuntu SMP PREEMPT Wed Oct 6 17:23:24 UTC 2010 armv7l GNU/Linux
$

   Maciej


