Bug 82445 - ARM target generates unaligned STRD instruction
Summary: ARM target generates unaligned STRD instruction
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 7.1.1
: P3 normal
Target Milestone: 6.5
Assignee: Richard Earnshaw
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2017-10-06 09:10 UTC by Alexander Graf
Modified: 2017-10-30 15:22 UTC (History)
1 user (show)

See Also:
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-10-06 00:00:00


Attachments
Reproducer (589 bytes, text/x-csrc)
2017-10-06 09:10 UTC, Alexander Graf
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Graf 2017-10-06 09:10:58 UTC
Created attachment 42310 [details]
Reproducer

The 32-bit ARM target in gcc generates an STRD instruction at not-multiple-of-8 offsets even with -no-unaligned-access specified. This currently breaks grub on ARM, as it runs with alignment faults enabled.

This is also a regression introduced somewhere recently (gcc7?), as I haven't seen these unaligned accesses generated before.

I've also attached a small reproducer for the issue.
Comment 1 Richard Biener 2017-10-06 09:56:34 UTC
Does -fno-store-merging fix it?

store-merging currently does

      unsigned HOST_WIDE_INT align = split_store->align;
      tree offset_type = get_alias_type_for_stmts (split_store->orig_stmts);
      location_t loc = get_location_for_stmts (split_store->orig_stmts);

      tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
      int_type = build_aligned_type (int_type, align);
      tree dest = fold_build2 (MEM_REF, int_type, addr,
                               build_int_cst (offset_type, try_pos));

and expects the target to cope with this.  But IIRC RTL expansion only
ever "honors" this if either it has a movmisalign optab for the mode
or it is SLOW_UNALIGNED_ACCESS (ok, arm seems to be SLOW_UNALIGNED_ACCESS).
So this means we go the bitfield store way (in the end this means store-merging
lacks proper costing for strict-align targets).  That may end up using
all sorts of patterns, like insv and friends.  Not sure what happens in the
end.
Comment 2 Alexander Graf 2017-10-06 10:02:24 UTC
(In reply to Richard Biener from comment #1)
> Does -fno-store-merging fix it?

Yes, but it generates worse code than -march=armv5 (which does not support STRD) does:

-march=armv6 -fno-store-merging:
> 00000000 <x>:
>    0:	e59f3028 	ldr	r3, [pc, #40]	; 30 <x+0x30>
>    4:	e3a02008 	mov	r2, #8
>    8:	e3a0c010 	mov	ip, #16
>    c:	e3a00000 	mov	r0, #0
>   10:	e3e01002 	mvn	r1, #2
>   14:	e5933000 	ldr	r3, [r3]
>   18:	e1c3c0b6 	strh	ip, [r3, #6]
>   1c:	e1c300b8 	strh	r0, [r3, #8]
>   20:	e1c310ba 	strh	r1, [r3, #10]
>   24:	e1c320b4 	strh	r2, [r3, #4]
>   28:	e1c320bc 	strh	r2, [r3, #12]
>   2c:	e12fff1e 	bx	lr
>   30:	00000000 	.word	0x00000000

-march=armv5:
> 00000000 <x>:
>    0:	e59f3018 	ldr	r3, [pc, #24]	; 20 <x+0x20>
>    4:	e3a02008 	mov	r2, #8
>    8:	e59f0014 	ldr	r0, [pc, #20]	; 24 <x+0x24>
>    c:	e59f1014 	ldr	r1, [pc, #20]	; 28 <x+0x28>
>   10:	e5933000 	ldr	r3, [r3]
>   14:	e9830003 	stmib	r3, {r0, r1}
>   18:	e1c320bc 	strh	r2, [r3, #12]
>   1c:	e12fff1e 	bx	lr
>   20:	00000000 	.word	0x00000000
>   24:	00100008 	.word	0x00100008
>   28:	fffd0000 	.word	0xfffd0000
Comment 3 Richard Biener 2017-10-06 10:08:02 UTC
To answer myself, yes, -fno-store-merging fixes it.  RTL expansion is ok:

;; MEM[(void *)unknown_glyph.0_1 + 4B] = 1048584;

(insn 7 6 8 (set (reg:SI 112)
        (const_int 1048584 [0x100008])) "strd.c":50 -1
     (nil))

(insn 8 7 0 (set (mem:SI (plus:SI (reg/f:SI 110 [ unknown_glyph.0_1 ])
                (const_int 4 [0x4])) [0 MEM[(void *)unknown_glyph.0_1 + 4B]+0 S4 A32])
        (reg:SI 112)) "strd.c":50 -1
     (nil))

;; MEM[(void *)unknown_glyph.0_1 + 8B] = 4294770688;

(insn 9 8 10 (set (reg:SI 113)
        (const_int -196608 [0xfffffffffffd0000])) "strd.c":51 -1
     (nil))

(insn 10 9 0 (set (mem:SI (plus:SI (reg/f:SI 110 [ unknown_glyph.0_1 ])
                (const_int 8 [0x8])) [0 MEM[(void *)unknown_glyph.0_1 + 8B]+0 S4 A32])
        (reg:SI 113)) "strd.c":51 -1
     (nil))

;; MEM[(void *)unknown_glyph.0_1 + 12B] = 8;

(insn 11 10 12 (set (reg:SI 115)
        (const_int 8 [0x8])) "strd.c":53 -1
     (nil))

(insn 12 11 13 (set (reg:HI 114)
        (subreg:HI (reg:SI 115) 0)) "strd.c":53 -1
     (expr_list:REG_EQUAL (const_int 8 [0x8])
        (nil)))

(insn 13 12 0 (set (mem:HI (plus:SI (reg/f:SI 110 [ unknown_glyph.0_1 ])
                (const_int 12 [0xc])) [0 MEM[(void *)unknown_glyph.0_1 + 12B]+0 S2 A32])
        (reg:HI 114)) "strd.c":53 -1
     (nil))


The DI move is introduced by peephole2:

Splitting with gen_peephole2_12

which is ldrdstrd.md:

(define_peephole2 ; strd
  [(set (match_operand:SI 2 "memory_operand" "")
        (match_operand:SI 0 "arm_general_register_operand" ""))
   (set (match_operand:SI 3 "memory_operand" "")
        (match_operand:SI 1 "arm_general_register_operand" ""))]
  "TARGET_LDRD"
  [(const_int 0)]
{
  if (!gen_operands_ldrd_strd (operands, false, false, false))
    FAIL;
  else if (TARGET_ARM)
  {
    /* In ARM state, the destination registers of LDRD/STRD must be
       consecutive. We emit DImode access.  */
    operands[0] = gen_rtx_REG (DImode, REGNO (operands[0]));
    operands[2] = adjust_address (operands[2], DImode, 0);
    /* Emit [(set (match_dup 2) (match_dup 0))]  */
    emit_insn (gen_rtx_SET (operands[2], operands[0]));
    DONE;
  }
  else if (TARGET_THUMB2)
  {
    /* Emit the pattern:
       [(parallel [(set (match_dup 2) (match_dup 0))
                   (set (match_dup 3) (match_dup 1))])]  */
    rtx t1 = gen_rtx_SET (operands[2], operands[0]);
    rtx t2 = gen_rtx_SET (operands[3], operands[1]);
    emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2)));
    DONE;
  }
})

but this doesn't at all consider alignment of the MEMs it merges.
Comment 4 Richard Earnshaw 2017-10-09 10:58:58 UTC
looks like gen_operands_ldrd_strd should be checking for this and failing if the alignment is not suitable for the target architecture.
Comment 5 Petr Cvek 2017-10-18 10:22:03 UTC
Get the same bug with: gcc version 7.1.0 (crosstool-NG crosstool-ng-1.23.0-88-gfae66ae3) 

Bug free version: gcc version 6.3.0 (crosstool-NG crosstool-ng-1.22.0-452-gc7b1e295) 

Testing program: just main() with "helloworld" calling the example code (strd.c) compiled as object file.

march/mtune combinations which are OK:

-march=armv5te -mtune=xscale
-march=armv5te -mtune=iwmmxt
-march=iwmmxt
-march=armv5t (actually less opcodes than variants above)
-march=armv5

march/mtune combinations which generate an unaligned access in STRD (even with -mno-unaligned-access):

-march=armv5te (tested on a real HW, generating alignment exceptions in /proc/cpu/alignment)
-march=armv5tej (not existing on gcc 6.3.0)
-march=armv5te -mtune=arm1020e
-march=armv5te -mtune=arm926ej-s
Comment 6 Richard Earnshaw 2017-10-19 13:11:13 UTC
Author: rearnsha
Date: Thu Oct 19 13:10:42 2017
New Revision: 253890

URL: https://gcc.gnu.org/viewcvs?rev=253890&root=gcc&view=rev
Log:
[ARM] PR 82445 - suppress 32-bit aligned ldrd/strd peepholing with -mno-unaligned-access

Peephole patterns exist in the arm backend to spot load/store
operations to adjacent memory operations in order to convert them into
ldrd/strd instructions.  However, when we have strict alignment
enforced, then we can only do this if the accesses are known to be
64-bit aligned; this is unlikely to be the case for most loads.  The
patch adds some alignment checking to the code that validates the
addresses for use in the peephole patterns.  This should also fix
incorrect generation of ldrd/strd with unaligned accesses that could
previously have occurred on ARMv5e where all such operations must be
64-bit aligned.

I've added some new tests as well.  In doing so I discovered that the
ldrd/strd peephole tests could never fail since they would match the
source file name in the scanned assembly as well as any instructions
of the intended type.  I've fixed those by tightening the scan results
slightly.

gcc:

* config/arm/arm.c (align_ok_ldrd_strd): New function.
(mem_ok_for_ldrd_strd): New parameter align.  Extract the alignment of the
mem into it.
(gen_operands_ldrd_strd): Validate the alignment of the accesses.

testsuite:

* gcc.target/arm/peep-ldrd-1.c: Tighten test scan pattern.
* gcc.target/arm/peep-strd-1.c: Likewise.
* gcc.target/arm/peep-ldrd-2.c: New test.
* gcc.target/arm/peep-strd-2.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c
    trunk/gcc/testsuite/gcc.target/arm/peep-strd-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c
    trunk/gcc/testsuite/gcc.target/arm/peep-strd-1.c
Comment 7 Richard Earnshaw 2017-10-19 13:15:28 UTC
Author: rearnsha
Date: Thu Oct 19 13:14:55 2017
New Revision: 253891

URL: https://gcc.gnu.org/viewcvs?rev=253891&root=gcc&view=rev
Log:
[ARM] PR 82445 - suppress 32-bit aligned ldrd/strd peepholing with -mno-unaligned-access

Peephole patterns exist in the arm backend to spot load/store
operations to adjacent memory operations in order to convert them into
ldrd/strd instructions.  However, when we have strict alignment
enforced, then we can only do this if the accesses are known to be
64-bit aligned; this is unlikely to be the case for most loads.  The
patch adds some alignment checking to the code that validates the
addresses for use in the peephole patterns.  This should also fix
incorrect generation of ldrd/strd with unaligned accesses that could
previously have occurred on ARMv5e where all such operations must be
64-bit aligned.

I've added some new tests as well.  In doing so I discovered that the
ldrd/strd peephole tests could never fail since they would match the
source file name in the scanned assembly as well as any instructions
of the intended type.  I've fixed those by tightening the scan results
slightly.

gcc:

* config/arm/arm.c (align_ok_ldrd_strd): New function.
(mem_ok_for_ldrd_strd): New parameter align.  Extract the alignment of the
mem into it.
(gen_operands_ldrd_strd): Validate the alignment of the accesses.

testsuite:

* gcc.target/arm/peep-ldrd-1.c: Tighten test scan pattern.
* gcc.target/arm/peep-strd-1.c: Likewise.
* gcc.target/arm/peep-ldrd-2.c: New test.
* gcc.target/arm/peep-strd-2.c: New test.


Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-strd-2.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/arm/arm.c
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-strd-1.c
Comment 8 Richard Earnshaw 2017-10-19 13:17:14 UTC
Author: rearnsha
Date: Thu Oct 19 13:16:42 2017
New Revision: 253892

URL: https://gcc.gnu.org/viewcvs?rev=253892&root=gcc&view=rev
Log:
[ARM] PR 82445 - suppress 32-bit aligned ldrd/strd peepholing with -mno-unaligned-access

Peephole patterns exist in the arm backend to spot load/store
operations to adjacent memory operations in order to convert them into
ldrd/strd instructions.  However, when we have strict alignment
enforced, then we can only do this if the accesses are known to be
64-bit aligned; this is unlikely to be the case for most loads.  The
patch adds some alignment checking to the code that validates the
addresses for use in the peephole patterns.  This should also fix
incorrect generation of ldrd/strd with unaligned accesses that could
previously have occurred on ARMv5e where all such operations must be
64-bit aligned.

I've added some new tests as well.  In doing so I discovered that the
ldrd/strd peephole tests could never fail since they would match the
source file name in the scanned assembly as well as any instructions
of the intended type.  I've fixed those by tightening the scan results
slightly.

gcc:

* config/arm/arm.c (align_ok_ldrd_strd): New function.
(mem_ok_for_ldrd_strd): New parameter align.  Extract the alignment of the
mem into it.
(gen_operands_ldrd_strd): Validate the alignment of the accesses.

testsuite:

* gcc.target/arm/peep-ldrd-1.c: Tighten test scan pattern.
* gcc.target/arm/peep-strd-1.c: Likewise.
* gcc.target/arm/peep-ldrd-2.c: New test.
* gcc.target/arm/peep-strd-2.c: New test.


Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-strd-2.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/arm/arm.c
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
    branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-strd-1.c
Comment 9 Richard Earnshaw 2017-10-19 13:22:50 UTC
Fixed on trunk gcc-6 & gcc-7