Accessing unaligned memory locations used to be forbidden on ARM CPUs, but since ARMv6 (quite a few years ago by now) the operation is supported. However, GCC 4.5 - 4.6 - 4.7 - 4.8 seem to generate sub-optimal code for it on these targets.

In theory, it's illegal to issue a direct statement such as:

u32 read32(const void* ptr) { return *(const u32*)ptr; }

if ptr is not properly aligned. There are two workarounds that I know of. The first is to use the `packed` attribute, which is not portable (compiler specific). The second and better one is to use memcpy():

u32 read32(const void* ptr)
{
    u32 v;
    memcpy(&v, ptr, sizeof(v));
    return v;
}

This version is portable and safe. It also works very well on multiple platforms, such as x86/x64, PPC or ARM64, where it is reduced to an optimal assembly sequence (a single instruction).

Unfortunately, GCC 4.5 - 4.6 - 4.7 - 4.8 generate suboptimal assembly for this function on ARMv6 or ARMv7:

read32(void const*):
        ldr     r0, [r0]        @ unaligned
        sub     sp, sp, #8
        str     r0, [sp, #4]    @ unaligned
        ldr     r0, [sp, #4]
        add     sp, sp, #8
        bx      lr

This is in stark contrast with clang, which generates much more efficient assembly:

read32(void const*):                    @ @read32(void const*)
        ldr     r0, [r0]
        bx      lr

(The assembly can be generated and displayed using a simple tool: https://goo.gl/7FWDB8)

It's not that gcc is unaware of the CPU's unaligned memory access capability, since it does use it (`ldr r0, [r0]`), but it then wastes a lot of time on useless operations on a discardable temporary variable, storing data onto the stack just to read it back again. Inlining does not save the day. -O3 helps reduce the impact, but it's still large.

On a recent exercise comparing efficient vs. inefficient memory access on ARMv6 and ARMv7, the measured difference was very large: up to 6x faster at -O2 settings. See: http://fastcompression.blogspot.com/2015/08/accessing-unaligned-memory.html
It's definitely too large a difference to be ignored.

As a consequence, to preserve performance, source code must try a bunch of possibilities depending on target and compiler, if not compiler version. In some circumstances (gcc with ARMv6, or gcc <= 4.5), it's even necessary to write illegal code (see the 1st version above) to reach optimal performance on these targets. This looks like a waste of energy, and a recipe for bugs, especially compared to clang, which generates clean code in all circumstances for all targets.

Considering the huge performance difference such an improvement could make, is that something the gcc team would like to look into?

Regards
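For reference, a minimal sketch of the compiler-specific `packed` workaround mentioned above (illustrative code, not part of the original report; the type and function names are made up):

typedef unsigned int u32;

/* Non-portable workaround: wrap the value in a packed struct so the compiler
   knows the access may be misaligned (GCC/clang extension).  */
typedef struct { u32 v; } __attribute__((packed)) unalign32;

u32 read32_packed(const void* ptr)
{
    return ((const unalign32*)ptr)->v;  /* compiler emits an unaligned-capable load */
}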
I think this boils down to the fact that memcpy expansion is done too late and that (with more recent GCC) the "inlining" done on the GIMPLE level is restricted to !SLOW_UNALIGNED_ACCESS, but arm defines STRICT_ALIGNMENT to 1 unconditionally. I guess arm goes through the movmisalign optab for unaligned accesses, and the GIMPLE inlining could be enabled as well if movmisalign is supported.

Note that GCC 4.8 is no longer supported and enhancements will go to GCC 6 at most.

With current trunk and -O2 -march=armv6 I get for

typedef unsigned int u32;

u32 read32(const void* ptr)
{
    u32 v;
    __builtin_memcpy(&v, ptr, sizeof(v));
    return v;
}

read32:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r0, [r0]        @ unaligned
        sub     sp, sp, #8
        add     sp, sp, #8
        @ sp needed
        bx      lr

so apart from the stack slot not getting removed this has already improved, but the issue I mention still happens, as we expand from

read32 (const void * ptr)
{
  u32 v;
  u32 _4;

  <bb 2>:
  __builtin_memcpy (&v, ptr_2(D), 4);
  _4 = v;
  v ={v} {CLOBBER};
  return _4;

So partially confirmed, for the GIMPLE memory op "inlining" issue.
(In reply to Richard Biener from comment #1)
> I think this boils down to the fact that memcpy expansion is done too late
> and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> unconditionally.

Yep, we have to define STRICT_ALIGNMENT to 1 because not all load instructions work with misaligned addresses (ldm, for example). The only way to handle misaligned copies is through the movmisalign API.
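To illustrate the constraint (an illustrative sketch, not taken from the thread): multi-word accesses are the problem case, since they are typically expanded with ldm/stm, and those instructions require aligned addresses even on cores where plain ldr/str tolerate misalignment.

typedef struct { unsigned int a, b, c, d; } quad;

/* A whole-struct copy like this is typically expanded with ldm/stm on ARM;
   if dst or src were misaligned, those instructions would fault, which is
   why the backend must keep STRICT_ALIGNMENT set.  */
void copy_quad(quad* dst, const quad* src)
{
    *dst = *src;
}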
On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
>
> --- Comment #2 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> > I think this boils down to the fact that memcpy expansion is done too late
> > and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> > restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> > unconditionally.
>
> Yep, we have to define STRICT_ALIGNMENT to 1 because not all load instructions
> work with misaligned addresses (ldm, for example). The only way to handle
> misaligned copies is through the movmisalign API.

Are the movmisalign handled ones reasonably efficient?  That is, more
efficient than memcpy/memmove?  Then we should experiment with

Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c   (revision 227252)
+++ gcc/gimple-fold.c   (working copy)
@@ -708,7 +708,9 @@ gimple_fold_builtin_memory_op (gimple_st
              /* If the destination pointer is not aligned we must be able
                 to emit an unaligned store.  */
              && (dest_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type))
-                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)))
+                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)
+                 || (optab_handler (movmisalign_optab, TYPE_MODE (type))
+                     != CODE_FOR_nothing)))
            {
              tree srctype = type;
              tree desttype = type;
@@ -720,7 +722,10 @@ gimple_fold_builtin_memory_op (gimple_st
                srcmem = tem;
              else if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type))
                       && SLOW_UNALIGNED_ACCESS (TYPE_MODE (type),
-                                                src_align))
+                                                src_align)
+                      && (optab_handler (movmisalign_optab,
+                                         TYPE_MODE (type))
+                          == CODE_FOR_nothing))
                srcmem = NULL_TREE;
              if (srcmem)
                {
(In reply to Richard Earnshaw from comment #2)
> (In reply to Richard Biener from comment #1)
> > I think this boils down to the fact that memcpy expansion is done too late
> > and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> > restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> > unconditionally.
>
> Yep, we have to define STRICT_ALIGNMENT to 1 because not all load
> instructions work with misaligned addresses (ldm, for example). The only
> way to handle misaligned copies is through the movmisalign API.

Hmmm, actually that's not completely true - for SImode and HImode objects there don't appear to be movmisalign interfaces. For things like memcpy this is done through the memcpy interface, where we end up generating copies using unaligned_loadsi and unaligned_storesi. We may need to experiment a bit with the movmisalign interface too.

Ramana
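For illustration, a sketch under the assumption just described (not code from the thread): only copies whose size fits a single machine mode are candidates for the GIMPLE-level folding being discussed; wider copies stay on the backend block-move / memcpy path.

#include <string.h>

/* A 4-byte copy can be folded to one unaligned load/store pair, while this
   16-byte copy is left to the backend's block-move (memcpy) expansion.  */
void copy16(void* dst, const void* src)
{
    memcpy(dst, src, 16);
}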
(In reply to rguenther@suse.de from comment #3)
> On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> >
> > --- Comment #2 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #1)
> > > I think this boils down to the fact that memcpy expansion is done too late
> > > and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> > > restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> > > unconditionally.
> >
> > Yep, we have to define STRICT_ALIGNMENT to 1 because not all load instructions
> > work with misaligned addresses (ldm, for example). The only way to handle
> > misaligned copies is through the movmisalign API.
>
> Are the movmisalign handled ones reasonably efficient? That is, more
> efficient than memcpy/memmove? Then we should experiment with

What do you mean by more efficient here? They'll end up calling down to the same unspec block if we implemented them - memcpy / memmove go through the backend block_move interface.

Will try the patch with a hacked up movmisalign pattern in the backend.

> Index: gcc/gimple-fold.c
> ===================================================================
> --- gcc/gimple-fold.c   (revision 227252)
> +++ gcc/gimple-fold.c   (working copy)
> @@ -708,7 +708,9 @@ gimple_fold_builtin_memory_op (gimple_st
>               /* If the destination pointer is not aligned we must be able
>                  to emit an unaligned store.  */
>               && (dest_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type))
> -                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)))
> +                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)
> +                 || (optab_handler (movmisalign_optab, TYPE_MODE (type))
> +                     != CODE_FOR_nothing)))
>             {
>               tree srctype = type;
>               tree desttype = type;
> @@ -720,7 +722,10 @@ gimple_fold_builtin_memory_op (gimple_st
>                 srcmem = tem;
>               else if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type))
>                        && SLOW_UNALIGNED_ACCESS (TYPE_MODE (type),
> -                                                src_align))
> +                                                src_align)
> +                      && (optab_handler (movmisalign_optab,
> +                                         TYPE_MODE (type))
> +                          == CODE_FOR_nothing))
>                 srcmem = NULL_TREE;
>               if (srcmem)
>                 {
(In reply to rguenther@suse.de from comment #3)
> On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> >
> > --- Comment #2 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #1)
> > > I think this boils down to the fact that memcpy expansion is done too late
> > > and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> > > restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> > > unconditionally.
> >
> > Yep, we have to define STRICT_ALIGNMENT to 1 because not all load instructions
> > work with misaligned addresses (ldm, for example). The only way to handle
> > misaligned copies is through the movmisalign API.
>
> Are the movmisalign handled ones reasonably efficient? That is, more
> efficient than memcpy/memmove? Then we should experiment with

Minor nit - missing include of optabs.h. Fixing that and adding a movmisalignsi pattern in the backend that just generates either an unaligned loadsi / storesi insn generates the following for me for the above mentioned testcase.

read32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r0, [r0]        @ unaligned
        bx      lr

I'm on holiday from this evening so don't really want to push something today ...

> Index: gcc/gimple-fold.c
> ===================================================================
> --- gcc/gimple-fold.c   (revision 227252)
> +++ gcc/gimple-fold.c   (working copy)
> @@ -708,7 +708,9 @@ gimple_fold_builtin_memory_op (gimple_st
>               /* If the destination pointer is not aligned we must be able
>                  to emit an unaligned store.  */
>               && (dest_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type))
> -                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)))
> +                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)
> +                 || (optab_handler (movmisalign_optab, TYPE_MODE (type))
> +                     != CODE_FOR_nothing)))
>             {
>               tree srctype = type;
>               tree desttype = type;
> @@ -720,7 +722,10 @@ gimple_fold_builtin_memory_op (gimple_st
>                 srcmem = tem;
>               else if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type))
>                        && SLOW_UNALIGNED_ACCESS (TYPE_MODE (type),
> -                                                src_align))
> +                                                src_align)
> +                      && (optab_handler (movmisalign_optab,
> +                                         TYPE_MODE (type))
> +                          == CODE_FOR_nothing))
>                 srcmem = NULL_TREE;
>               if (srcmem)
>                 {
On Thu, 27 Aug 2015, ramana at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
>
> --- Comment #6 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #3)
> > On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> > >
> > > --- Comment #2 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> > > (In reply to Richard Biener from comment #1)
> > > > I think this boils down to the fact that memcpy expansion is done too late
> > > > and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> > > > restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> > > > unconditionally.
> > >
> > > Yep, we have to define STRICT_ALIGNMENT to 1 because not all load instructions
> > > work with misaligned addresses (ldm, for example). The only way to handle
> > > misaligned copies is through the movmisalign API.
> >
> > Are the movmisalign handled ones reasonably efficient? That is, more
> > efficient than memcpy/memmove? Then we should experiment with
>
> minor nit - missing include of optabs.h - fixing that and adding a
> movmisalignsi pattern in the backend that just generates either an unaligned /
> storesi insn generates the following for me for the above mentioned testcase.
>
> read32:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         ldr     r0, [r0]        @ unaligned
>         bx      lr
>
> I'm on holiday from this evening so don't really want to push something today
> ...

Sure ;)  When adding the GIMPLE folding I was just careful here as I don't really have a STRICT_ALIGNMENT machine with movmisalign handling available.  Thus full testing is appreciated (might turn up some testcases that need adjustment).  There are more STRICT_ALIGN guarded cases below in the function, eventually they can be modified as well (at which point splitting out the alignment check to a separate function makes sense).
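A sketch of the kind of helper the last remark suggests (an assumption on my part, not code from the thread or from GCC): pull the combined alignment check out of gimple_fold_builtin_memory_op so the remaining STRICT_ALIGNMENT-guarded spots can reuse it.

/* Hypothetical helper (not actual GCC code): an unaligned access of MODE at
   alignment ALIGN is acceptable if it is not slow on the target, or if the
   target provides a movmisalign pattern for MODE.  */
static bool
unaligned_access_ok_p (machine_mode mode, unsigned int align)
{
  return !SLOW_UNALIGNED_ACCESS (mode, align)
         || optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing;
}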
(In reply to rguenther@suse.de from comment #7)
> On Thu, 27 Aug 2015, ramana at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> >
> > --- Comment #6 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
> > (In reply to rguenther@suse.de from comment #3)
> > > On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
> > >
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> > > >
> > > > --- Comment #2 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> > > > (In reply to Richard Biener from comment #1)
> > > > > I think this boils down to the fact that memcpy expansion is done too late
> > > > > and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> > > > > restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> > > > > unconditionally.
> > > >
> > > > Yep, we have to define STRICT_ALIGNMENT to 1 because not all load instructions
> > > > work with misaligned addresses (ldm, for example). The only way to handle
> > > > misaligned copies is through the movmisalign API.
> > >
> > > Are the movmisalign handled ones reasonably efficient? That is, more
> > > efficient than memcpy/memmove? Then we should experiment with
> >
> > minor nit - missing include of optabs.h - fixing that and adding a
> > movmisalignsi pattern in the backend that just generates either an unaligned /
> > storesi insn generates the following for me for the above mentioned testcase.
> >
> > read32:
> >         @ args = 0, pretend = 0, frame = 0
> >         @ frame_needed = 0, uses_anonymous_args = 0
> >         @ link register save eliminated.
> >         ldr     r0, [r0]        @ unaligned
> >         bx      lr
> >
> > I'm on holiday from this evening so don't really want to push something today
> > ...
>
> Sure ;) When adding the GIMPLE folding I was just careful here as I
> don't really have a STRICT_ALIGNMENT machine with movmisalign handling
> available. Thus full testing is appreciated (might turn up some
> testcases that need adjustment). There are more STRICT_ALIGN
> guarded cases below in the function, eventually they can be modified
> as well (at which point splitting out the alignment check to a separate
> function makes sense).

This was the backend hack I was playing with. It needs extension to HImode values, cleaning up and regression testing and some small amount of benchmarking to see we're doing the right thing.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 288bbb9..eaff494 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11423,6 +11423,27 @@
 }"
 )

+(define_expand "movmisalignsi"
+  [(match_operand:SI 0 "general_operand")
+   (match_operand:SI 1 "general_operand")]
+  "unaligned_access"
+{
+  /* This pattern is not permitted to fail during expansion: if both arguments
+     are non-registers (e.g. memory := constant, which can be created by the
+     auto-vectorizer), force operand 1 into a register.  */
+  if (!s_register_operand (operands[0], SImode)
+      && !s_register_operand (operands[1], SImode))
+    operands[1] = force_reg (SImode, operands[1]);
+
+  if (MEM_P (operands[1]))
+    emit_insn (gen_unaligned_loadsi (operands[0], operands[1]));
+  else
+    emit_insn (gen_unaligned_storesi (operands[0], operands[1]));
+
+  DONE;
+})
+
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")

 ;; Load the Intel Wireless Multimedia Extension patterns
Does that really do the right thing? That is, does force_reg understand a misaligned memory op? Also, what if one memory operand is aligned, but the other not? Don't we want to have the right combination of aligned/misaligned instructions?
On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
>
> --- Comment #9 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> Does that really do the right thing?  That is, does force_reg understand a
> misaligned memory op?
>
> Also, what if one memory operand is aligned, but the other not?  Don't we want
> to have the right combination of aligned/misaligned instructions?

I think you never get two MEMs, you at most get a constant on the RHS of a store.
(In reply to rguenther@suse.de from comment #10)
> On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> >
> > --- Comment #9 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> > Does that really do the right thing? That is, does force_reg understand a
> > misaligned memory op?

In practice I think yes.

> >
> > Also, what if one memory operand is aligned, but the other not? Don't we want
> > to have the right combination of aligned/misaligned instructions?
>
> I think you never get two MEMs, you at most get a constant on the
> RHS of a store.

Yep, that's my understanding too. movmisalign<mode> has stricter rules compared to mov<mode>, so we have to handle some of these carefully. The logic in here is essentially from neon.md:movmisalign<mode>, so some of it may not be relevant here in the scalar case, but it's probably better to be a bit defensive here.

I tried playing with the HImode case, but the pain I had was that HImode movmisalign expects a load into a HImode register, and I hadn't got my head around that given other things I needed to finish up before leaving.
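For illustration, a sketch of the store side this expander covers (illustrative function names, not from the thread). With movmisalignsi in place, stores like these should be able to end up as single unaligned str instructions instead of memcpy calls; the second one is the memory := constant case, where the expander has to force the constant into a register first.

typedef unsigned int u32;

/* Misaligned store of a variable: memory := register.  */
void write32(void* ptr, u32 value)
{
    __builtin_memcpy(ptr, &value, sizeof(value));
}

/* Misaligned store of a constant: memory := constant.  */
void store_magic(void* ptr)
{
    u32 magic = 0xdeadbeef;
    __builtin_memcpy(ptr, &magic, sizeof(magic));
}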
(In reply to rguenther@suse.de from comment #3)
> On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> >
> > --- Comment #2 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #1)
> > > I think this boils down to the fact that memcpy expansion is done too late
> > > and that (with more recent GCC) the "inlining" done on the GIMPLE level is
> > > restricted to !SLOW_UNALIGNED_ACCESS but arm defines STRICT_ALIGNMENT to 1
> > > unconditionally.
> >
> > Yep, we have to define STRICT_ALIGNMENT to 1 because not all load instructions
> > work with misaligned addresses (ldm, for example). The only way to handle
> > misaligned copies is through the movmisalign API.
>
> Are the movmisalign handled ones reasonably efficient? That is, more
> efficient than memcpy/memmove? Then we should experiment with
>
> Index: gcc/gimple-fold.c
> ===================================================================
> --- gcc/gimple-fold.c   (revision 227252)
> +++ gcc/gimple-fold.c   (working copy)
> @@ -708,7 +708,9 @@ gimple_fold_builtin_memory_op (gimple_st
>               /* If the destination pointer is not aligned we must be able
>                  to emit an unaligned store.  */
>               && (dest_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type))
> -                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)))
> +                 || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)
> +                 || (optab_handler (movmisalign_optab, TYPE_MODE (type))
> +                     != CODE_FOR_nothing)))
>             {
>               tree srctype = type;
>               tree desttype = type;
> @@ -720,7 +722,10 @@ gimple_fold_builtin_memory_op (gimple_st
>                 srcmem = tem;
>               else if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type))
>                        && SLOW_UNALIGNED_ACCESS (TYPE_MODE (type),
> -                                                src_align))
> +                                                src_align)
> +                      && (optab_handler (movmisalign_optab,
> +                                         TYPE_MODE (type))
> +                          == CODE_FOR_nothing))
>                 srcmem = NULL_TREE;
>               if (srcmem)
>                 {

This plus the backend changes to deal with unaligned HImode and SImode values tested OK on armhf, with only 2 extra failures in strlenopt-8.c. Prima facie they appear to be testisms, but it will be fun to handle this across all architecture levels for the arm target, as unaligned access depends on the architecture level.
Author: ramana
Date: Fri Oct  9 10:53:31 2015
New Revision: 228643

URL: https://gcc.gnu.org/viewcvs?rev=228643&root=gcc&view=rev
Log:
[Patch PR target/67366 1/2] [ARM] - Add movmisalignhi / si patterns

This adds movmisalignhi and movmisalignsi expanders when unaligned
access is allowed by the architecture. This allows the mid-end to expand
to misaligned loads and stores. Compared the code generated for the
Linux kernel; it changes code generation for a handful of files, all for
the better, basically by reducing the stack usage.

Tested by:

1. armhf bootstrap and regression test - no regressions.
2. arm-none-eabi cross build and regression test for

{-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp}
{-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard}
{-marm/-mcpu=arm7tdmi/-mfloat-abi=soft}
{-mthumb/-mcpu=arm7tdmi/-mfloat-abi=soft}

Will apply to trunk once 2/2 is approved.

regards
Ramana

2015-10-09  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

	PR target/67366
	* config/arm/arm.md (movmisalign<mode>): New.
	* config/arm/iterators.md (HSI): New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.md
    trunk/gcc/config/arm/iterators.md
Fixed on trunk - Yann please try and let us know.
Author: ramana
Revision: 228644
Modified property: svn:log

Modified: svn:log at Fri Oct  9 11:08:05 2015
------------------------------------------------------------------------------
--- svn:log (original)
+++ svn:log Fri Oct  9 11:08:05 2015
@@ -1,45 +1,43 @@
-[AArch64] Handle literal pools for functions > 1 MiB in size.
-
+[PATCH PR target/67366 2/2] [gimple-fold.c] Support movmisalign optabs in gimple-fold.c

-This patch fixes the issue in PR63304 where we have
-functions that are > 1MiB. The idea is to use adrp / ldr or adrp / add
-instructions to address the literal pools under the use of a command line
-option. I would like to turn this on by default on trunk but keep this
-disabled by default for the release branches in order to get some
-serious testing for this feature while it bakes on trunk.
+This patch by Richard allows for movmisalign optabs to be supported
+in gimple-fold.c. This caused a bit of pain in the testsuite with strlenopt-8.c
+in conjunction with the ARM support for movmisalign_optabs as the test
+was coded up to do different things depending on whether the target
+supported misaligned access or not. However now with unaligned access
+being allowed for different levels of the architecture in the arm backend,
+the concept of the helper function non_strict_align mapping identically
+to the definition of STRICT_ALIGNMENT disappears.

-As a follow-up I would like to try and see if estimate_num_insns or
-something else can give us a heuristic to turn this on for "large" functions.
-After all the number of incidences of this are quite low in real life,
-so may be we should look to restrict this use as much as possible on the
-grounds that this code generation implies an extra integer register for
-addressing for every floating point and vector constant and I don't think
-that's great in code that already may have high register pressure.
+Adjusted thusly for ARM. The testsuite/lib changes were tested with an
+arm-none-eabi multilib that included architecture variants that did not
+support unaligned access and architecture variants that did.

-Tested on aarch64-none-elf with no regressions. A previous
-version was bootstrapped and regression tested.
+The testing matrix for this patch was:

-Applied to trunk.
+1. x86_64 bootstrap and regression test - no regressions.
+2. armhf bootstrap and regression test - no regressions.
+3. arm-none-eabi cross build and regression test for

-regards
-Ramana
+{-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp}
+{-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard}
+{-marm/-mcpu=arm7tdmi/-mfloat-abi=soft}
+{-mthumb/-mcpu=arm7tdmi/-mfloat-abi=soft}

-2015-09-14  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
+with no regressions.

-	PR target/63304
-	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Handle
-	nopcrelative_literal_loads.
-	(aarch64_classify_address): Likewise.
-	(aarch64_constant_pool_reload_icode): Define.
-	(aarch64_secondary_reload): Handle secondary reloads for
-	literal pools.
-	(aarch64_override_options): Handle nopcrelative_literal_loads.
-	(aarch64_classify_symbol): Handle nopcrelative_literal_loads.
-	* config/aarch64/aarch64.md (aarch64_reload_movcp<GPF_TF:mode><P:mode>):
-	Define.
-	(aarch64_reload_movcp<VALL:mode><P:mode>): Likewise.
-	* config/aarch64/aarch64.opt (mpc-relative-literal-loads): New option.
-	* config/aarch64/predicates.md (aarch64_constant_pool_symref): New
-	predicate.
-	* doc/invoke.texi (mpc-relative-literal-loads): Document.
+Ok to apply ?
+2015-10-09  Richard Biener  <rguenth@suse.de>
+
+	PR target/67366
+	* gimple-fold.c (optabs-query.h): Include
+	(gimple_fold_builtin_memory_op): Allow unaligned stores
+	when movmisalign_optabs are available.
+
+2015-10-09  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
+
+	PR target/67366
+	* lib/target-supports.exp (check_effective_target_non_strict_align):
+	Adjust for arm*-*-*.
+	* gcc.target/arm/pr67366.c: New test.
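A minimal sketch of the kind of code such a test exercises (illustrative only; this is not claimed to be the actual contents of gcc.target/arm/pr67366.c):

typedef unsigned int u32;

/* On targets that allow unaligned access this should now compile to a
   single unaligned ldr, with no stack traffic.  */
u32 read32(const void* ptr)
{
  u32 v;
  __builtin_memcpy(&v, ptr, sizeof(v));
  return v;
}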
Could this fix also possibly improve: Bug 67507 - "Code size increase with -Os from GCC 4.8.x to GCC 4.9.x for ARM thumb1", which also seems related to alignment on ARM targets? Thanks, Fredrik
(In reply to Fredrik Hederstierna from comment #16)
> Could this fix also possibly improve:
> Bug 67507 - "Code size increase with -Os from GCC 4.8.x to GCC 4.9.x for ARM
> thumb1", which also seems related to alignment on ARM targets?
> Thanks, Fredrik

No, I don't think so ...