This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)


On August 15, 2019 4:52:24 PM GMT+02:00, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>On 8/15/19 2:54 PM, Richard Biener wrote:
>> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
>> 
>>>>>>
>>>>>> Hmm.  So your patch overrides user-alignment here.  Woudln't it
>>>>>> be better to do that more conciously by
>>>>>>
>>>>>>   if (! DECL_USER_ALIGN (decl)
>>>>>>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>>>>>           && targetm.slow_unaligned_access (DECL_MODE (decl),
>align)))
>>>>>>
>>>
>>> ? I don't know why that would be better?
>>> If the value is underaligned no matter why, pretend it was declared
>as
>>> naturally aligned if that causes wrong code otherwise.
>>> That was the idea here.
>> 
>> It would be better because then we ignore it and use what we'd use
>> by default rather than inventing sth new.  And your patch suggests
>> it might be needed to up align even w/o DECL_USER_ALIGN.
>> 
>
>Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
>But it inherits the alignment from the destination variable,
>apparently. 

Yes. I think it shouldn't inherit the alignment unless we are assembling a static initializer. 

>
>did you mean
>if (! DECL_USER_ALIGN (decl)
>    && align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>    && ...
>?
>
>I can give it a try.

No, I meant || thus ignore DECL_USER_ALIGN if it is sth we have to satisfy with unaligned loads. 
>
>>>>>> IMHO whatever code later fails to properly use unaligned loads
>>>>>> should be fixed instead rather than ignoring user requested
>alignment.
>>>>>>
>>>>>> Can you quote a short testcase that explains what exactly goes
>wrong?
>>>>>> The struct-layout ones are awkward to look at...
>>>>>>
>>>>>
>>>>> Sure,
>>>>>
>>>>> $ cat test.c
>>>>> _Complex float __attribute__((aligned(1))) cf;
>>>>>
>>>>> void foo (void)
>>>>> {
>>>>>   cf = 1.0i;
>>>>> }
>>>>>
>>>>> $ arm-linux-gnueabihf-gcc -S test.c 
>>>>> during RTL pass: expand
>>>>> test.c: In function 'foo':
>>>>> test.c:5:6: internal compiler error: in gen_movsf, at
>config/arm/arm.md:7003
>>>>>     5 |   cf = 1.0i;
>>>>>       |   ~~~^~~~~~
>>>>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/config/arm/arm.md:7003
>>>>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>>> 	../../gcc-trunk/gcc/recog.h:318
>>>>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3695
>>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3490
>>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>>> 	../../gcc-trunk/gcc/expr.c:5855
>>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>>> 	../../gcc-trunk/gcc/expr.c:5441
>>>>
>>>> Huh, so why didn't it trigger
>>>>
>>>>   /* Handle misaligned stores.  */
>>>>   mode = TYPE_MODE (TREE_TYPE (to));
>>>>   if ((TREE_CODE (to) == MEM_REF
>>>>        || TREE_CODE (to) == TARGET_MEM_REF)
>>>>       && mode != BLKmode
>>>>       && !mem_ref_refers_to_non_mem_p (to)
>>>>       && ((align = get_object_alignment (to))
>>>>           < GET_MODE_ALIGNMENT (mode))
>>>>       && (((icode = optab_handler (movmisalign_optab, mode))
>>>>            != CODE_FOR_nothing)
>>>>           || targetm.slow_unaligned_access (mode, align)))
>>>>     {
>>>>
>>>> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
>>>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 
>>>> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
>>>>
>>>> Ah, 'to' is a plain DECL here so the above handling is incomplete.
>>>> IIRC component refs like __real cf = 0.f should be handled fine
>>>> again(?).  So, does adding || DECL_P (to) fix the case as well?
>>>>
>>>
>>> So I tried this instead of the varasm.c change:
>>>
>>> Index: expr.c
>>> ===================================================================
>>> --- expr.c	(revision 274487)
>>> +++ expr.c	(working copy)
>>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool
>nontem
>>>    /* Handle misaligned stores.  */
>>>    mode = TYPE_MODE (TREE_TYPE (to));
>>>    if ((TREE_CODE (to) == MEM_REF
>>> -       || TREE_CODE (to) == TARGET_MEM_REF)
>>> +       || TREE_CODE (to) == TARGET_MEM_REF
>>> +       || DECL_P (to))
>>>        && mode != BLKmode
>>> -      && !mem_ref_refers_to_non_mem_p (to)
>>> +      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>>        && ((align = get_object_alignment (to))
>>>  	  < GET_MODE_ALIGNMENT (mode))
>>>        && (((icode = optab_handler (movmisalign_optab, mode))
>>>
>>> Result, yes, it fixes this test case
>>> but then I run all struct-layout-1.exp there are sill cases. where
>we have problems:
>>>
>>> In file included from
>/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M
>>>
>/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:
>In function 'test2112':^M
>>>
>/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10:
>internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M
>>>
>/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3:
>note: in definition of macro 'TX'^M
>>>
>/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1:
>note: in expansion of macro 'TCI'^M
>>>
>/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294:
>note: in expansion of macro 'F'^M
>>> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)^M
>>>         ../../gcc-trunk/gcc/config/arm/arm.md:7107^M
>>> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const^M
>>>         ../../gcc-trunk/gcc/recog.h:318^M
>>> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)^M
>>>         ../../gcc-trunk/gcc/expr.c:3695^M
>>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M
>>>         ../../gcc-trunk/gcc/expr.c:3791^M
>>> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)^M
>>>         ../../gcc-trunk/gcc/expr.c:3490^M
>>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M
>>>         ../../gcc-trunk/gcc/expr.c:3791^M
>>> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)^M
>>>         ../../gcc-trunk/gcc/expr.c:5856^M
>>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M
>>>         ../../gcc-trunk/gcc/expr.c:5302^M
>>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M
>>>         ../../gcc-trunk/gcc/expr.c:4983^M
>>> 0x9338af expand_gimple_stmt_1^M
>>>         ../../gcc-trunk/gcc/cfgexpand.c:3777^M
>>> 0x9338af expand_gimple_stmt^M
>>>         ../../gcc-trunk/gcc/cfgexpand.c:3875^M
>>> 0x939221 expand_gimple_basic_block^M
>>>         ../../gcc-trunk/gcc/cfgexpand.c:5915^M
>>> 0x93af86 execute^M
>>>         ../../gcc-trunk/gcc/cfgexpand.c:6538^M
>>> Please submit a full bug report,^M
>>>
>>> My personal gut feeling this will be more fragile than over-aligning
>the
>>> constants.
>> 
>> As said the constant shouldn't end up under-aligned, the user cannot
>> specify alignment of literal constants.  Not sure what you mean
>> with "over"-aligning.
>> 
>
>
>Hmm wait a moment, I actually wanted _only_ to change the
>DECL_ARTIFICIAL
>that is built by build_constant_desc.  It uses align_variable of
>course,
>but I totally missed that this also controls the alignment of normal
>variables, sorry about the confusion here.
>
>I mean we should align the constant for the unaligned complex with
>the natural alignment of the type-mode. 

Agreed. 

 That wrong fix made
>the variables ignore the alignment, which was of course not intended,
>and instead I would need:
>
>Index: expr.c
>===================================================================
>--- expr.c	(revision 274531)
>+++ expr.c	(working copy)
>@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool
>nontem
>   /* Handle misaligned stores.  */
>   mode = TYPE_MODE (TREE_TYPE (to));
>   if ((TREE_CODE (to) == MEM_REF
>-       || TREE_CODE (to) == TARGET_MEM_REF)
>+       || TREE_CODE (to) == TARGET_MEM_REF
>+       || DECL_P (to))
>       && mode != BLKmode
>-      && !mem_ref_refers_to_non_mem_p (to)
>+      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>       && ((align = get_object_alignment (to))
> 	  < GET_MODE_ALIGNMENT (mode))
>       && (((icode = optab_handler (movmisalign_optab, mode))
>
>Index: varasm.c
>===================================================================
>--- varasm.c	(revision 274531)
>+++ varasm.c	(working copy)
>@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "stmt.h"
> #include "expr.h"
> #include "expmed.h"
>+#include "optabs.h"
> #include "output.h"
> #include "langhooks.h"
> #include "debug.h"
>@@ -3386,7 +3387,15 @@ build_constant_desc (tree exp)
>   if (TREE_CODE (exp) == STRING_CST)
>SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN
>(decl)));
>   else
>-    align_variable (decl, 0);
>+    {
>+      align_variable (decl, 0);
>+      if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>+	  && ((optab_handler (movmisalign_optab, DECL_MODE (decl))
>+		!= CODE_FOR_nothing)
>+	      || targetm.slow_unaligned_access (DECL_MODE (decl),
>+						DECL_ALIGN (decl))))
>+	SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl)));
>+    }
> 
>   /* Now construct the SYMBOL_REF and the MEM.  */
>   if (use_object_blocks_p ())
>
>>>
>>>
>>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>>> 	../../gcc-trunk/gcc/expr.c:4983
>>>>> 0x93396f expand_gimple_stmt_1
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3777
>>>>> 0x93396f expand_gimple_stmt
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3875
>>>>> 0x9392e1 expand_gimple_basic_block
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:5915
>>>>> 0x93b046 execute
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:6538
>>>>> Please submit a full bug report,
>>>>> with preprocessed source if appropriate.
>>>>> Please include the complete backtrace with any bug report.
>>>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>>>
>>>>> Without the hunk in varasm.c of course.
>>>>>
>>>>> What happens is that expand_expr_real_2 returns a unaligned
>mem_ref here:
>>>>>
>>>>>     case COMPLEX_CST:
>>>>>       /* Handle evaluating a complex constant in a CONCAT target. 
>*/
>>>>>       if (original_target && GET_CODE (original_target) == CONCAT)
>>>>>         {
>>>>>           [... this path not taken ...]
>>>
>>> BTW: this code block executes when the other ICE happens.
>>>  
>>>>>         }
>>>>>
>>>>>       /* fall through */
>>>>>
>>>>>     case STRING_CST:
>>>>>       temp = expand_expr_constant (exp, 1, modifier);
>>>>>
>>>>>       /* temp contains a constant address.
>>>>>          On RISC machines where a constant address isn't valid,
>>>>>          make some insns to get that address into a register.  */
>>>>>       if (modifier != EXPAND_CONST_ADDRESS
>>>>>           && modifier != EXPAND_INITIALIZER
>>>>>           && modifier != EXPAND_SUM
>>>>>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
>>>>>                                             MEM_ADDR_SPACE
>(temp)))
>>>>>         return replace_equiv_address (temp,
>>>>>                                       copy_rtx (XEXP (temp, 0)));
>>>>>       return temp;
>>>>>
>>>>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be
>usable
>>>>> by emit_move_insn, that is expected just *everywhere* and can't be
>changed.
>>>>>
>>>>> This could probably be fixed in an ugly way in the COMPLEX_CST,
>handler
>>>>> but OTOH, I don't see any reason why this constant has to be
>misaligned
>>>>> when it can be easily aligned, which avoids the need for a
>misaligned access.
>>>>
>>>> If the COMPLEX_CST happends to end up in unaligned memory then
>that's
>>>> of course a bug (unless the target requests that for all
>COMPLEX_CSTs).
>>>> That is, if the unalignment is triggered because the store is to an
>>>> unaligned decl.
>>>>
>>>> But I think the issue is the above one?
>>>>
>>>
>>> yes initially the constant seems to be unaligned. then it is
>expanded,
>>> and there is no special handling for unaligned constants in
>expand_expr_real,
>>> and then probably expand_assignment or store_expr seem not fully
>prepared for
>>> this either.
>> 
>> With a cross I see the constant has regular aligned _Complex type
>> so not sure how it can end up unaligned.
>> 
>
>Maybe a target configuration issue.
>Not sure, I have configured mine this way:
>
>../gcc-trunk/configure
>--prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64
>--target=arm-linux-gnueabihf --enable-languages=all --with-arch=armv7-a
>--with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard
>
>However it appears now there are two different errors, one is in
>expand_assignment
>which you found (I start to wonder if I should add you to the authors
>section
>of this patch), and a different one, which I have not yet simplified,
>but you can easily try that for yourself:
>
>make check-gcc-c RUNTESTFLAGS="struct-layout-1.exp=*"
>
>it is okay when the test fails to execute but there should no internal
>compiler errors.
>
>
>>>>>
>>>>> The problem is that the code that handles this misaligned access
>>>>> is skipped because the mem_rtx has initially no MEM_ATTRS and
>therefore
>>>>> MEM_ALIGN == 32, and therefore the code that handles the unaligned
>>>>> access is not taken.  BUT before the mem_rtx is returned it is
>>>>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an
>assertion,
>>>>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought
>to be
>>>>> usable with emit_move_insn.
>>>>
>>>> yes, as said the _access_ determines the address should be aligned
>>>> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according
>>>> to the access type/mode.  But we can't trust DECL_ALIGN of
>>>> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn
>>>> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just
>>>> we may not compute nonzero bits for the actual address because
>>>> of function pointer mangling)
>>>> (for accessing function code I'd say this would be premature
>>>> optimization, but ...)
>>>>
>>>
>>> Not a very nice solution, but it is not worth to spend much effort
>>> in optimizing undefined behavior, I just want to avoid the ICE
>>> at this time and would not trust the DECL_ALIGN either.
>> 
>> So I meant
>> 
>> Index: gcc/builtins.c
>> ===================================================================
>> --- gcc/builtins.c      (revision 274534)
>> +++ gcc/builtins.c      (working copy)
>> @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign
>>  
>>    /* Extract alignment information from the innermost object and
>>       possibly adjust bitpos and offset.  */
>> -  if (TREE_CODE (exp) == FUNCTION_DECL)
>> +  if (TREE_CODE (exp) == FUNCTION_DECL
>> +      && addr_p)
>>      {
>>        /* Function addresses can encode extra information besides
>their
>>          alignment.  However, if TARGET_PTRMEMFUNC_VBIT_LOCATION
>> 
>> so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we
>> can trust it).
>> 
>>>>
>>>> Still I think you can't simply override STACK_SLOT_ALIGNMENT just
>because
>>>> of the mode of an entry param, can you?  If you can assume a bigger
>>>> alignment then STACK_SLOT_ALIGNMENT should return it.
>>>>
>>>
>>> I don't see a real problem here.  All target except i386 and gcn
>(whatever that is)
>>> use the default for STACK_SLOT_ALIGNMENT which simply allows any
>(large) align value
>>> to rule the effective STACK_SLOT_ALIGNMENT.  The user could have
>simply declared
>>> the local variable with the alignment that results in better code
>FWIW.
>>>
>>> If the stack alignment is too high that is capped in
>assign_stack_local:
>>>
>>>   /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT. 
>*/
>>>   if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
>>>     {
>>>       alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
>>>       alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
>>>     }
>>>
>>> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT
>should
>>> be sufficient for all modes that need movmisalign_optab and friends.
>>> If it is not, an ICE would be just fine.
>> 
>> Hmm.  In some way we could better communicate with the user then
>> and do not allow under-aligning automatic vars?  But the you
>> still have packed structs with BLKmode where the actual field
>> accesses will carry SImode even when not aligned(?)
>> 
>
>Yes, that works also when unaligned.
>
>> 
>> Please split it into the parts for the PR and parts making the
>> asserts not trigger.
>> 
>
>Yes, will do.
>
>> The PR is already fixed, right?  The assign_parm_find_stack_rtl hunk
>> is merely an optimization?
>> 
>
>Hmmmm...  You are right, I should have added that to the commit
>message...
>
>Of course the test cases try to verify the optimization.
>
>
>Thanks
>Bernd.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]