This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)


On 8/15/19 2:54 PM, Richard Biener wrote:
> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
> 
>>>>>
>>>>> Hmm.  So your patch overrides user-alignment here.  Woudln't it
>>>>> be better to do that more conciously by
>>>>>
>>>>>   if (! DECL_USER_ALIGN (decl)
>>>>>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>>>>           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>>>>>
>>
>> ? I don't know why that would be better?
>> If the value is underaligned no matter why, pretend it was declared as
>> naturally aligned if that causes wrong code otherwise.
>> That was the idea here.
> 
> It would be better because then we ignore it and use what we'd use
> by default rather than inventing sth new.  And your patch suggests
> it might be needed to up align even w/o DECL_USER_ALIGN.
> 

Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
But it inherits the alignment from the destination variable, apparently.

did you mean
if (! DECL_USER_ALIGN (decl)
    && align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
    && ...
?

I can give it a try.

>>>>> IMHO whatever code later fails to properly use unaligned loads
>>>>> should be fixed instead rather than ignoring user requested alignment.
>>>>>
>>>>> Can you quote a short testcase that explains what exactly goes wrong?
>>>>> The struct-layout ones are awkward to look at...
>>>>>
>>>>
>>>> Sure,
>>>>
>>>> $ cat test.c
>>>> _Complex float __attribute__((aligned(1))) cf;
>>>>
>>>> void foo (void)
>>>> {
>>>>   cf = 1.0i;
>>>> }
>>>>
>>>> $ arm-linux-gnueabihf-gcc -S test.c 
>>>> during RTL pass: expand
>>>> test.c: In function 'foo':
>>>> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>>>>     5 |   cf = 1.0i;
>>>>       |   ~~~^~~~~~
>>>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/config/arm/arm.md:7003
>>>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>> 	../../gcc-trunk/gcc/recog.h:318
>>>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3695
>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3490
>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>> 	../../gcc-trunk/gcc/expr.c:5855
>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>> 	../../gcc-trunk/gcc/expr.c:5441
>>>
>>> Huh, so why didn't it trigger
>>>
>>>   /* Handle misaligned stores.  */
>>>   mode = TYPE_MODE (TREE_TYPE (to));
>>>   if ((TREE_CODE (to) == MEM_REF
>>>        || TREE_CODE (to) == TARGET_MEM_REF)
>>>       && mode != BLKmode
>>>       && !mem_ref_refers_to_non_mem_p (to)
>>>       && ((align = get_object_alignment (to))
>>>           < GET_MODE_ALIGNMENT (mode))
>>>       && (((icode = optab_handler (movmisalign_optab, mode))
>>>            != CODE_FOR_nothing)
>>>           || targetm.slow_unaligned_access (mode, align)))
>>>     {
>>>
>>> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
>>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 
>>> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
>>>
>>> Ah, 'to' is a plain DECL here so the above handling is incomplete.
>>> IIRC component refs like __real cf = 0.f should be handled fine
>>> again(?).  So, does adding || DECL_P (to) fix the case as well?
>>>
>>
>> So I tried this instead of the varasm.c change:
>>
>> Index: expr.c
>> ===================================================================
>> --- expr.c	(revision 274487)
>> +++ expr.c	(working copy)
>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
>>    /* Handle misaligned stores.  */
>>    mode = TYPE_MODE (TREE_TYPE (to));
>>    if ((TREE_CODE (to) == MEM_REF
>> -       || TREE_CODE (to) == TARGET_MEM_REF)
>> +       || TREE_CODE (to) == TARGET_MEM_REF
>> +       || DECL_P (to))
>>        && mode != BLKmode
>> -      && !mem_ref_refers_to_non_mem_p (to)
>> +      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>        && ((align = get_object_alignment (to))
>>  	  < GET_MODE_ALIGNMENT (mode))
>>        && (((icode = optab_handler (movmisalign_optab, mode))
>>
>> Result, yes, it fixes this test case
>> but then I run all struct-layout-1.exp there are sill cases. where we have problems:
>>
>> In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:^M
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':^M
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107^M
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'^M
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'^M
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'^M
>> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)^M
>>         ../../gcc-trunk/gcc/config/arm/arm.md:7107^M
>> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const^M
>>         ../../gcc-trunk/gcc/recog.h:318^M
>> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)^M
>>         ../../gcc-trunk/gcc/expr.c:3695^M
>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M
>>         ../../gcc-trunk/gcc/expr.c:3791^M
>> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)^M
>>         ../../gcc-trunk/gcc/expr.c:3490^M
>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)^M
>>         ../../gcc-trunk/gcc/expr.c:3791^M
>> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)^M
>>         ../../gcc-trunk/gcc/expr.c:5856^M
>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M
>>         ../../gcc-trunk/gcc/expr.c:5302^M
>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)^M
>>         ../../gcc-trunk/gcc/expr.c:4983^M
>> 0x9338af expand_gimple_stmt_1^M
>>         ../../gcc-trunk/gcc/cfgexpand.c:3777^M
>> 0x9338af expand_gimple_stmt^M
>>         ../../gcc-trunk/gcc/cfgexpand.c:3875^M
>> 0x939221 expand_gimple_basic_block^M
>>         ../../gcc-trunk/gcc/cfgexpand.c:5915^M
>> 0x93af86 execute^M
>>         ../../gcc-trunk/gcc/cfgexpand.c:6538^M
>> Please submit a full bug report,^M
>>
>> My personal gut feeling this will be more fragile than over-aligning the
>> constants.
> 
> As said the constant shouldn't end up under-aligned, the user cannot
> specify alignment of literal constants.  Not sure what you mean
> with "over"-aligning.
> 


Hmm wait a moment, I actually wanted _only_ to change the DECL_ARTIFICIAL
that is built by build_constant_desc.  It uses align_variable of course,
but I totally missed that this also controls the alignment of normal
variables, sorry about the confusion here.

I mean we should align the constant for the unaligned complex with
the natural alignment of the type-mode.  That wrong fix made
the variables ignore the alignment, which was of course not intended,
and instead I would need:

Index: expr.c
===================================================================
--- expr.c	(revision 274531)
+++ expr.c	(working copy)
@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
-       || TREE_CODE (to) == TARGET_MEM_REF)
+       || TREE_CODE (to) == TARGET_MEM_REF
+       || DECL_P (to))
       && mode != BLKmode
-      && !mem_ref_refers_to_non_mem_p (to)
+      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
       && ((align = get_object_alignment (to))
 	  < GET_MODE_ALIGNMENT (mode))
       && (((icode = optab_handler (movmisalign_optab, mode))

Index: varasm.c
===================================================================
--- varasm.c	(revision 274531)
+++ varasm.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stmt.h"
 #include "expr.h"
 #include "expmed.h"
+#include "optabs.h"
 #include "output.h"
 #include "langhooks.h"
 #include "debug.h"
@@ -3386,7 +3387,15 @@ build_constant_desc (tree exp)
   if (TREE_CODE (exp) == STRING_CST)
     SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN (decl)));
   else
-    align_variable (decl, 0);
+    {
+      align_variable (decl, 0);
+      if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl))
+	  && ((optab_handler (movmisalign_optab, DECL_MODE (decl))
+		!= CODE_FOR_nothing)
+	      || targetm.slow_unaligned_access (DECL_MODE (decl),
+						DECL_ALIGN (decl))))
+	SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl)));
+    }
 
   /* Now construct the SYMBOL_REF and the MEM.  */
   if (use_object_blocks_p ())

>>
>>
>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>> 	../../gcc-trunk/gcc/expr.c:4983
>>>> 0x93396f expand_gimple_stmt_1
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3777
>>>> 0x93396f expand_gimple_stmt
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3875
>>>> 0x9392e1 expand_gimple_basic_block
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:5915
>>>> 0x93b046 execute
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:6538
>>>> Please submit a full bug report,
>>>> with preprocessed source if appropriate.
>>>> Please include the complete backtrace with any bug report.
>>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>>
>>>> Without the hunk in varasm.c of course.
>>>>
>>>> What happens is that expand_expr_real_2 returns a unaligned mem_ref here:
>>>>
>>>>     case COMPLEX_CST:
>>>>       /* Handle evaluating a complex constant in a CONCAT target.  */
>>>>       if (original_target && GET_CODE (original_target) == CONCAT)
>>>>         {
>>>>           [... this path not taken ...]
>>
>> BTW: this code block executes when the other ICE happens.
>>  
>>>>         }
>>>>
>>>>       /* fall through */
>>>>
>>>>     case STRING_CST:
>>>>       temp = expand_expr_constant (exp, 1, modifier);
>>>>
>>>>       /* temp contains a constant address.
>>>>          On RISC machines where a constant address isn't valid,
>>>>          make some insns to get that address into a register.  */
>>>>       if (modifier != EXPAND_CONST_ADDRESS
>>>>           && modifier != EXPAND_INITIALIZER
>>>>           && modifier != EXPAND_SUM
>>>>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
>>>>                                             MEM_ADDR_SPACE (temp)))
>>>>         return replace_equiv_address (temp,
>>>>                                       copy_rtx (XEXP (temp, 0)));
>>>>       return temp;
>>>>
>>>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable
>>>> by emit_move_insn, that is expected just *everywhere* and can't be changed.
>>>>
>>>> This could probably be fixed in an ugly way in the COMPLEX_CST, handler
>>>> but OTOH, I don't see any reason why this constant has to be misaligned
>>>> when it can be easily aligned, which avoids the need for a misaligned access.
>>>
>>> If the COMPLEX_CST happends to end up in unaligned memory then that's
>>> of course a bug (unless the target requests that for all COMPLEX_CSTs).
>>> That is, if the unalignment is triggered because the store is to an
>>> unaligned decl.
>>>
>>> But I think the issue is the above one?
>>>
>>
>> yes initially the constant seems to be unaligned. then it is expanded,
>> and there is no special handling for unaligned constants in expand_expr_real,
>> and then probably expand_assignment or store_expr seem not fully prepared for
>> this either.
> 
> With a cross I see the constant has regular aligned _Complex type
> so not sure how it can end up unaligned.
> 

Maybe a target configuration issue.
Not sure, I have configured mine this way:

../gcc-trunk/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 --target=arm-linux-gnueabihf --enable-languages=all --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

However it appears now there are two different errors, one is in expand_assignment
which you found (I start to wonder if I should add you to the authors section
of this patch), and a different one, which I have not yet simplified,
but you can easily try that for yourself:

make check-gcc-c RUNTESTFLAGS="struct-layout-1.exp=*"

it is okay when the test fails to execute but there should no internal compiler errors.


>>>>
>>>> The problem is that the code that handles this misaligned access
>>>> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore
>>>> MEM_ALIGN == 32, and therefore the code that handles the unaligned
>>>> access is not taken.  BUT before the mem_rtx is returned it is
>>>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion,
>>>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be
>>>> usable with emit_move_insn.
>>>
>>> yes, as said the _access_ determines the address should be aligned
>>> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according
>>> to the access type/mode.  But we can't trust DECL_ALIGN of
>>> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn
>>> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just
>>> we may not compute nonzero bits for the actual address because
>>> of function pointer mangling)
>>> (for accessing function code I'd say this would be premature
>>> optimization, but ...)
>>>
>>
>> Not a very nice solution, but it is not worth to spend much effort
>> in optimizing undefined behavior, I just want to avoid the ICE
>> at this time and would not trust the DECL_ALIGN either.
> 
> So I meant
> 
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      (revision 274534)
> +++ gcc/builtins.c      (working copy)
> @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign
>  
>    /* Extract alignment information from the innermost object and
>       possibly adjust bitpos and offset.  */
> -  if (TREE_CODE (exp) == FUNCTION_DECL)
> +  if (TREE_CODE (exp) == FUNCTION_DECL
> +      && addr_p)
>      {
>        /* Function addresses can encode extra information besides their
>          alignment.  However, if TARGET_PTRMEMFUNC_VBIT_LOCATION
> 
> so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we
> can trust it).
> 
>>>
>>> Still I think you can't simply override STACK_SLOT_ALIGNMENT just because
>>> of the mode of an entry param, can you?  If you can assume a bigger
>>> alignment then STACK_SLOT_ALIGNMENT should return it.
>>>
>>
>> I don't see a real problem here.  All target except i386 and gcn (whatever that is)
>> use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) align value
>> to rule the effective STACK_SLOT_ALIGNMENT.  The user could have simply declared
>> the local variable with the alignment that results in better code FWIW.
>>
>> If the stack alignment is too high that is capped in assign_stack_local:
>>
>>   /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT.  */
>>   if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
>>     {
>>       alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
>>       alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
>>     }
>>
>> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should
>> be sufficient for all modes that need movmisalign_optab and friends.
>> If it is not, an ICE would be just fine.
> 
> Hmm.  In some way we could better communicate with the user then
> and do not allow under-aligning automatic vars?  But the you
> still have packed structs with BLKmode where the actual field
> accesses will carry SImode even when not aligned(?)
> 

Yes, that works also when unaligned.

> 
> Please split it into the parts for the PR and parts making the
> asserts not trigger.
> 

Yes, will do.

> The PR is already fixed, right?  The assign_parm_find_stack_rtl hunk
> is merely an optimization?
> 

Hmmmm...  You are right, I should have added that to the commit message...

Of course the test cases try to verify the optimization.


Thanks
Bernd.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]