This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: New regression on ARM Linux
- From: Richard Earnshaw <Richard dot Earnshaw at foss dot arm dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, Alan Lawrence <Alan dot Lawrence at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>
- Date: Tue, 31 Mar 2015 11:53:10 +0100
- Subject: Re: New regression on ARM Linux
- Authentication-results: sourceware.org; auth=none
- References: <alpine dot LSU dot 2 dot 11 dot 1503111607530 dot 10796 at zhemvz dot fhfr dot qr> <55193E77 dot 3040401 at arm dot com> <55193EFB dot 6030009 at arm dot com> <alpine dot LSU dot 2 dot 11 dot 1503301501070 dot 31545 at zhemvz dot fhfr dot qr> <alpine dot LSU dot 2 dot 11 dot 1503301509150 dot 31545 at zhemvz dot fhfr dot qr> <55197DAE dot 1050004 at arm dot com> <16A8C5AA-994C-4FA3-BC25-8547166DC13C at suse dot de> <CAFiYyc1MOL8AAwtVhvYDYmSSo+MdGi2FFi-6ELfRxdsvDxtSew at mail dot gmail dot com> <551A6C46 dot 7010305 at foss dot arm dot com> <alpine dot LSU dot 2 dot 11 dot 1503311200090 dot 31545 at zhemvz dot fhfr dot qr> <alpine dot LSU dot 2 dot 11 dot 1503311207080 dot 31545 at zhemvz dot fhfr dot qr> <551A776B dot 3020103 at foss dot arm dot com> <alpine dot LSU dot 2 dot 11 dot 1503311239590 dot 31545 at zhemvz dot fhfr dot qr>
On 31/03/15 11:44, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>
>> On 31/03/15 11:20, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Biener wrote:
>>>
>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>
>>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>>> it.
>>>>>>>>
>>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>>> varargs. From
>>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>>
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S4 A32])
>>>>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>> (reg:SI 118)) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>> (set (reg:SI 0 r0)
>>>>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> The struct members are
>>>>>>>> reg:SI 113 => int a;
>>>>>>>> reg:DF 112 => double b;
>>>>>>>> reg:SI 111 => int c;
>>>>>>>>
>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>>> pushed
>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>>> build_ref_of_offset, we get:
>>>>>>>>
>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750
>>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>>> virtual-outgoing-args)
>>>>>>>> (const_int 8 [0x8])) [0 S4 A64])
>>>>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S8 A64])
>>>>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>> (reg:SI 118)) reduced.c:14 -1
>>>>>>>> (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>> (set (reg:SI 0 r0)
>>>>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>>
>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>>> because
>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>>> argument (the
>>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>>> (aapcs_layout_arg,
>>>>>>>> arm.c line ~~5914)
>>>>>>>>
>>>>>>>> /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>> next even number. */
>>>>>>>> ncrn = pcum->aapcs_ncrn;
>>>>>>>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>> ncrn++;
>>>>>>>>
>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>>> testcase
>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>>
>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>>> I think
>>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>>> provided
>>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>>> build_ref_for_offset.
>>>>>>>
>>>>>>> It doesn't make sense to use the alignment of passed values. That looks like bs.
>>>>>>>
>>>>>>> This means that
>>>>>>>
>>>>>>> Int I __aligned__(8);
>>>>>>>
>>>>>>> Is passed differently than int.
>>>>>>>
>>>>>>> Arm_function_arg needs to be fixed.
>>>>>>
>>>>>> That is,
>>>>>>
>>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>> myint i = 1;
>>>>>> int j = 2;
>>>>>> __builtin_printf("%d %d\n", i, j);
>>>>>> }
>>>>>>
>>>>>> or
>>>>>>
>>>>>> myint i;
>>>>>> int j;
>>>>>> myint *p = &i;
>>>>>> int *q = &j;
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>> __builtin_printf("%d %d", *p, *q);
>>>>>> }
>>>>>>
>>>>>> should behave the same. There isn't a printf modifier for an "aligned int"
>>>>>> because that sort of thing doesn't make sense. Special-casing aligned vs.
>>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>>> And then obviously only dependent on the functuion type signature, not
>>>>>> on the type of the passed value.
>>>>>>
>>>>>
>>>>> I think the testcase is ill-formed. Just because printf doesn't have
>>>>> such a modifier, doesn't mean that another variadic function might not
>>>>> have a means to detect when an object in the variadic list needs to be
>>>>> over-aligned. As such, the test should really be written as:
>>>>
>>>> A value doesn't have "alignment". A function may have alignment
>>>> requirements on its arguments, clearly printf doesn't.
>>>
>>> Btw, it looks like function_arg is supposed to pass whether the argument
>>> is to the variadic part of the function or not, but it gets passed
>>> false as in
>>>
>>> args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
>>> argpos < n_named_args);
>>>
>>> n_named_args is 4. That is because ARM asks to do this via
>>>
>>> if (type_arg_types != 0
>>> && targetm.calls.strict_argument_naming (args_so_far))
>>> ;
>>> else if (type_arg_types != 0
>>> && ! targetm.calls.pretend_outgoing_varargs_named
>>> (args_so_far))
>>> /* Don't include the last named arg. */
>>> --n_named_args;
>>> else
>>> /* Treat all args as named. */
>>> n_named_args = num_actuals;
>>>
>>> thus you lose that info (you could have that readily available).
>>>
>>
>> From the calling side of a function it shouldn't matter. They only
>> thing the caller has to know is that the function is variadic (and
>> therefore that the base-standard rules from the AAPCS apply -- no use of
>> FP registers for parameters). The behaviour after that is *as if* all
>> the arguments were pushed onto the stack in the correct order and
>> finally the lowest 4 words are popped off the stack again into r0-r3.
>> Hence any alignment that would apply to arguments on the stack has to be
>> reflected in their allocation into r0-r3 (since the push/pop of those
>> registers never happens).
>
> But how do you compute the alignment of, say a myint '1'? Of course
> __attribute__((aligned())) is a C extension thus AAPCS likely doesn't
> consider it.
>
Front-end problem :-)
> But yes, as given even on x86_64 we might spill a 8-byte aligned
> int register to a 8-byte aligned stack slot so the proposed patch
> makes sense in that regard. I wonder how many other passes can
> get confused by this (PRE, I suppose, inserting an explicitely
> aligned load as well as all passes after the vectorizer which
> also creates loads/store with explicit (usually lower) alignment).
>
> Richard.
>
>> R.
>>
>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>
>>>>> int main()
>>>>> {
>>>>> myint i = 1;
>>>>> int j = 2;
>>>>> __builtin_printf("%d %d\n", (int) i, j);
>>>>> }
>>>>>
>>>>> Variadic functions take the types of their arguments from the types of
>>>>> the actuals passed. The compiler should either be applying appropriate
>>>>> promotion rules to make the types conformant by the language
>>>>> specification or respecting the types exactly. However, that should be
>>>>> done in the mid-end not the back-end. If incorrect alignment
>>>>> information is passed to the back-end it can't help but make the wrong
>>>>> choice. Examining the mode here wouldn't help either. Consider:
>>>>>
>>>>> int foo (int a, int b, int c, int d, int e, int f
>>>>> __attribute__((aligned(8))));
>>>>>
>>>>> On ARM that should pass f in a 64-bit aligned location (since the
>>>>> parameter will be on the stack).
>>>>>
>>>>> R.
>>>>>
>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I also tried:
>>>>>>>>
>>>>>>>> __attribute__((__aligned__((16)))) int x;
>>>>>>>> int main (void)
>>>>>>>> {
>>>>>>>> __builtin_printf("%d\n", x);
>>>>>>>> }
>>>>>>>>
>>>>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>>>>> alignment 32
>>>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>>>>> has
>>>>>>>> alignment 128.
>>>>>>>>
>>>>>>>> --Alan
>>>>>>>>
>>>>>>>> Richard Biener wrote:
>>>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>>>>
>>>>>>>>>>> ...actually attach the testcase...
>>>>>>>>>> What compile options?
>>>>>>>>>
>>>>>>>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>>>>> I can't see anything not guaranteeing that:
>>>>>>>>>
>>>>>>>>> .section .rodata
>>>>>>>>> .align 3
>>>>>>>>> .LANCHOR0 = . + 0
>>>>>>>>> .LC1:
>>>>>>>>> .ascii "%d %g %d\012\000"
>>>>>>>>> .space 6
>>>>>>>>> .LC0:
>>>>>>>>> .word 7
>>>>>>>>> .space 4
>>>>>>>>> .word 0
>>>>>>>>> .word 1075838976
>>>>>>>>> .word 9
>>>>>>>>> .space 4
>>>>>>>>>
>>>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>>>>> copy? That is, the patch tells the backend that the loads and
>>>>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>>>>> to 8 bytes.
>>>>>>>>>
>>>>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>>>>> some endian issue exists? The code looks quite ugly though ;)
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>> Alan Lawrence wrote:
>>>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>>>>> testsuite on ARM
>>>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>>>>> following this
>>>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>>>>> (including
>>>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>>>>> the
>>>>>>>>>>>> expected
>>>>>>>>>>>> 7 8 9
>>>>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>>>>> 0 8 0
>>>>>>>>>>>>
>>>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>>>>> a
>>>>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>>>>> sizes-gimplified type_0
>>>>>>>>>>>> BLK
>>>>>>>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>>>> align 64 symtab 0 alias set 1 canonical type
>>>>>>>> 0x2b9b8d428d20
>>>>>>>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>>>> SI file reduced.c line 12 col 7
>>>>>>>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>>>> align 32 offset_align 64
>>>>>>>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>>>>> context
>>>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>>>>> D.6070>
>>>>>>>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>>>>> <type_decl
>>>>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>>>>
>>>>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>>>>> does not
>>>>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>>>>
>>>>>>>>>>>> Still investigating...
>>>>>>>>>>>>
>>>>>>>>>>>> --Alan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Richard Biener wrote:
>>>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de>
>>>>>>>>>>>>>
>>>>>>>>>>>>> PR tree-optimization/65310
>>>>>>>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>>>> alignment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>>>>
>>>>>>>> ===================================================================
>>>>>>>>>>>>> --- gcc/tree-sra.c (revision 221324)
>>>>>>>>>>>>> +++ gcc/tree-sra.c (working copy)
>>>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>>>> misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>>>> if (misalign != 0)
>>>>>>>>>>>>> align = (misalign & -misalign);
>>>>>>>>>>>>> - if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>>>>> + if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>>>> exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>>>>> off);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>