This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: New regression on ARM Linux


On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>it.
>>
>>The problem appears to be in laying out arguments, specifically
>>varargs. From
>>the "good" -fdump-rtl-expand:
>>
>>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>S4 A32])
>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>      (nil))
>>(insn 19 18 20 2 (set (reg:DF 2 r2)
>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>      (nil))
>>(insn 20 19 21 2 (set (reg:SI 1 r1)
>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>      (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>>         (reg:SI 118)) reduced.c:14 -1
>>      (nil))
>>(call_insn 22 21 23 2 (parallel [
>>             (set (reg:SI 0 r0)
>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>A32])
>>
>>The struct members are
>>reg:SI 113 => int a;
>>reg:DF 112 => double b;
>>reg:SI 111 => int c;
>>
>>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>pushed
>>into virtual-outgoing-args. In contrast, post-change to
>>build_ref_of_offset, we get:
>>
>>(insn 17 16 18 2 (set (reg:SI 118)
>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>*.LC1>)) reduced.c:14 -1
>>      (nil))
>>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>virtual-outgoing-args)
>>                 (const_int 8 [0x8])) [0  S4 A64])
>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>      (nil))
>>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>S8 A64])
>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>      (nil))
>>(insn 20 19 21 2 (set (reg:SI 2 r2)
>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>      (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>>         (reg:SI 118)) reduced.c:14 -1
>>      (nil))
>>(call_insn 22 21 23 2 (parallel [
>>             (set (reg:SI 0 r0)
>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>A32])
>>
>>r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>
>>double+following int are all pushed into virtual-outgoing-args. This is
>>because
>>arm_function_arg is fed a 64-bit-aligned int as type of the second
>>argument (the
>>type constructed by build_ref_for_offset); it then executes
>>(aapcs_layout_arg,
>>arm.c line ~~5914)
>>
>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>      next even number.  */
>>   ncrn = pcum->aapcs_ncrn;
>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>     ncrn++;
>>
>>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>testcase
>>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>32-bit-aligned int instead, which works as previously.
>>
>>Passing the same members of that struct in a non-vargs call, works ok -
>>I think
>>because these use the type of the declared parameters, rather than the
>>provided
>>arguments, and the former do not have the increased alignment from
>>build_ref_for_offset.
>
> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>
> This means that
>
> Int I __aligned__(8);
>
> Is passed differently than int.
>
> Arm_function_arg needs to be fixed.

That is,

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", i, j);
}

or

myint i;
int j;
myint *p = &i;
int *q = &j;

int main()
{
  __builtin_printf("%d %d", *p, *q);
}

should behave the same.  There isn't a printf modifier for an "aligned int"
because that sort of thing doesn't make sense.  Special-casing aligned vs.
non-aligned values only makes sense for things passed by value on the stack.
And then obviously only dependent on the functuion type signature, not
on the type of the passed value.

Richard.

> Richard.
>
>>
>>FWIW, I also tried:
>>
>>__attribute__((__aligned__((16)))) int x;
>>int main (void)
>>{
>>   __builtin_printf("%d\n", x);
>>}
>>
>>but in that case, the arm_function_arg is still fed a type with
>>alignment 32
>>(bits), i.e. distinct from the type of the field 'x' in memory, which
>>has
>>alignment 128.
>>
>>--Alan
>>
>>Richard Biener wrote:
>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>
>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>
>>>>> ...actually attach the testcase...
>>>> What compile options?
>>>
>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>> I can't see anything not guaranteeing that:
>>>
>>>         .section        .rodata
>>>         .align  3
>>> .LANCHOR0 = . + 0
>>> .LC1:
>>>         .ascii  "%d %g %d\012\000"
>>>         .space  6
>>> .LC0:
>>>         .word   7
>>>         .space  4
>>>         .word   0
>>>         .word   1075838976
>>>         .word   9
>>>         .space  4
>>>
>>> maybe there is some more generic code-gen bug for aligned aggregate
>>> copy?  That is, the patch tells the backend that the loads and
>>> stores to the 'int' vars (which have padding followed) is aligned
>>> to 8 bytes.
>>>
>>> I don't see what is wrong in the final assembler, but maybe
>>> some endian issue exists?  The code looks quite ugly though ;)
>>>
>>> Richard.
>>>
>>>>> Alan Lawrence wrote:
>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>testsuite on ARM
>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>following this
>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>(including
>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>the
>>>>>> expected
>>>>>> 7 8 9
>>>>>> whereas with gcc r221348, instead it prints
>>>>>> 0 8 0
>>>>>>
>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>a
>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>sizes-gimplified type_0
>>>>>> BLK
>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>0x2b9b8d428d20
>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>> 0x2b9b8d092690 int>
>>>>>>              SI file reduced.c line 12 col 7
>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>              align 32 offset_align 64
>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>context
>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>D.6070>
>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>><type_decl
>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>
>>>>>> The tree-optimized output is the same with both compilers (as this
>>does not
>>>>>> mention alignment); the expand output differs.
>>>>>>
>>>>>> Still investigating...
>>>>>>
>>>>>> --Alan
>>>>>>
>>>>>>
>>>>>> Richard Biener wrote:
>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>> drops alignment info unnecessarily.
>>>>>>>
>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>
>>>>>>>  PR tree-optimization/65310
>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>  alignment.
>>>>>>>
>>>>>>> Index: gcc/tree-sra.c
>>>>>>>
>>===================================================================
>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>    if (misalign != 0)
>>>>>>>      align = (misalign & -misalign);
>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>off);
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]