[ping] Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref'

Thomas Schwinge thomas@codesourcery.com
Mon Aug 16 08:08:42 GMT 2021


Hi!

Ping.


On 2021-08-09T16:16:51+0200, I wrote:
> [from internal]
>
>
> Hi!
>
> This concerns a class of ICEs seen as of og10 branch with the
> "openacc: Middle-end worker-partitioning support" and "amdgcn:
> Enable OpenACC worker partitioning for AMD GCN" changes applied:
>
> On 2020-06-06T16:07:36+0100, Kwok Cheung Yeung <kwok_yeung@mentor.com> wrote:
>> On 01/06/2020 8:48 pm, Kwok Cheung Yeung wrote:
>>> On 21/05/2020 10:23 pm, Kwok Cheung Yeung wrote:
>>>> These all have the same failure mode:
>>>>
>>>> during RTL pass: expand
>>>> [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90: In function 'MAIN__._omp_fn.1':
>>>> [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86: internal compiler error: in convert_memory_address_addr_space_1, at explow.c:302
>>>> 0xc29f20 convert_memory_address_addr_space_1(scalar_int_mode, rtx_def*, unsigned char, bool, bool)
>>>>          [...]/gcc/explow.c:302
>>>> 0xc29f57 convert_memory_address_addr_space(scalar_int_mode, rtx_def*, unsigned char)
>>>>          [...]/gcc/explow.c:404
>>>> [...]
>
>>>> This occurs if the -ftree-slp-vectorize flag is specified (default at -O3).
>
>>> The problematic bit of Gimple code is this:
>>>
>>>    .oacc_worker_o.44._120 = gangs_min_472;
>>>    .oacc_worker_o.44._122 = workers_min_473;
>>>    .oacc_worker_o.44._124 = vectors_min_474;
>>>    .oacc_worker_o.44._126 = gangs_max_475;
>>>    .oacc_worker_o.44._128 = workers_max_476;
>>>    .oacc_worker_o.44._130 = vectors_max_477;
>>>    .oacc_worker_o.44._132 = 0;
>>>
>>> With SLP vectorization enabled, it becomes this:
>>>
>>>    _40 = {gangs_min_472, workers_min_473, vectors_min_474, gangs_max_475};
>>>    ...
>>>    MEM <vector(4) int> [(int *)&.oacc_worker_o.44] = _40;
>>>    .oacc_worker_o.44._128 = workers_max_476;
>>>    .oacc_worker_o.44._130 = vectors_max_477;
>>>    .oacc_worker_o.44._132 = 0;
>>>
>>> The optimization is trying to transform 4 separate assignments into a single
>>> memory operation. The trouble is that &o.acc_worker_o is an SImode pointer in
>>> AS4 (LDS), while the memory expression appears to be in the default memory
>>> space. The 'to' expression of the assignment is:
>>>
>>>   <mem_ref 0x7ffff74c61e0
>>>      type <vector_type 0x7ffff7470498
>>>          type <integer_type 0x7ffff73195e8 int public SI
>>>              size <integer_cst 0x7ffff7318bb8 constant 32>
>>>              unit-size <integer_cst 0x7ffff7318bd0 constant 4>
>>>              align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type 0x7ffff73195e8 precision:32 min <integer_cst 0x7ffff7318b70 -2147483648> max <integer_cst 0x7ffff7318b88 2147483647>
>>>              pointer_to_this <pointer_type 0x7ffff73209d8> reference_to_this <reference_type 0x7ffff73d9d20>>
>>>          TI
>>>          size <integer_cst 0x7ffff7318ca8 constant 128>
>>>          unit-size <integer_cst 0x7ffff7318cc0 constant 16>
>>>          align:128 warn_if_not_align:0 symtab:0 alias-set 1 structural-equality nunits:4
>>>          pointer_to_this <pointer_type 0x7ffff7470540>>
>>>
>>>      arg:0 <addr_expr 0x7ffff74cdb80
>>>          type <pointer_type 0x7ffff73209d8 type <integer_type 0x7ffff73195e8 int>
>>>              public unsigned DI
>>>              size <integer_cst 0x7ffff7318978 constant 64>
>>>              unit-size <integer_cst 0x7ffff7318990 constant 8>
>>>              align:64 warn_if_not_align:0 symtab:0 alias-set 2 structural-equality>
>>>          constant
>>>          arg:0 <var_decl 0x7ffff7477f30 .oacc_worker_o.44 type <record_type 0x7ffff73eb888 .oacc_ws_data_s.21 address-space-4>
>>>              addressable used static ignored BLK [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86:0
>>>
>>>              size <integer_cst 0x7ffff746ce70 constant 224>
>>>              unit-size <integer_cst 0x7ffff746ce40 constant 28>
>>>              align:128 warn_if_not_align:0
>>>              (mem/c:BLK (symbol_ref:SI (".oacc_worker_o.44.14") [flags 0x2] <var_decl 0x7ffff7477f30 .oacc_worker_o.44>) [9 .oacc_worker_o.44+0 S28 A128 AS4])>>
>>>      arg:1 <integer_cst 0x7ffff73ff078 type <pointer_type 0x7ffff73209d8> constant 0>>
>>>
>>> In convert_memory_address_addr_space_1:
>>>
>>> #ifndef POINTERS_EXTEND_UNSIGNED
>>>    gcc_assert (GET_MODE (x) == to_mode || GET_MODE (x) == VOIDmode);
>>>    return x;
>>> #else /* defined(POINTERS_EXTEND_UNSIGNED) */
>>>
>>> POINTERS_EXTEND_UNSIGNED is not defined, so it hits the assert. The expected
>>> to_mode is DI_mode, but x is SI_mode, so the assert fires.
>
>> I now have a fix for this.
>>
>>  >    MEM <vector(4) int> [(int *)&.oacc_worker_o.44] = _40;
>>
>> The ICE occurs because the SLP vectorization pass creates the new statement
>> using the type of the expression '&.oacc_worker_o.44', which is a pointer to a
>> component ref in the default address space. The expand pass gets confused
>> because it is handed an SImode pointer (for LDS) when it is expecting a DImode
>> pointer (for flat/global space).
>>
>> The underlying problem is that although .oacc_worker_o is in the correct address
>> space, the component ref .oacc_worker_o is not. I fixed this by propagating the
>> address space of .oacc_worker_o when the component ref is created.
>
>>  static tree
>>  oacc_build_component_ref (tree obj, tree field)
>>  {
>> -  tree ret = build3 (COMPONENT_REF, TREE_TYPE (field), obj, field, NULL);
>> +  tree field_type = TREE_TYPE (field);
>> +  tree obj_type = TREE_TYPE (obj);
>> +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (obj_type)))
>> +    field_type = build_qualified_type
>> +                     (field_type,
>> +                      KEEP_QUAL_ADDR_SPACE (TYPE_QUALS (obj_type)));
>> +
>> +  tree ret = build3 (COMPONENT_REF, field_type, obj, field, NULL);
>>    if (TREE_THIS_VOLATILE (field))
>>      TREE_THIS_VOLATILE (ret) |= 1;
>>    if (TREE_READONLY (field))
>
> This code change has been included in the recent master branch commit
> e2a58ed6dc5293602d0d168475109caa81ad0f0d "openacc: Middle-end
> worker-partitioning support", which thus includes a
> 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref' that is
> slightly different from 'gcc/omp-low.c:omp_build_component_ref'.
>
> I'm confirming that with this reverted, we're seeing ICEs as follows:
>
>     +FAIL: libgomp.oacc-fortran/gemm-2.f90 [...] -foffload=amdgcn-amdhsa  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/gemm-2.f90 [...] -foffload=amdgcn-amdhsa  -O3 -g  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/gemm.f90 [...] -foffload=amdgcn-amdhsa  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/gemm.f90 [...] -foffload=amdgcn-amdhsa  -O3 -g  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/optional-reduction.f90 [...] -foffload=amdgcn-amdhsa  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/optional-reduction.f90 [...] -foffload=amdgcn-amdhsa  -O3 -g  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/private-variables.f90 [...] -foffload=amdgcn-amdhsa  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/private-variables.f90 [...] -foffload=amdgcn-amdhsa  -O3 -g  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/reduction-1.f90 [...] -foffload=amdgcn-amdhsa  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/reduction-1.f90 [...] -foffload=amdgcn-amdhsa  -O3 -g  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/reduction-5.f90 [...] -foffload=amdgcn-amdhsa  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/reduction-5.f90 [...] -foffload=amdgcn-amdhsa  -O3 -g  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/reduction-6.f90 [...] -foffload=amdgcn-amdhsa  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
>
>     +FAIL: libgomp.oacc-fortran/reduction-6.f90 [...] -foffload=amdgcn-amdhsa  -O3 -g  (internal compiler error)
>
> Concerning the current 'gcc/omp-low.c:omp_build_component_ref', for the
> current set of offloading testcases, we never see a
> '!ADDR_SPACE_GENERIC_P' there, so the address space handling doesn't seem
> to be necessary there (but also won't do any harm: no-op).
>
> Would it make sense to "Re-unify 'omp_build_component_ref' and
> 'oacc_build_component_ref'", see attached?
>
>
> Grüße
>  Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Re-unify-omp_build_component_ref-and-oacc_build_comp.patch
Type: text/x-diff
Size: 4596 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210816/c4bd54df/attachment.bin>


More information about the Gcc-patches mailing list