This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)


On 26 January 2016 at 14:20, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi Christophe,
>
> On 20/01/16 21:10, Christophe Lyon wrote:
>>
>> On 19 January 2016 at 15:51, Alan Lawrence <alan.lawrence@foss.arm.com>
>> wrote:
>>>
>>> On 19/01/16 11:15, Christophe Lyon wrote:
>>>
>>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>>>>> it was not desirable to impact neon_vrev32<mode>.
>>>>>
>>>>>
>>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF
>>>>> just
>>>>> as
>>>>> well as vectors of HI, so I think I'd argue that's harmless enough. To
>>>>> gain the
>>>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases,
>>>>> though.
>>>>>
>>>> Since this is more intrusive, I'd rather leave that part for later. OK?
>>>
>>>
>>> Sure.
>>>
>>>>>> +#ifdef __ARM_BIG_ENDIAN
>>>>>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also
>>>>>> the
>>>>>> +     right value for vectors with 8 lanes.  */
>>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>>>>> +#else
>>>>>> +#define __arm_lane(__vec, __idx) __idx
>>>>>> +#endif
>>>>>> +
>>>>>
>>>>>
>>>>> Looks right, but sounds... my concern here is that I'm hoping at some
>>>>> point we
>>>>> will move the *other* vget/set_lane intrinsics to use GCC vector
>>>>> extensions
>>>>> too. At which time (unlike __aarch64_lane which can be used everywhere)
>>>>> this
>>>>> will be the wrong formula. Can we name (and/or comment) it to avoid
>>>>> misleading
>>>>> anyone? The key characteristic seems to be that it is for vectors of
>>>>> 16-bit
>>>>> elements only.
>>>>>
>>>> I'm not to follow, here. Looking at the patterns for
>>>> neon_vget_lane<mode>_*internal in neon.md,
>>>> I can see 2 flavours: one for VD, one for VQ2. The latter uses
>>>> "halfelts".
>>>>
>>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
>>>> that would be similar to the aarch64 ones (by computing the number of
>>>> lanes of the input vector), but the "q" one would use half the total
>>>> number of lanes instead?
>>>
>>>
>>> That works for me! Sthg like:
>>>
>>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
>>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
>>> (NUM_LANES(__vec)/2 - __idx)
>>> //or similarly
>>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>>>
>>> Alternatively I'd been thinking
>>>
>>> #define __arm_lane_32xN(__idx) __idx ^ 1
>>> #define __arm_lane_16xN(__idx) __idx ^ 3
>>> #define __arm_lane_8xN(__idx) __idx ^ 7
>>>
>>> Bear in mind PR64893 that we had on AArch64 :-(
>>>
>> Here is a new version, based on the comments above.
>> I've also removed the addition of arm_fp_ok effective target since I
>> added that in my other testsuite patch.
>>
>> OK now?
>>
>> Thanks,
>>
>> Christophe
>>
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3588b83..b1f408c 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode,
> int inverse,
>        if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE
> (el0)))
>          return -1;
>  +      /* FP16 vectors cannot be represented.  */
> +      if (innersize == 2)
> +       return -1;
> +
>        r0 = CONST_DOUBLE_REAL_VALUE (el0);
>
>
> I think it'd be clearer to write "if (GET_MODE_INNER (mode) == HFmode)"
>
> +(define_expand "movv4hf"
> +  [(set (match_operand:V4HF 0 "s_register_operand")
> +       (match_operand:V4HF 1 "s_register_operand"))]
> +  "TARGET_NEON && TARGET_FP16"
> +{
> +  if (can_create_pseudo_p ())
> +    {
> +      if (!REG_P (operands[0]))
> +       operands[1] = force_reg (V4HFmode, operands[1]);
> +    }
> +})
>
> Can you please add a comment saying why you need the force_reg here?
> IIRC it's because of CANNOT_CHANGE_MODE_CLASS on big-endian that causes an
> ICE during expand with subregs.
>
> I've tried this patch out and it does indeed fix the ICE on armeb.
> So ok for trunk with the changes above.
> Thanks,
> Kyrill
>
>

OK thanks, here is what I have committed (r232832).


Christophe.

Attachment: pr68620.final.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]