This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
- From: Christophe Lyon <christophe dot lyon at linaro dot org>
- To: Alan Lawrence <alan dot lawrence at foss dot arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 25 Jan 2016 15:31:16 +0100
- Subject: Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
- Authentication-results: sourceware.org; auth=none
- References: <CAKdteOZbgTw44UarkL9zmmZBd3_gGPof9v0m8mPH-xu9=8yqqQ at mail dot gmail dot com> <1453143711-21320-1-git-send-email-alan dot lawrence at arm dot com> <CAKdteOaNmfhgeFVeaEgddvQ-FntWudtQSowYgPtW=xi8Ert60Q at mail dot gmail dot com> <569E4D77 dot 6000809 at foss dot arm dot com> <CAKdteOa+mqbODEs7jWb-Cj4H+OX93jDPwH1uauY2=0Tn-S=LpA at mail dot gmail dot com> <56A261B1 dot 6010107 at foss dot arm dot com>
On 22 January 2016 at 18:06, Alan Lawrence <alan.lawrence@foss.arm.com> wrote:
> On 20/01/16 21:10, Christophe Lyon wrote:
>>
>> On 19 January 2016 at 15:51, Alan Lawrence <alan.lawrence@foss.arm.com>
>> wrote:
>>>
>>> On 19/01/16 11:15, Christophe Lyon wrote:
>>>
>>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>>>>> it was not desirable to impact neon_vrev32<mode>.
>>>>>
>>>>>
>>>>>
>>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF
>>>>> just
>>>>> as
>>>>> well as vectors of HI, so I think I'd argue that's harmless enough. To
>>>>> gain the
>>>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases,
>>>>> though.
>>>>>
>>>> Since this is more intrusive, I'd rather leave that part for later. OK?
>>>
>>>
>>>
>>> Sure.
>>>
>>>>>> +#ifdef __ARM_BIG_ENDIAN
>>>>>> + /* Here, 3 is (4-1) where 4 is the number of lanes. This is also
>>>>>> the
>>>>>> + right value for vectors with 8 lanes. */
>>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>>>>> +#else
>>>>>> +#define __arm_lane(__vec, __idx) __idx
>>>>>> +#endif
>>>>>> +
>>>>>
>>>>>
>>>>>
>>>>> Looks right, but sounds... my concern here is that I'm hoping at some
>>>>> point we
>>>>> will move the *other* vget/set_lane intrinsics to use GCC vector
>>>>> extensions
>>>>> too. At which time (unlike __aarch64_lane which can be used everywhere)
>>>>> this
>>>>> will be the wrong formula. Can we name (and/or comment) it to avoid
>>>>> misleading
>>>>> anyone? The key characteristic seems to be that it is for vectors of
>>>>> 16-bit
>>>>> elements only.
>>>>>
>>>> I'm not to follow, here. Looking at the patterns for
>>>> neon_vget_lane<mode>_*internal in neon.md,
>>>> I can see 2 flavours: one for VD, one for VQ2. The latter uses
>>>> "halfelts".
>>>>
>>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
>>>> that would be similar to the aarch64 ones (by computing the number of
>>>> lanes of the input vector), but the "q" one would use half the total
>>>> number of lanes instead?
>>>
>>>
>>>
>>> That works for me! Sthg like:
>>>
>>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
>>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
>>> (NUM_LANES(__vec)/2 - __idx)
>>> //or similarly
>>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>>>
>>> Alternatively I'd been thinking
>>>
>>> #define __arm_lane_32xN(__idx) __idx ^ 1
>>> #define __arm_lane_16xN(__idx) __idx ^ 3
>>> #define __arm_lane_8xN(__idx) __idx ^ 7
>>>
>>> Bear in mind PR64893 that we had on AArch64 :-(
>>>
>>
>> Here is a new version, based on the comments above.
>> I've also removed the addition of arm_fp_ok effective target since I
>> added that in my other testsuite patch.
>>
>> OK now?
>
>
> Yes. I realize my worry about PR64893 doesn't apply here since we pass
> constant lane numbers / vector bounds into __builtin_arm_lane_check. Thanks!
>
Thanks, I guess I still have to wait for Kyrill/Ramana 's approval.
> --Alan
>
>>
>> Thanks,
>>
>> Christophe
>>
>>> Cheers, Alan
>
>