This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On 26 January 2016 at 14:20, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote: > Hi Christophe, > > On 20/01/16 21:10, Christophe Lyon wrote: >> >> On 19 January 2016 at 15:51, Alan Lawrence <alan.lawrence@foss.arm.com> >> wrote: >>> >>> On 19/01/16 11:15, Christophe Lyon wrote: >>> >>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and >>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought >>>>>> it was not desirable to impact neon_vrev32<mode>. >>>>> >>>>> >>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF >>>>> just >>>>> as >>>>> well as vectors of HI, so I think I'd argue that's harmless enough. To >>>>> gain the >>>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, >>>>> though. >>>>> >>>> Since this is more intrusive, I'd rather leave that part for later. OK? >>> >>> >>> Sure. >>> >>>>>> +#ifdef __ARM_BIG_ENDIAN >>>>>> + /* Here, 3 is (4-1) where 4 is the number of lanes. This is also >>>>>> the >>>>>> + right value for vectors with 8 lanes. */ >>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3) >>>>>> +#else >>>>>> +#define __arm_lane(__vec, __idx) __idx >>>>>> +#endif >>>>>> + >>>>> >>>>> >>>>> Looks right, but sounds... my concern here is that I'm hoping at some >>>>> point we >>>>> will move the *other* vget/set_lane intrinsics to use GCC vector >>>>> extensions >>>>> too. At which time (unlike __aarch64_lane which can be used everywhere) >>>>> this >>>>> will be the wrong formula. Can we name (and/or comment) it to avoid >>>>> misleading >>>>> anyone? The key characteristic seems to be that it is for vectors of >>>>> 16-bit >>>>> elements only. >>>>> >>>> I'm not to follow, here. Looking at the patterns for >>>> neon_vget_lane<mode>_*internal in neon.md, >>>> I can see 2 flavours: one for VD, one for VQ2. The latter uses >>>> "halfelts". >>>> >>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq), >>>> that would be similar to the aarch64 ones (by computing the number of >>>> lanes of the input vector), but the "q" one would use half the total >>>> number of lanes instead? >>> >>> >>> That works for me! Sthg like: >>> >>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx >>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) + >>> (NUM_LANES(__vec)/2 - __idx) >>> //or similarly >>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1)) >>> >>> Alternatively I'd been thinking >>> >>> #define __arm_lane_32xN(__idx) __idx ^ 1 >>> #define __arm_lane_16xN(__idx) __idx ^ 3 >>> #define __arm_lane_8xN(__idx) __idx ^ 7 >>> >>> Bear in mind PR64893 that we had on AArch64 :-( >>> >> Here is a new version, based on the comments above. >> I've also removed the addition of arm_fp_ok effective target since I >> added that in my other testsuite patch. >> >> OK now? >> >> Thanks, >> >> Christophe >> > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 3588b83..b1f408c 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode, > int inverse, > if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE > (el0))) > return -1; > + /* FP16 vectors cannot be represented. */ > + if (innersize == 2) > + return -1; > + > r0 = CONST_DOUBLE_REAL_VALUE (el0); > > > I think it'd be clearer to write "if (GET_MODE_INNER (mode) == HFmode)" > > +(define_expand "movv4hf" > + [(set (match_operand:V4HF 0 "s_register_operand") > + (match_operand:V4HF 1 "s_register_operand"))] > + "TARGET_NEON && TARGET_FP16" > +{ > + if (can_create_pseudo_p ()) > + { > + if (!REG_P (operands[0])) > + operands[1] = force_reg (V4HFmode, operands[1]); > + } > +}) > > Can you please add a comment saying why you need the force_reg here? > IIRC it's because of CANNOT_CHANGE_MODE_CLASS on big-endian that causes an > ICE during expand with subregs. > > I've tried this patch out and it does indeed fix the ICE on armeb. > So ok for trunk with the changes above. > Thanks, > Kyrill > > OK thanks, here is what I have committed (r232832). Christophe.
Attachment:
pr68620.final.txt
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |