[PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support
钟居哲
juzhe.zhong@rivai.ai
Wed May 24 15:42:12 GMT 2023
Hi, Richard. I still don't understand it. Sorry about that.
>> loop_len_48 = MIN_EXPR <loop_len_34 * 2, 4>;
>> _74 = loop_len_34 * 2 - loop_len_48;
I have already run the tests.
We have a MIN_EXPR to calculate the total elements:
loop_len_34 = MIN_EXPR <ivtmp_72, 8>;
I think the "8" has already been multiplied by 2?
Why do we need loop_len_34 * 2?
Could you give me more information? We already have execution checks for tests
similar to the ones you presented, and they pass. I am not sure whether this patch has an issue that I didn't notice.
Thanks.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-05-24 23:31
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support
钟居哲 <juzhe.zhong@rivai.ai> writes:
> Hi, the .optimized dump is like this:
>
> <bb 2> [local count: 21045336]:
> ivtmp.26_36 = (unsigned long) &x;
> ivtmp.27_3 = (unsigned long) &y;
> ivtmp.30_6 = (unsigned long) &MEM <int[200]> [(void *)&y + 16B];
> ivtmp.31_10 = (unsigned long) &MEM <int[200]> [(void *)&y + 32B];
> ivtmp.32_14 = (unsigned long) &MEM <int[200]> [(void *)&y + 48B];
>
> <bb 3> [local count: 273589366]:
> # ivtmp_72 = PHI <ivtmp_73(3), 100(2)>
> # ivtmp.26_41 = PHI <ivtmp.26_37(3), ivtmp.26_36(2)>
> # ivtmp.27_1 = PHI <ivtmp.27_2(3), ivtmp.27_3(2)>
> # ivtmp.30_4 = PHI <ivtmp.30_5(3), ivtmp.30_6(2)>
> # ivtmp.31_8 = PHI <ivtmp.31_9(3), ivtmp.31_10(2)>
> # ivtmp.32_12 = PHI <ivtmp.32_13(3), ivtmp.32_14(2)>
> loop_len_34 = MIN_EXPR <ivtmp_72, 8>;
> loop_len_48 = MIN_EXPR <loop_len_34, 4>;
> _74 = loop_len_34 - loop_len_48;
Yeah, I think this needs to be:
loop_len_48 = MIN_EXPR <loop_len_34 * 2, 4>;
_74 = loop_len_34 * 2 - loop_len_48;
(as valid gimple). The point is that...
> loop_len_49 = MIN_EXPR <_74, 4>;
> _75 = _74 - loop_len_49;
> loop_len_50 = MIN_EXPR <_75, 4>;
> loop_len_51 = _75 - loop_len_50;
...there are 4 lengths capped to 4, for a total element count of 16.
But loop_len_34 is never greater than 8.
So for this case we either need to multiply, or we need to create
a fresh IV for the second rgroup. Both approaches are fine.
Thanks,
Richard
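
For readers following the arithmetic, here is a minimal Python sketch of the length chain from the dump above. It is only a model of the MIN_EXPR/subtraction sequence, not vectorizer code. The helper names (split_lengths, second_rgroup_lengths) and the "scale" parameter are hypothetical, introduced for illustration; the caps of 8 and 4, the four lengths, and the factor of 2 are taken from the dump and from Richard's suggested gimple.

```python
def split_lengths(total, cap=4, nvectors=4):
    """Split `total` elements across `nvectors` lengths, each taken as
    MIN_EXPR <remaining, cap>, with the last one being whatever remains
    (mirrors loop_len_48 .. loop_len_51 in the dump)."""
    lengths = []
    remaining = total
    for _ in range(nvectors - 1):
        n = min(remaining, cap)   # loop_len_N = MIN_EXPR <remaining, 4>
        lengths.append(n)
        remaining -= n            # _7x = remaining - loop_len_N
    lengths.append(remaining)     # loop_len_51 = _75 - loop_len_50
    return lengths


def second_rgroup_lengths(ivtmp, scale):
    """Lengths for the second rgroup in one loop iteration.
    `scale` is hypothetical: 1 models the dump as posted,
    2 models the suggested `loop_len_34 * 2`."""
    loop_len_34 = min(ivtmp, 8)   # loop_len_34 = MIN_EXPR <ivtmp_72, 8>
    return split_lengths(loop_len_34 * scale)


if __name__ == "__main__":
    # First iteration: ivtmp_72 == 100, so loop_len_34 == 8.
    print(second_rgroup_lengths(100, 1))  # as posted
    print(second_rgroup_lengths(100, 2))  # with the multiplication
```

With scale 1 the four lengths sum to at most 8, so the last two are always 0 even though the rgroup can cover 16 elements. With scale 2 the lengths can sum to 16. Creating a fresh IV for the second rgroup, the other option mentioned above, is not modelled here.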