This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Hi,
The attached testcase fails with -mvectorize-with-neon-quad and
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -funroll-all-loops on
arm-linux-gnueabi.
The problem is essentially that regrename wants to rename reg 103
(i.e. q10, printed as d20 in the RTL dumps) in the following
instruction sequence. I will keep using d20 in the description
below, as it makes it easier to refer to the dumps.
(insn 122 161 169 3 (set (reg:V4SI 103 d20 [322])
        (unspec:V4SI [
                (reg:V4SI 99 d18 [316])
                (reg:V4SI 103 d20 [323])
            ] UNSPEC_ASHIFT_SIGNED)) /tmp/x.c:16 973 {ashlv4si3_signed}
     (expr_list:REG_EQUAL (ashiftrt:V4SI (reg:V4SI 99 d18 [316])
            (reg:V4SI 103 d20 [320]))
        (nil)))

(insn 123 139 140 3 (set (reg:V8HI 107 d22 [324])
        (vec_concat:V8HI (truncate:V4HI (reg:V4SI 115 d26 [314]))
            (truncate:V4HI (reg:V4SI 103 d20 [322])))) /tmp/x.c:16 1775 {vec_pack_trunc_v4si}
     (nil))

(insn 140 123 141 3 (set (reg:V8HI 103 d20 [341])
        (vec_concat:V8HI (truncate:V4HI (reg:V4SI 111 d24 [331]))
            (truncate:V4HI (reg:V4SI 99 d18 [339])))) /tmp/x.c:16 1775 {vec_pack_trunc_v4si}
     (nil))

(insn 141 140 143 3 (set (reg:V16QI 99 d18 [342])
        (vec_concat:V16QI (truncate:V8QI (reg:V8HI 107 d22 [324]))
            (truncate:V8QI (reg:V8HI 103 d20 [341])))) /tmp/x.c:16 1774 {vec_pack_trunc_v8hi}
     (nil))
This is because it thinks d20 is in one chain:
Register d20 (4): 122 [VFP_REGS] 123 [VFP_REGS] 141 [VFP_REGS]
as well as in another chain:
Register d20 (4): 140 [VFP_REGS] 141 [VFP_REGS]
It then goes ahead and renames d20 in insns 122, 123 and 141, ignoring
the fact that the use in insn 141 is reached by the def in insn 140,
not by the def in insn 122.
I can't see how it is right to construct two chains for the same
register with overlapping live ranges and no intervening conditional
branch, especially since regrename works within a basic block.
Ideally the chain starting at 122 should have been terminated at the
end of 123, rather than being left open so that the use in insn 141
is attached to both the chain starting at 122 and the one starting at
140. What I'm not sure about is which part of regrename guarantees
this part of the comment for Stage 5:
`and earlier
chains they would overlap with must have been closed at
the previous insn at the latest, as such operands cannot
possibly overlap with any input operands. */'
I suspect this is done using the conflict information in each chain,
but a quick read didn't show me where this happens or how we ensure
that such an early-clobber case is handled cleanly.
I should point out that the vec_pack_trunc pattern used here does have
an early clobber on the destination, to prevent the source and
destination registers from overlapping: on Neon, a vec_pack_trunc of
128-bit vectors can only be done with two instructions. While this
isn't a wrong representation in the backend, I think we should get
better code by dropping the early clobber and simply using
reg_overlap_mentioned_p to pick the instruction order, as in the patch
below.
However, I'm not convinced that the regrename behaviour is correct, and
I think it's still worth bringing up: the patch possibly just makes the
regrename.c problem latent, at least on the ARM port, and it might be
worth getting to the bottom of this odd behaviour.
Thanks in advance for any help in this area.
cheers
Ramana
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 24dd941..1825612 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5631,14 +5631,19 @@
; the semantics of the instructions require.
(define_insn "vec_pack_trunc_<mode>"
- [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
+ [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=w")
(vec_concat:<V_narrow_pack>
(truncate:<V_narrow>
(match_operand:VN 1 "register_operand" "w"))
(truncate:<V_narrow>
(match_operand:VN 2 "register_operand" "w"))))]
"TARGET_NEON && !BYTES_BIG_ENDIAN"
- "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2"
+ {
+ if (reg_overlap_mentioned_p (operands[0], operands[1]))
+ return "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2";
+ else
+ return "vmovn.i<V_sz_elem>\t%f0, %q2\;vmovn.i<V_sz_elem>\t%e0, %q1";
+ }
[(set_attr "neon_type" "neon_shift_1")
(set_attr "length" "8")]
)
/* #include <stdint.h> */
/* #include <string.h> */
/* #include <stdlib.h> */
/* #include <stdio.h> */
typedef int int32_t;
typedef signed char int8_t;
typedef unsigned int uint32_t;
extern void abort (void);

__attribute__ ((noinline)) void f883b (int8_t * result,
                                       int32_t * __restrict arg1,
                                       uint32_t * __restrict arg2)
{
  int idx;
  for (idx=48;idx<80;idx += 1) {
    result[idx] = arg1[idx] >> (arg2[idx] & 7);
  }
}

int8_t result[96];
int32_t arg1[96];
uint32_t arg2[96];

int main (void)
{
  int i;
  int correct[] = {48,24,12,6,3,1,0,0,56,28,14,7,3,1,0,0,64,32,16,8,4,2,1,0,72,36,18,9,4,2,1,0};
  for (i=0; i < 96; i++)
    {
      arg2[i] = arg1[i] = i;
      __asm__ volatile ("");
    }
  f883b(result, arg1, arg2);
  for (i=48; i < 80; i++)
    if (result[i] != correct[i-48])
      {
        /* fprintf (stderr, "result [%d] = %d, correct[%d - 48] = %d\n", i, result[i], i, correct[i-48]); */
        abort ();
      }
  return 0;
}
Attachment:
x.c.203r.rnreg
Description: Binary data
Attachment:
x.c.202r.ce3
Description: Binary data