[rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision
Richard Guenther
rguenther@suse.de
Wed Apr 25 08:29:00 GMT 2012
On Tue, 24 Apr 2012, Ulrich Weigand wrote:
> Hello,
>
> PR 52633 is caused by bad interaction between two different vectorizer
> pattern recognition passed. A minimal test case is:
>
> void
> test (unsigned short *x, signed char *y)
> {
> int i;
> for (i = 0; i < 32; i++)
> x[i] = (short) (y[i] << 5);
> }
>
> built with "cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp"
> on a arm-linux-gnueabi target.
>
> Before the vectorizer, we have something like:
>
> short unsigned int D.4976;
> int D.4975;
> int D.4974;
> signed char D.4973;
> signed char * D.4972;
> short unsigned int * D.4970;
> [snip]
> D.4973_8 = *D.4972_7;
> D.4974_9 = (int) D.4973_8;
> D.4975_10 = D.4974_9 << 5;
> D.4976_11 = (short unsigned int) D.4975_10;
> *D.4970_5 = D.4976_11;
>
>
> The pattern recognizer now goes through its list of patterns it tries
> to detect. The first successful one is vect_recog_over_widening_pattern.
> This will annotate the statements (via related_stmt fields):
>
> D.4973_8 = *D.4972_7;
> D.4974_9 = (int) D.4973_8;
> --> D.4992_26 = (signed short) D.4973_8
> D.4975_10 = D.4974_9 << 5;
> --> patt.16_31 = D.4992_26 << 5
> D.4976_11 = (short unsigned int) D.4975_10;
> --> D.4994_32 = (short unsigned int) patt.16_31
> *D.4970_5 = D.4976_11;
>
>
> In the next step, vect_recog_widen_shift_pattern *also* matches, and
> creates a new annotation for the shift statement (using a widening
> shift):
>
> D.4973_8 = *D.4972_7;
> D.4974_9 = (int) D.4973_8;
> --> D.4992_26 = (signed short) D.4973_8
> D.4975_10 = D.4974_9 << 5;
> [--> patt.16_31 = D.4992_26 << 5]
> --> patt.17_33 = D.4973_8 w<< 5
> D.4976_11 = (short unsigned int) D.4975_10;
> --> D.4994_32 = (short unsigned int) patt.16_31
> *D.4970_5 = D.4976_11;
>
>
> Since the original statement can only point to a single related_stmt,
> the statement setting patt.16_31 is now longer refered to as related_stmt
> by any other statement. This causes it to no longer be considered
> relevant for the vectorizer.
>
> However, the statement:
> --> D.4994_32 = (short unsigned int) patt.16_31
> *is* still considered relevant. While analysing it, however, the
> vectorizer follows through to the def statement for patt.16_31,
> and gets quite confused to find that it doesn't have a vectype
> (because it wasn't considered by the vectorizer). The symptom
> is a failing assertion
> gcc_assert (*vectype != NULL_TREE);
> in vect_is_simple_use_1.
>
>
> Now, it seems quite unusual for multiple patterns to match for a
> single original statement. In fact, most pattern recognizers
> explicitly refuse to even consider statements that were already
> recognized. However, vect_recog_widen_shift_pattern makes an
> exception:
>
> /* This statement was also detected as over-widening operation (it can't
> be any other pattern, because only over-widening detects shifts).
> LAST_STMT is the final type demotion statement, but its related
> statement is shift. We analyze the related statement to catch cases:
>
> orig code:
> type a_t;
> itype res;
> TYPE a_T, res_T;
>
> S1 a_T = (TYPE) a_t;
> S2 res_T = a_T << CONST;
> S3 res = (itype)res_T;
>
> (size of type * 2 <= size of itype
> and size of itype * 2 <= size of TYPE)
>
> code after over-widening pattern detection:
>
> S1 a_T = (TYPE) a_t;
> --> a_it = (itype) a_t;
> S2 res_T = a_T << CONST;
> S3 res = (itype)res_T; <--- LAST_STMT
> --> res = a_it << CONST;
>
> after widen_shift:
>
> S1 a_T = (TYPE) a_t;
> --> a_it = (itype) a_t; - redundant
> S2 res_T = a_T << CONST;
> S3 res = (itype)res_T;
> --> res = a_t w<< CONST;
>
> i.e., we replace the three statements with res = a_t w<< CONST. */
>
>
> If everything were indeed as described in that comment, things would work out
> fine. However, what is described above as "code after over-widening pattern
> detection" is only one of two possible outcomes of that latter pattern; the
> other is the one that happens in the current test case, where we still have
> a final type conversion left after applying the over-widening pattern.
>
>
> I guess one could try to distiguish the two cases somehow and handle both;
> but the overall approach seems quite fragile to me; it doesn't look really
> maintainable to have to rely on so many details of the operation of one
> particular pattern detection function while writing another one, or else
> risk creating subtle problems like the one described above.
>
> So I was wondering why vect_recog_widen_shift_pattern tries to take advantage
> of an already recognized over-widening pattern. But indeed, if it does not,
> it will generate less efficient code in cases like the above test case: by
> itself vect_recog_widen_shift_pattern, would generate code to expand the
> char to a short, then do a widening shift resulting in an int, followed
> by narrowing back down to a short.
>
>
> However, even so, it might actually be preferable to just handle such
> cases within vect_recog_widen_shift_pattern itself. Indeed, the routine
> already looks for another subsequent type cast, in order to handle
> unsigned shift variants. Maybe it simply ought to always look for
> another cast, and detect over-widening situations itself?
>
>
> Does this look reasonable? Any comments or suggestions appreciated!
Yes, getting rid of this fragile interaction by doing more work in
vect_recog_widen_shift_pattern sounds like the correct thing to do.
Richard.
More information about the Gcc-patches
mailing list