[rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision

Wed Apr 25 08:29:00 GMT 2012

On Tue, 24 Apr 2012, Ulrich Weigand wrote:

> Hello,
> 
> PR 52633 is caused by bad interaction between two different vectorizer
> pattern recognition passed.  A minimal test case is:
> 
> void
> test (unsigned short *x, signed char *y)
> {
>   int i;
>   for (i = 0; i < 32; i++)
>     x[i] = (short) (y[i] << 5);
> }
> 
> built with "cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp"
> on a arm-linux-gnueabi target.
> 
> Before the vectorizer, we have something like:
> 
>   short unsigned int D.4976;
>   int D.4975;
>   int D.4974;
>   signed char D.4973;
>   signed char * D.4972;
>   short unsigned int * D.4970;
> [snip]
>   D.4973_8 = *D.4972_7;
>   D.4974_9 = (int) D.4973_8;
>   D.4975_10 = D.4974_9 << 5;
>   D.4976_11 = (short unsigned int) D.4975_10;
>   *D.4970_5 = D.4976_11;
> 
> 
> The pattern recognizer now goes through its list of patterns it tries
> to detect.  The first successful one is vect_recog_over_widening_pattern.
> This will annotate the statements (via related_stmt fields):
> 
>   D.4973_8 = *D.4972_7;
>   D.4974_9 = (int) D.4973_8;
>     --> D.4992_26 = (signed short) D.4973_8
>   D.4975_10 = D.4974_9 << 5;
>     --> patt.16_31 = D.4992_26 << 5
>   D.4976_11 = (short unsigned int) D.4975_10;
>     --> D.4994_32 = (short unsigned int) patt.16_31
>   *D.4970_5 = D.4976_11;
> 
> 
> In the next step, vect_recog_widen_shift_pattern *also* matches, and
> creates a new annotation for the shift statement (using a widening
> shift):
> 
>   D.4973_8 = *D.4972_7;
>   D.4974_9 = (int) D.4973_8;
>     --> D.4992_26 = (signed short) D.4973_8
>   D.4975_10 = D.4974_9 << 5;
>    [--> patt.16_31 = D.4992_26 << 5]
>     --> patt.17_33 = D.4973_8 w<< 5
>   D.4976_11 = (short unsigned int) D.4975_10;
>     --> D.4994_32 = (short unsigned int) patt.16_31
>   *D.4970_5 = D.4976_11;
> 
> 
> Since the original statement can only point to a single related_stmt,
> the statement setting patt.16_31 is now longer refered to as related_stmt
> by any other statement.  This causes it to no longer be considered
> relevant for the vectorizer.
> 
> However, the statement:
>     --> D.4994_32 = (short unsigned int) patt.16_31 
> *is* still considered relevant.  While analysing it, however, the
> vectorizer follows through to the def statement for patt.16_31,
> and gets quite confused to find that it doesn't have a vectype
> (because it wasn't considered by the vectorizer).  The symptom
> is a failing assertion
>       gcc_assert (*vectype != NULL_TREE);
> in vect_is_simple_use_1.
> 
> 
> Now, it seems quite unusual for multiple patterns to match for a
> single original statement.  In fact, most pattern recognizers
> explicitly refuse to even consider statements that were already
> recognized.  However, vect_recog_widen_shift_pattern makes an
> exception:
> 
>       /* This statement was also detected as over-widening operation (it can't
>          be any other pattern, because only over-widening detects shifts).
>          LAST_STMT is the final type demotion statement, but its related
>          statement is shift.  We analyze the related statement to catch cases:
> 
>          orig code:
>           type a_t;
>           itype res;
>           TYPE a_T, res_T;
> 
>           S1 a_T = (TYPE) a_t;
>           S2 res_T = a_T << CONST;
>           S3 res = (itype)res_T;
> 
>           (size of type * 2 <= size of itype
>            and size of itype * 2 <= size of TYPE)
> 
>          code after over-widening pattern detection:
> 
>           S1 a_T = (TYPE) a_t;
>                --> a_it = (itype) a_t;
>           S2 res_T = a_T << CONST;
>           S3 res = (itype)res_T;  <--- LAST_STMT
>                --> res = a_it << CONST;
> 
>          after widen_shift:
> 
>           S1 a_T = (TYPE) a_t;
>                --> a_it = (itype) a_t; - redundant
>           S2 res_T = a_T << CONST;
>           S3 res = (itype)res_T;
>                --> res = a_t w<< CONST;
> 
>       i.e., we replace the three statements with res = a_t w<< CONST.  */
> 
> 
> If everything were indeed as described in that comment, things would work out
> fine.  However, what is described above as "code after over-widening pattern
> detection" is only one of two possible outcomes of that latter pattern; the
> other is the one that happens in the current test case, where we still have
> a final type conversion left after applying the over-widening pattern.
> 
> 
> I guess one could try to distiguish the two cases somehow and handle both;
> but the overall approach seems quite fragile to me; it doesn't look really
> maintainable to have to rely on so many details of the operation of one
> particular pattern detection function while writing another one, or else
> risk creating subtle problems like the one described above.
> 
> So I was wondering why vect_recog_widen_shift_pattern tries to take advantage
> of an already recognized over-widening pattern.  But indeed, if it does not,
> it will generate less efficient code in cases like the above test case: by
> itself vect_recog_widen_shift_pattern, would generate code to expand the
> char to a short, then do a widening shift resulting in an int, followed
> by narrowing back down to a short.
> 
> 
> However, even so, it might actually be preferable to just handle such
> cases within vect_recog_widen_shift_pattern itself.  Indeed, the routine
> already looks for another subsequent type cast, in order to handle
> unsigned shift variants.  Maybe it simply ought to always look for
> another cast, and detect over-widening situations itself?
> 
> 
> Does this look reasonable?  Any comments or suggestions appreciated!

Yes, getting rid of this fragile interaction by doing more work in
vect_recog_widen_shift_pattern sounds like the correct thing to do.

Richard.