[Bug target/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code

Sat Jun 7 09:12:00 GMT 2014

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2012-12-04 00:00:00         |2014-6-7

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
Several things:

1) https://gcc.gnu.org/ml/gcc/2014-06/msg00063.html points out that our shrd
patterns wrongly use ashiftrt instead of lshiftrt.

2) We can convince the current compiler to generate shrd by constructing
((((unsigned long long)a)<<32) | b) >> n (take care not to use '+' in place of
'|' because gcc is unable to realize that x+0 has no carry and thus leaves
plenty of unneeded code in that case). For a constant shift, it manages to
clean up all the useless code. At least, that works for the 32-bit version with
-m32 and for the 64-bit version (using unsigned __int128) with -m64; it does
not work for the 32-bit version with -m64.

3) With extra patterns as attached here, combine can handle the case where the
shift amount is constant. However, the non-constant pattern is too big for
combine. The closest it gets to matching is (b<<n)|(a>>(l-n)), but replacing l
with 32 is one more substitution than it is willing to try (it also ignores
the REG_EQUAL note that would give (32-n) with one fewer substitution).
Improving combine would be nice. I am not sure what intermediate pattern (not
too artificial) we could introduce to help it. Maybe a>>(32-n), though I don't
even know if it is better to implement that as a subtraction and a shift or as
generating zero then using sh[lr]d.
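
For reference, the two source-level forms discussed in 2) and 3) can be
sketched in C as below. This is only an illustration: the function names
shrd32_wide and shld32_open are hypothetical, the choice of 32-bit operands is
an assumption, and whether a given gcc actually emits shrd/shld for them
depends on the version and flags as described above. The asserts document the
shift ranges for which the C expressions are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Form from 2): concatenate a (high half) and b (low half) into a 64-bit
   value, shift right by n, and keep the low 32 bits.  Note the use of '|'
   rather than '+' when combining the halves.  (Hypothetical helper name.) */
static uint32_t shrd32_wide(uint32_t a, uint32_t b, unsigned n)
{
    assert(n < 32);  /* shift counts handled by a 32-bit double shift */
    return (uint32_t)(((((unsigned long long)a) << 32) | b) >> n);
}

/* Form from 3): the open-coded pattern (b<<n)|(a>>(32-n)), i.e. the shape
   combine gets close to with l replaced by 32.  (Hypothetical helper name.) */
static uint32_t shld32_open(uint32_t a, uint32_t b, unsigned n)
{
    assert(n >= 1 && n < 32);  /* n == 0 would shift a by 32: undefined in C */
    return (b << n) | (a >> (32 - n));
}
```

For example, with a = 0x12345678, b = 0x9ABCDEF0 and n = 8, shrd32_wide gives
0x789ABCDE and shld32_open gives 0xBCDEF012.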