[Bug target/103252] questionable codegen with kmovd

crazylht at gmail dot com gcc-bugzilla@gcc.gnu.org
Fri Nov 19 02:42:48 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103252

--- Comment #12 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jason A. Donenfeld from comment #9)
> >  When the mask registers are available for use, RA considers them and when spilling to those is cheaper than to memory, it spills to them and not memory.
> 
> Yes, this is the thing I don't get. When you compare the codegen for avx512
> vs non-avx512, the non-avx512 doesn't spill at all there. So this isn't
> "spill to memory" vs "spill to mask register". This is "don't spill" vs
> "spill to mask register". And the latter seems clearly worse.

For non-avx512, because only a small number of registers is available and
r132 has a short live range, r132 is first
Pushing a18(r132,l0) (cost 70) ---- (allocated as mem first)
and only at the end is a free register found for it
Popping a18(r132,l0) -- assign reg 2. --------- (allocated to a register once
one is available)

For avx512, because enough registers are available, r132 is finally assigned
to the alternative class. Looking at the whole picture, avx512 has fewer
allocnos allocated to mem.

avx2:
Disposition:
   26:r82  l1     3    1:r82  l0     3   36:r89  l1     2    2:r89  l0     2
   13:r97  l0     5   37:r101 l1     1    3:r101 l0     4   27:r103 l1   mem
    4:r103 l0   mem   38:r105 l1     0    5:r105 l0     0   28:r108 l1     6
    6:r108 l0     6    0:r112 l0     0   29:r113 l1     5    7:r113 l0     5
   30:r114 l1   mem    8:r114 l0   mem   31:r115 l1   mem    9:r115 l0   mem
   22:r118 l0     0   21:r119 l0     0   40:r128 l1     0   39:r129 l1     0
   17:r130 l0     1   16:r131 l0     2   18:r132 l0     2   15:r136 l0     1
   12:r139 l0     0   32:r142 l1   mem   10:r142 l0   mem   33:r143 l1     4
   20:r143 l0   mem   34:r144 l1   mem   11:r144 l0   mem   35:r145 l1   mem
   19:r145 l0   mem   25:r146 l0     0   24:r147 l0     1   23:r148 l0     2
   41:r149 l1     0   14:r150 l0     0

avx512:
Disposition:
   26:r82  l1     3    1:r82  l0     3   36:r89  l1     1    2:r89  l0     2
   13:r97  l0     4   37:r101 l1     2    3:r101 l0     1   27:r103 l1   mem
    4:r103 l0   mem   38:r105 l1     0    5:r105 l0     0   28:r108 l1     6
    6:r108 l0     6    0:r112 l0     0   29:r113 l1     4    7:r113 l0     4
   30:r114 l1   mem    8:r114 l0   mem   31:r115 l1   mem    9:r115 l0   mem
   22:r118 l0     0   21:r119 l0     0   40:r128 l1     0   39:r129 l1     0
   17:r130 l0     2   16:r131 l0    68   18:r132 l0    68   15:r136 l0     2
   12:r139 l0     0   32:r142 l1   mem   10:r142 l0   mem   33:r143 l1   mem
   20:r143 l0   mem   34:r144 l1     5   11:r144 l0     5   35:r145 l1   mem
   19:r145 l0   mem   25:r146 l0     0   24:r147 l0     1   23:r148 l0     2
   41:r149 l1     0   14:r150 l0     0
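
The "less mem" comparison can be checked mechanically by counting the "mem"
entries in each Disposition table. Below is a minimal sketch, assuming each
entry has the form "allocno:rPSEUDO lLOOP hardreg-or-mem" as in the dumps
above. The names ENTRY, SAMPLE, parse_disposition and count_mem are
illustrative only and are not part of GCC.

```python
import re

# One entry of a "Disposition:" table, e.g. "18:r132 l0    68" or
# "27:r103 l1   mem". The layout is inferred from the dumps in this
# comment, not from a documented format.
ENTRY = re.compile(r"(\d+):r(\d+)\s+l(\d+)\s+(mem|\d+)")

# Two rows copied from the avx2 table above.
SAMPLE = """
   17:r130 l0     1   16:r131 l0     2   18:r132 l0     2   15:r136 l0     1
   12:r139 l0     0   32:r142 l1   mem   10:r142 l0   mem   33:r143 l1     4
"""


def parse_disposition(text):
    """Return a list of (allocno, pseudo, loop, where) tuples.

    `where` is the hard register number as an int, or the string "mem".
    """
    entries = []
    for allocno, pseudo, loop, where in ENTRY.findall(text):
        entries.append((int(allocno), int(pseudo), int(loop),
                        where if where == "mem" else int(where)))
    return entries


def count_mem(text):
    """Count the allocnos whose disposition is "mem"."""
    return sum(1 for entry in parse_disposition(text) if entry[3] == "mem")


if __name__ == "__main__":
    print(count_mem(SAMPLE))
```

Counting the full tables above the same way gives 13 mem entries for avx2 and
12 for avx512.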



So for a reg with a short live range, we may lose the opportunity to allocate
the best regclass. Maybe add a peephole2 to handle those cases instead of
tuning the RA.

