[Bug rtl-optimization/84101] [7/8/9 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Jan 31 13:38:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
lower-subreg.c doesn't consider this for multiple reasons:
1) it doesn't have VEC_CONCAT handling, but that could be easily added;
2) V2DImode isn't considered, because its move cost is the same as the scalar
move cost;
3) in
(insn 10 9 11 2 (set (reg:V2DI 90)
        (vec_concat:V2DI (reg:DI 92)
            (reg:DI 94))) "pr84101.c":9:10 4128 {vec_concatv2di}
     (nil))
(insn 11 10 12 2 (set (reg:TI 87 [ D.1913 ])
        (subreg:TI (reg:V2DI 90) 0)) "pr84101.c":9:10 65 {*movti_internal}
     (nil))
(insn 12 11 16 2 (set (reg:TI 88 [ <retval> ])
        (reg:TI 87 [ D.1913 ])) "pr84101.c":9:10 65 {*movti_internal}
     (nil))
(insn 16 12 17 2 (set (reg/i:TI 0 ax)
        (reg:TI 88 [ <retval> ])) "pr84101.c":10:1 65 {*movti_internal}
     (nil))
there aren't any reasons to make pseudos 87 or 88 decomposable: the result
is only used as TImode, not in DImode subregs thereof.
So, right now pseudo 90 ends up in non_decomposable_context (something could be
done about that), but as nothing ends up being in decomposable_context, nothing
is done anyway.
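
For reference, a function of roughly the following shape is the kind of code
the bug title describes. This is only a sketch: the actual pr84101.c is not
quoted in this comment, so the names struct pair and make_pair are hypothetical.
On x86-64 a structure of two uint64_t members is returned in rax:rdx, which
corresponds to the TImode hard register 0 (ax) in insn 16; with -O3 the two
member stores may be vectorized into a single V2DI value, which would lead to
a vec_concat like the one in insn 10.

```c
#include <stdint.h>

/* Hypothetical stand-in for the testcase; the names are illustrative only
   and are not taken from pr84101.c.  */
struct pair
{
  uint64_t lo;
  uint64_t hi;
};

/* Build the two-word structure from two integer arguments and return it by
   value.  Both inputs arrive in GPRs and the result leaves in GPRs, so any
   detour through an SSE register is pure overhead.  */
struct pair
make_pair (uint64_t lo, uint64_t hi)
{
  struct pair p;
  p.lo = lo;
  p.hi = hi;
  return p;
}
```

Compiling something like this with -O3 and comparing against -O2 (or
-fno-tree-vectorize) is one way to check whether the round trip through SSE
registers shows up on a given compiler version.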

Now, I guess the reason why we should split this V2DI apart somewhere is
mainly the high cost of moving the (2!) integer registers from GPRs to SSE
registers and moving the result back; maybe lower-subreg.c would need to treat
it differently based on the costs of VEC_CONCAT with integral operands (though
x86 rtx_cost claims it is very cheap).

Unfortunately, HARD_REGNO_MODE_OK doesn't allow V2DImode to live in a pair of
GPRs, so the RA can't solve this by, say, giving the vec_concat V2DI pattern
a =r,r,r alternative.

