This is due to Combine merging a DUP instruction with either a load
or an MLA; we can't force it to prefer one over the other. However, the
generated vector loop is fast either way, since it still uses MLA and
the DUP is folded into either the load or the MLA. So relax the
conditions slightly: check that we still generate MLA and that no DUP
or FMOV remains.