[C Patch]: pr52543
Ramana Radhakrishnan
ramana.radhakrishnan@linaro.org
Fri Mar 30 08:34:00 GMT 2012
On 29 March 2012 22:10, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> This patch takes a different approach to fixing PR52543 than does the patch
> in
>
> http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00641.html
>
> This patch transforms the lower-subreg pass(es) from unconditionally
> splitting wide moves, zero extensions, and shifts, so that it now takes into
> account the target specific costs and only does the transformations if it is
> profitable.
>
> Unconditional splitting is a problem that not only occurs on the AVR but is
> also a problem on the ARM NEON and my private port. Furthermore, it is a
> problem that is likely to occur on most modern larger machines since these
> machines are more likely to have fast instructions for moving things that
> are larger than word mode.
Nice - this means that at least one pending patch for subreg-style
operations for Neon intrinsics can go in after appropriate tweaking
of costs. It probably requires some tweaking and benchmarking on ARM, but
the case where we saw such spills to the stack with subreg-style operations is
now much improved, indicating that the existing costs infrastructure
manages to get this right at least for this case.
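As a rough illustration of the cost-based decision described in the quoted
text (a minimal sketch only, not the code in Kenneth's patch): the struct,
its fields and should_split_wide_move() are hypothetical names, and the cost
values a target would supply are assumptions. The idea is simply to split a
wide move into word-sized pieces only when the pieces are strictly cheaper
than the wide move itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-target cost table; GCC's real cost hooks look different. */
struct move_costs {
    int word_move;  /* assumed cost of one word-mode register move */
    int wide_move;  /* assumed cost of moving the whole wide value at once */
};

/* Return true when splitting a wide move into `words` word-mode moves is
   strictly cheaper than leaving it as a single wide move. */
static bool should_split_wide_move(const struct move_costs *costs, int words)
{
    assert(costs != NULL && words > 0);
    return words * costs->word_move < costs->wide_move;
}
```

With word_move = 4 and wide_move = 4 (a target with a fast wide move), a
two-word move is left intact; with wide_move = 12 it would be split.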
Richard(S) - If you remember your PR48941 patch - after applying the
lower-subreg patch I now see far better code and what one gets out of
-fno-split-wide-types, but a lot of that gratuitous spilling has gone away.
There are still too many moves between Neon registers, but there are
far fewer moves to the integer side and the gratuitous spilling is now gone.

Old on left, new on right (i.e. Kenneth's patch + Richard's PR48941 patch,
http://patchwork.ozlabs.org/patch/130429/).
regards
Ramana
-------------- next part --------------
cross:                                            cross:
    @ args = 0, pretend = 0, frame = 16         |     @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 1, uses_anonymous_args = 0 |     @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.                  @ link register save eliminated.
    str fp, [sp, #-4]!                          |     vldmia r0, {d26-d27}
    add fp, sp, #0                              |     vldmia r1, {d24-d25}
    sub sp, sp, #20                             |     vmov q10, q13 @ v4sf
    vldmia r0, {d16-d17}                        |     vmov q11, q13 @ v4sf
    vmov q10, q8 @ v4sf                         |     vmov q8, q12 @ v4sf
    sub sp, sp, #48                             |     vmov q9, q12 @ v4sf
                                                >     vzip.32 q10, q11
                                                >     vzip.32 q8, q9
                                                >     vmov q14, q10 @ v4sf
    vmov q12, q8 @ v4sf                               vmov q12, q8 @ v4sf
    add r3, sp, #15                             |     vmov d21, d22 @ v2sf
    bic r3, r3, #15                             |     vmul.f32 d16, d29, d18
    vzip.32 q10, q12                            |     vmul.f32 d17, d21, d24
    vstmia r3, {d20-d21}                        |     vmov d19, d18 @ v2sf
    vstr d24, [r3, #16]                         |     vmul.f32 d18, d28, d25
    vstr d25, [r3, #24]                         |     vmls.f32 d16, d21, d25
    vldmia r1, {d16-d17}                        |     vmls.f32 d17, d28, d19
    vmov q9, q8 @ v4sf                          |     vmls.f32 d18, d29, d24
    vmov q11, q8 @ v4sf                         |     vmov d26, d16 @ v2sf
    vzip.32 q9, q11                             |     vmov d27, d17 @ v2sf
    vstmia r3, {d18-d19}                        |     vmov d17, d18 @ v2sf
    vstr d22, [r3, #16]                         |     vuzp.32 d26, d27
    vstr d23, [r3, #24]                         |     vmov d16, d26 @ v2sf
    vmov d25, d18 @ v2sf                        |     vmov r0, r1, d16 @ v4sf
    vmul.f32 d17, d21, d22                      |     vmov r2, r3, d17
    vmul.f32 d18, d24, d18                      <
    vmov d16, d19 @ v2sf                        <
    vmul.f32 d19, d20, d19                      <
    vmls.f32 d17, d24, d16                      <
    vmls.f32 d18, d20, d22                      <
    vmls.f32 d19, d21, d25                      <
    vuzp.32 d17, d18                            <
    vmov d20, d17 @ v2sf                        <
    vmov d21, d19 @ v2sf                        <
    vmov r0, r1, d20 @ v4sf                     <
    vmov r2, r3, d21                            <
    add sp, fp, #0                              <
    ldmfd sp!, {fp}                             <
    bx lr                                             bx lr