Testcase:

union a { int i; float f; };
void f(float *a, int *b, float e)
{
  union a c;
  c.f = *a + e;
  *b = c.i;
}

--- CUT ---

Currently we get (on x86):

        subl    $28, %esp
        movl    32(%esp), %eax
        flds    40(%esp)
        fadds   (%eax)
        movl    36(%esp), %eax
        fstps   12(%esp)         <--- extra store
        movl    12(%esp), %edx   <--- extra load
        movl    %edx, (%eax)     <--- store result to *b
        addl    $28, %esp
        ret

Or with -mfpmath=sse:

_f:
        subl    $28, %esp
        movl    32(%esp), %eax
        movss   40(%esp), %xmm0
        addss   (%eax), %xmm0
        movl    36(%esp), %eax
        movss   %xmm0, 12(%esp)  <--- extra store
        movl    12(%esp), %edx   <--- extra load
        movl    %edx, (%eax)     <--- store result to *b
        addl    $28, %esp
        ret

Or on PPC:

f:
        lfs 0,0(3)
        stwu 1,-16(1)
        fadds 0,1,0
        stfs 0,8(1)     <--- extra store
        lwz 0,8(1)      <--- extra load
        addi 1,1,16
        stw 0,0(4)      <--- store result to *b
        blr

The issue is that SFmode cannot be in integer registers.  The rtl looks like:

(insn 8 7 9 3 t1.c:10 (set (reg:SF 124)
        (mem:SF (reg/v/f:SI 120 [ a ]) [2 S4 A32])) -1 (nil))

(insn 9 8 10 3 t1.c:10 (set (reg:SF 123)
        (plus:SF (reg/v:SF 122 [ e ])
            (reg:SF 124))) -1 (nil))

(insn 10 9 11 3 t1.c:10 (set (subreg:SF (reg/v:SI 119 [ c ]) 0)
        (reg:SF 123)) -1 (nil))

(insn 11 10 16 3 t1.c:11 (set (mem:SI (reg/v/f:SI 121 [ b ]) [3 S4 A32])
        (reg/v:SI 119 [ c ])) -1 (nil))

See how we could translate register 119 (SImode) into a register that is in
SFmode and get away with it.
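For reference, a minimal sketch (not from the bug report; the helper names f_union and f_memcpy are ours) checking that the union punning in the testcase produces the same bit pattern as memcpy-based punning, which expresses the identical store/load without a union object:

```c
/* Sketch only: f_union mirrors the testcase, f_memcpy is the memcpy-based
   equivalent.  Both should leave the same bits in *b.  */
#include <assert.h>
#include <string.h>

union a { int i; float f; };

static void f_union(float *a, int *b, float e)
{
  union a c;
  c.f = *a + e;              /* store the float member ...             */
  *b = c.i;                  /* ... and read it back as the int member */
}

static void f_memcpy(float *a, int *b, float e)
{
  float t = *a + e;
  memcpy(b, &t, sizeof t);   /* same bits, no union object needed */
}
```

Either way, the bit pattern of *a + e ends up in an int; the question in this bug is only whether the compiler needs a stack round-trip to get it there.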
Even if I change emit_move_insn to not produce:

(insn 10 9 11 3 t1.c:10 (set (subreg:SF (reg/v:SI 119 [ c ]) 0)
        (reg:SF 123)) -1 (nil))

but instead:

(insn 10 9 11 3 t1.c:10 (set (reg/v:SI 119 [ c ])
        (subreg:SI (reg:SF 123) 0)) -1 (nil))

I could not get the (subreg:SI (reg:SF 123) 0) propagated into:

(insn 11 10 16 3 t1.c:11 (set (mem:SI (reg/v/f:DI 121 [ b ]) [3 S4 A32])
        (reg/v:SI 119 [ c ])) -1 (nil))

I have not figured out why yet.  Once that is done, we can have
simplify_set in combine change the modes of the mem.
Confirmed.
With the patch proposed for PR34043 we get:

f:
.LFB2:
        addss   (%rdi), %xmm0
        movd    %xmm0, (%rsi)
        ret

instead of:

f:
.LFB2:
        addss   (%rdi), %xmm0
        movss   %xmm0, -4(%rsp)
        movl    -4(%rsp), %eax
        movl    %eax, (%rsi)
        ret

so expansion can handle

f (a, b, e)
{
<bb 2>:
  *b = VIEW_CONVERT_EXPR<int>(*a + e);
  return;
}

well, compared to

f (a, b, e)
{
  union a c;

<bb 2>:
  c.f = *a + e;
  *b = c.i;
  return;
}
Subject: Bug 33989

Author: rguenth
Date: Fri Mar 14 14:52:07 2008
New Revision: 133218

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=133218
Log:
2008-03-14  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/34043
        PR tree-optimization/33989
        * tree-ssa-pre.c (execute_pre): Allow SCCVN to do insertion
        when doing FRE.
        (bitmap_find_leader): Use extra argument to verify dominance
        relationship inside a basic-block.
        (can_PRE_operation): Add VIEW_CONVERT_EXPR.
        (find_leader_in_sets): Adjust.
        (create_component_ref_by_pieces): Take extra argument for
        dominance check, handle lookup failures.
        (find_or_generate_expression): Likewise.
        (create_expression_by_pieces): Likewise.
        (insert_into_preds_of_block): Adjust.
        (create_value_expr_from): If asked for, verify all operands
        are in the blocks AVAIL_OUT set.
        (make_values_for_stmt): Check for SSA_NAMEs that are life
        over an abnormal edge.
        (compute_avail): Remove such check.
        (do_SCCVN_insertion): New function.
        (eliminate): If we do not find a leader suitable for
        replacement insert a replacement expression from SCCVN if
        available.
        * tree-ssa-sccvn.h (run_scc_vn): Update prototype.
        (struct vn_ssa_aux): Add needs_insertion flag.
        * tree-ssa-sccvn.c (may_insert): New global flag.
        (copy_reference_ops_from_ref): Value-number union member
        access based on its size, not type and member if insertion
        is allowed.
        (visit_reference_op_load): For a weak match from union type
        punning lookup a view-converted value and insert a SSA_NAME
        for that value if that is not found.
        (visit_use): Make dumps shorter.  Do not disallow value
        numbering SSA_NAMEs that are life over an abnormal edge to
        constants.
        (free_scc_vn): Release inserted SSA_NAMEs.
        (run_scc_vn): New flag to specify whether insertion is
        allowed.  Process SSA_NAMEs in forward order.
        * tree-ssa-loop-im.c (for_each_index): Handle invariant
        ADDR_EXPRs inside VIEW_CONVERT_EXPR.
        * fold-const.c (fold_unary): Fold VIEW_CONVERT_EXPRs from/to
        pointer type to/from integral types that do not change the
        precision to regular conversions.

        * gcc.dg/tree-ssa/ssa-fre-7.c: New testcase.
        * gcc.dg/tree-ssa/ssa-fre-8.c: Likewise.
        * gcc.dg/tree-ssa/ssa-fre-9.c: Likewise.
        * gcc.dg/tree-ssa/ssa-fre-10.c: Likewise.
        * gcc.dg/tree-ssa/ssa-pre-17.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-10.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-7.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-8.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-9.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-im.c
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-sccvn.c
    trunk/gcc/tree-ssa-sccvn.h
Fixed.
> *b = VIEW_CONVERT_EXPR<int>(*a + e);

I don't think this is fully fixed for PPC; I will check later today to be
sure.

-- Pinski
PPC still gets bad code:

_f:
        lfs f0,0(r3)
        fadds f0,f1,f0
        stfs f0,-8(r1)
        lwz r0,-8(r1)
        stw r0,0(r4)
        blr
We do get:

  *b = VIEW_CONVERT_EXPR<int>(*a + e);

Now, if we instead produced:

  VIEW_CONVERT_EXPR<float>(*b) = *a + e;

we might generate better code, since SImode values cannot be held in FPRs.
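As a hypothetical C-level illustration of that lhs view-convert (ours, not code from the bug; the typedef and function name are made up): GCC's may_alias attribute lets the cross-type store be written directly, so the store itself can stay in float mode.

```c
/* Sketch: mirror VIEW_CONVERT_EXPR<float>(*b) = *a + e; by storing the
   float result straight through b.  may_alias makes the cross-type
   store well-defined for GCC.  */
#include <assert.h>
#include <string.h>

typedef float __attribute__((may_alias)) aliased_float;

static void f_lhs(float *a, int *b, float e)
{
  /* The float result is stored directly into *b, so no SImode register
     (or stack slot) ever has to hold the SFmode value.  */
  *(aliased_float *)b = *a + e;
}
```

Whether the back end then emits a single stfs into *b is exactly the register-class question this bug is about.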
Can we fix this at expansion time? Thus, move the VIEW_CONVERT_EXPR from the rhs to the lhs? This might be a target dependent optimization.
(In reply to comment #9)
> Can we fix this at expansion time? Thus, move the VIEW_CONVERT_EXPR from the
> rhs to the lhs? This might be a target dependent optimization.

Yes, but then we have to fix fwprop to forward-propagate SUBREGs.  I have
patches for all of this, but I am still getting mixed results due to
scheduling differences with the pre-RA scheduler.
No longer working on this one; I lost the patches when I left Sony.
I'm still getting this same bug with PPC and gcc 4.5.1.
With 4.6 it should be fixed finally, thanks to MEM_REF and value-numbering
improvements:

;; Function f (f)

f (float * a, int * b, float e)
{
  int D.2694;
  float D.2690;
  float D.2689;

<bb 2>:
  D.2689_2 = *a_1(D);
  D.2690_4 = D.2689_2 + e_3(D);
  D.2694_10 = VIEW_CONVERT_EXPR<int>(D.2690_4);
  *b_6(D) = D.2694_10;
  return;
}

f:
.LFB0:
        .cfi_startproc
        addss   (%rdi), %xmm0
        movd    %xmm0, (%rsi)
        ret
        .cfi_endproc

Maybe you can verify on PPC.
I downloaded and compiled the 2010-11-06 snapshot of gcc 4.6.  I'm still
getting the extra load/store on PPC with -O3:

.L.f:
        lfs 0,0(3)
        fadds 0,1,0
        stfs 0,-16(1)
        lwz 0,-16(1)
        stw 0,0(4)
        blr
Weirdly, it works fine with doubles.  Testcase:

union a { long int i; double f; };
void d(double *a, long int *b, double e)
{
  union a c;
  c.f = *a + e;
  *b = c.i;
}

Results in:

.L.d:
        lfd 0,0(3)
        fadd 0,1,0
        stfd 0,0(4)
        blr

One thing I noticed while looking at the assembly of my program is that
8 bytes of stack were allocated for each stfs instruction.  Does gcc think
that stfs writes a 64-bit value, which would prevent it from writing
directly into the space of an int?  I've checked the IBM docs: stfs does
write a 32-bit value.
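A sketch (ours, not from the thread) of the double variant of the punning, using int64_t instead of long int so the sizes match on 32- and 64-bit targets alike, and checking that the punned store round-trips bit-for-bit:

```c
/* Sketch: d_memcpy is the memcpy equivalent of the double/long union
   punning in the testcase above.  int64_t is used so sizeof matches
   sizeof(double) on every target.  */
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void d_memcpy(const double *a, int64_t *b, double e)
{
  double t = *a + e;
  memcpy(b, &t, sizeof t);   /* same effect as c.f = ...; *b = c.i; */
}
```

On LP64 PPC, sizeof(long int) == sizeof(double) == 8, which is why the double version needs no mode change at all and a single stfd suffices.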
(In reply to Scott Mansell from comment #15)
> Weirdly, it works fine with doubles.

At least at the time I wrote up this bug report, the can_change_mode target
hook on rs6000 would reject SImode/SFmode changes IIRC, which forces a
reload (a spill when moving between those two modes).  Now, I have not
looked at the rs6000 back end in almost 10 years, so I cannot say whether
this still applies.

Moving to a target bug.
The reported problem was that with -O3 you get an extra load/store:

.L.f:
        lfs 0,0(3)
        fadds 0,1,0
        stfs 0,-16(1)
        lwz 0,-16(1)
        stw 0,0(4)
        blr

I tried the test code using both gcc 9 and 10 (on both powerpc64 BE and LE,
just in case) and I see this:

f:
.LFB0:
        .cfi_startproc
        lfs 0,0(3)
        fadds 1,1,0
        stfs 1,0(4)
        blr

No extra load/store.

FYI: I looked at gcc 5 (the distro version on one of our older servers) and
it generated way different and probably equally not-so-good code.  I tried
on another older system that has gcc 4.8.5, and it did indeed generate the
extra load/store as reported.