This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com> wrote:
A similar effect could be obtained by keeping a flag whether we entered the path that stored the value before the transform. Thus, do
lsm = g2; // technically not needed, if you also handle loads flag = false; for (...) { if (g1) { if (flag) g2 = lsm; } else { lsm = 0; flag = true; } } if (flag) g2 = lsm;
int g_1 = 1; int g_2 = 0;
int func_1(void) { int l; for (l = 0; l < 1234; l++) { if (g_1) return l; else g_2 = 0; } return 999; }
new: func_1: movl g_1(%rip), %edx xorl %eax, %eax testl %edx, %edx je .L5 rep ret .L5: movl $0, g_2(%rip) movw $999, %ax ret
func_1: movl g_1(%rip), %edx movl g_2(%rip), %eax testl %edx, %edx je .L5 movl %eax, g_2(%rip) xorl %eax, %eax ret .L5: movl $0, g_2(%rip) movl $999, %eax ret
struct obj { int data; struct obj *next; } *q;
void func() { struct obj *p; for (p = q; p; p = p->next) if (p->data > 0) count++; }
func: .LFB0: movq q(%rip), %rax xorl %ecx, %ecx movl count(%rip), %edx testq %rax, %rax je .L1 .L9: movl (%rax), %esi testl %esi, %esi jle .L3 addl $1, %edx movl $1, %ecx .L3: movq 8(%rax), %rax testq %rax, %rax jne .L9 testb %cl, %cl jne .L12 .L1: rep ret .L12: movl %edx, count(%rip) ret
that would allow to extend this to cover the cases where the access may eventually trap (if you omit the load before the loop).
Not sure which is more efficient (you can eventually combine flag variables for different store locations, combining the if-changed compare is not so easy I guess).
1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2. In the PR, Richard mentions that both FRE/PRE can figure this out, but they are not run after store motion. DOM should also be able to, but apparently does not :(. Tips?
Well. Schedule FRE after loop opts ...
DOM is not prepared to handle loads/stores with differing VOPs - it was never updated really after the alias-improvements merge.
2. The GIMPLE_CONDs I create in this patch are currently causing problems with complex floats/doubles. do_jump is complaining that it can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex values since complex lowering has already happened (??). Is there a more canonical way of creating a GIMPLE_COND that may contain complex floats at this stage?
As you are handling redundant stores you want a bitwise comparison anyway, but yes, complex compares need to be lowered. But as said, you want a bitwise comparison, not a value-comparison. You can try using
Few comments on the patch itself.
+ new_bb = split_edge (ex); + then_bb = create_empty_bb (new_bb); + if (current_loops&& new_bb->loop_father) + add_bb_to_loop (then_bb, new_bb->loop_father);
+ gsi = gsi_start_bb (new_bb); + t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);
Hmm, why do you need to re-load the value? Don't you have both the value as it was loaded before the loop and the new value (the lsm tmp reg) already? Why not simply remember the SSA name used? Ah, because store motion also uses make_rename_temp. If you don't want to change that (which would be good but inconvenient because of the required PHI insertions), I'd change the code that performs the load in the pre-header to do
SSA_NAME = ref->mem; rename-lsm-tmp = SSA_NAME;
so you can remember SSA_NAME and re-use it here. That probably solves your optimization issue, too.
+ for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple phi = gsi_stmt (gsi); + unsigned i; + + for (i = 0; i< gimple_phi_num_args (phi); i++) + if (gimple_phi_arg_edge (phi, i)->src == new_bb) + { + tree arg = gimple_phi_arg_def (phi, i); + add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION); + update_stmt (phi); + } + }
Hmm. This is of course doing unnecessary work in case there are multiple moved stores. In fact splitting the exit edge is only necessary if there are more than one predecessor. Otherwise you can simply split the exit block after inserting the new conditional after its labels.
+ if (opts->x_flag_tm) + { + if (opts->x_flag_non_call_exceptions) + sorry ("transactional memory is not supported with non-call exceptions"); + + set_param_value ("allow-load-data-races", 0, + opts->x_param_values, opts_set->x_param_values); + set_param_value ("allow-store-data-races", 0, + opts->x_param_values, opts_set->x_param_values);
Unless the user explicitely set those params? Which means using params wasn't a very good idea ... ideally you'd diagnose an error when the user would mix -fgnu-tm and -fallow-store-data-races. So - can we remove allow-load-data-races (we are never going to implement that) and make allow-store-data-races a proper -f flag please?
+ }
And that should be a separate patch
It would be nice if you could think about LIM/SM for possibly trapping loads/stores while you are doing this work.
Thanks again. Aldy
Attachment:
curr
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |