Scheduling problem - a more detailed explanation

I'm working on IA64 with GCC 4.1.1. I instrument some sensitive
instructions (e.g. memory accesses) to do information flow tracking.
Because I insert the instrumentation after register allocation, I have
to allocate general registers and predicate registers myself. There
are corner cases in this allocation. If I cannot find enough general
registers, I spill some of them to memory and redo the allocation. If
I cannot find enough predicate registers for the instrumentation, I
allocate one more general register, say X, and use the following
instruction:
	mov X = pr
on IA64 to save all predicates. Of course, I need to restore all
predicates at the end of the instrumentation, which I do with:
	mov pr = X, -1
Thus a code sequence for this type of instrumentation may be:
	mov X = pr
	instrumentation prologue
	sensitive instruction
	instrumentation epilogue
	mov pr = X, -1
However, I found that GCC may not correctly handle data dependencies
involving the mov X = pr or mov pr = X, -1 instructions.
Consider the following instruction sequence:
	mov r8=pr
	cmp.eq p6,p7= …
	(p6) ...
	(p7) ...
	mov pr=r8,-1

Obviously there are data dependencies here. For example, we cannot
execute
	mov pr = r8, -1
until the instructions that use the p6 and p7 predicate registers have
executed. But after the second scheduling pass, GCC generates code in
the following order:
	mov r8=pr
	cmp.eq p6,p7=something0
	(p6) something1
	mov pr=r8,-1
	(p7) something2

This code is clearly wrong: by the time (p7) something2 executes, p7
has already been overwritten by the restore, so it no longer holds the
result of the cmp.eq!
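
To make the failure concrete, here is a minimal Python sketch (a toy
model, not GCC or IA64 code) that replays both orderings. The
instruction encoding, the run() helper and the incoming predicate
values are all assumptions made up for illustration; only p6 and p7
are modelled, and the cmp.eq operands are passed in as a ready-made
result because they are elided in the original sequence.

```python
def run(sequence, incoming, cmp_result):
    """Run a toy instruction list; return the predicated instructions that executed."""
    preds = dict(incoming)   # live predicate registers (only p6/p7 are modelled)
    saved = None             # stands in for r8
    executed = []
    for insn in sequence:
        kind = insn[0]
        if kind == "save":        # mov r8=pr
            saved = dict(preds)
        elif kind == "restore":   # mov pr=r8,-1
            preds = dict(saved)
        elif kind == "cmp":       # cmp.eq p6,p7=... (operands elided, result supplied)
            preds["p6"] = cmp_result
            preds["p7"] = not cmp_result
        elif kind == "pred":      # (qp) name
            _, qp, name = insn
            if preds[qp]:
                executed.append(name)
    return executed


# Order written by the instrumentation pass.
intended = [("save",), ("cmp",), ("pred", "p6", "something1"),
            ("pred", "p7", "something2"), ("restore",)]
# Order observed after the second scheduling pass.
scheduled = [("save",), ("cmp",), ("pred", "p6", "something1"),
             ("restore",), ("pred", "p7", "something2")]

incoming = {"p6": False, "p7": False}   # hypothetical values live before the save
print(run(intended, incoming, cmp_result=False))    # something2 executes
print(run(scheduled, incoming, cmp_result=False))   # something2 is wrongly skipped
```

With these assumed inputs the intended order runs something2, while
the scheduled order consults the restored (stale) p7 and skips it.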

I think the cause of the problem is that GCC doesn't track data
dependencies between mov X = pr (or mov pr = X, -1) and other uses of
a specific predicate register (e.g. p6, p7). Since GCC itself only
emits mov X = pr (or mov pr = X, -1) in the function prologue and
epilogue, the problem normally never shows up.
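
The dependencies described in the paragraph above can be written down
as read/write sets. The following Python sketch is only a model of
what a scheduler would have to know, not how GCC represents it: the
Insn class, depends() and is_legal_order() are hypothetical names, and
treating mov pr = X, -1 as a write to every predicate p1..p63 (and
mov X = pr as a read of all of them) is a simplifying assumption.

```python
from dataclasses import dataclass

# Simplified: every predicate except p0, which is hardwired to 1.
ALL_PREDICATES = frozenset(f"p{i}" for i in range(1, 64))


@dataclass(frozen=True)
class Insn:
    text: str
    reads: frozenset
    writes: frozenset


def depends(earlier, later):
    """True if `later` must stay after `earlier` (RAW, WAR or WAW hazard)."""
    raw = earlier.writes & later.reads
    war = earlier.reads & later.writes
    waw = earlier.writes & later.writes
    return bool(raw or war or waw)


def is_legal_order(program_order, new_order):
    """True if every dependent pair keeps its relative order."""
    for i, a in enumerate(program_order):
        for b in program_order[i + 1:]:
            if depends(a, b) and new_order.index(a) > new_order.index(b):
                return False
    return True


save = Insn("mov r8=pr", ALL_PREDICATES, frozenset({"r8"}))
# The cmp.eq source operands are elided in the post, so no reads are modelled.
cmp_eq = Insn("cmp.eq p6,p7=...", frozenset(), frozenset({"p6", "p7"}))
use_p6 = Insn("(p6) ...", frozenset({"p6"}), frozenset())
use_p7 = Insn("(p7) ...", frozenset({"p7"}), frozenset())
restore = Insn("mov pr=r8,-1", frozenset({"r8"}), ALL_PREDICATES)

program_order = [save, cmp_eq, use_p6, use_p7, restore]
gcc_order = [save, cmp_eq, use_p6, restore, use_p7]

print(depends(use_p7, restore))                  # anti-dependence on p7
print(is_legal_order(program_order, gcc_order))  # the observed order violates it
```

Under this model the (p7) instruction and the restore are linked by an
anti-dependence on p7, which is exactly the edge that appears to be
missing when the scheduler moves the restore upwards.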

So my question is:
How can I make GCC aware of the data dependencies between
mov X = pr (or mov pr = X, -1) and other uses of a specific
predicate register (e.g. p6, p7)?

Personally, I think I need to modify GCC, or somehow instruct it, to
notice these dependencies (however, the files involved look far too
complex :-(). Or is there a simpler way to get around this problem?

Any help is truly appreciated.
Thanks!

