[gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

Wed Dec 2 13:02:00 GMT 2015

On 12/02/15 05:40, Jakub Jelinek wrote:
>  Don't know the HW good enough, is there any power consumption, heat etc.
> difference between the two approaches?  I mean does the HW consume different
> amount of power if only one thread in a warp executes code and the other
> threads in the same warp just jump around it, vs. having all threads busy?

Having all threads busy will increase power consumption.  It's also bad if the 
other vector lanes are executing memory-access instructions.  However, for small 
blocks, it is probably a win over the jump-around approach.  One future 
optimization to the neutering algorithm is to use such predication for small 
blocks and keep branching for larger ones.

> How exactly does OpenACC copy the stack?  At least for OpenMP, one could
> have automatic vars whose addresses are passed to simd regions in different
> functions, say like:

The stack frame of the current function is copied when entering a partitioned 
region.  (Callers' frames are not visible at that point, so they are not 
copied.)  Again, an optimization would be to copy only the part of the frame 
that is actually used in the partitioned region.
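A minimal model of that frame copy follows, assuming each lane owns a private copy of the current function's frame and that lane 0 is the one that ran while the others were neutered.  FRAME_SIZE and broadcast_frame are hypothetical names used only for illustration.  The optional `used` byte range stands in for the "copy only what the region uses" optimization.

```python
FRAME_SIZE = 64  # hypothetical frame size in bytes


def broadcast_frame(frames, used=None):
    """On entry to a partitioned region, copy lane 0's frame into every
    other lane's frame.

    `used` is an optional (start, end) byte range: copy only the slots the
    region touches instead of the whole frame.  Caller frames are not
    modelled, matching the fact that only the current frame is copied.
    Returns the number of bytes copied.
    """
    start, end = used if used is not None else (0, len(frames[0]))
    src = frames[0][start:end]
    for frame in frames[1:]:
        frame[start:end] = src
    return (end - start) * (len(frames) - 1)


# Four lanes; lane 0 wrote some locals while the others were neutered.
frames = [bytearray(FRAME_SIZE) for _ in range(4)]
frames[0][8:16] = b"ABCDEFGH"
print(broadcast_frame(frames))          # 192 (64 bytes x 3 lanes)

frames = [bytearray(FRAME_SIZE) for _ in range(4)]
frames[0][8:16] = b"ABCDEFGH"
print(broadcast_frame(frames, (8, 16)))  # 24 (8 bytes x 3 lanes)
```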

nathan