[gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant
Nathan Sidwell
nathan@acm.org
Wed Dec 2 13:02:00 GMT 2015
On 12/02/15 05:40, Jakub Jelinek wrote:
> Don't know the HW good enough, is there any power consumption, heat etc.
> difference between the two approaches? I mean does the HW consume different
> amount of power if only one thread in a warp executes code and the other
> threads in the same warp just jump around it, vs. having all threads busy?
Having all threads busy will increase power consumption. It's also bad if the
other vectors are executing memory access instructions. However, for small
blocks, it is probably a win over the jump-around approach. One future
optimization for the neutering algorithm is to use such predication for small
blocks and keep branching for the larger blocks.
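
As a rough illustration of the two approaches (a host-side C simulation,
not code from the patch and not PTX), the sketch below runs one
single-threaded block over a 32-lane warp both ways. All names here
(run_branch_around, run_predicated, use_predication and the
small_block_limit cutoff) are hypothetical; the cutoff in particular is an
assumption standing in for whatever heuristic a future optimization would use.

```c
#include <assert.h>
#include <stddef.h>

enum { WARP_SIZE = 32 };   /* lanes per warp; assumed, as on current NVIDIA HW */

struct run_stats {
    int stores;       /* side effects that actually reached memory */
    int busy_lanes;   /* lanes that executed the block body */
};

/* Jump-around ("neutering"): every lane except lane 0 skips the block. */
struct run_stats run_branch_around(int *cell, int value)
{
    struct run_stats st = {0, 0};
    assert(cell != NULL);
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        if (lane != 0)
            continue;            /* neutered lane branches past the block */
        st.busy_lanes++;
        *cell = value + 1;
        st.stores++;
    }
    return st;
}

/* Predication: every lane executes the body, only the store is guarded. */
struct run_stats run_predicated(int *cell, int value)
{
    struct run_stats st = {0, 0};
    assert(cell != NULL);
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        int tmp = value + 1;     /* computed redundantly in every lane */
        st.busy_lanes++;
        if (lane == 0) {         /* predicate covers the side effect only */
            *cell = tmp;
            st.stores++;
        }
    }
    return st;
}

/* Hypothetical chooser: predicate small blocks, branch around large ones. */
int use_predication(size_t block_insns, size_t small_block_limit)
{
    return block_insns <= small_block_limit;
}
```

Both variants perform exactly one store with the same value; they differ
only in how many lanes stay busy, which is the trade-off discussed above.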
> How exactly does OpenACC copy the stack? At least for OpenMP, one could
> have automatic vars whose addresses are passed to simd regions in different
> functions, say like:
The stack frame of the current function is copied when entering a partitioned
region. (There is no visibility of the caller's frame and such.) Again, a
possible optimization would be to copy only the part of the stack that's used
in the partitioned region.
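
To make the copy and the possible optimization concrete, here is a minimal
host-side C sketch, assuming each lane has its own fixed-size copy of the
current function's frame. The layout (lane_frames, N_LANES, FRAME_BYTES) and
the broadcast_frame helper are hypothetical and do not reflect how the
backend actually moves the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N_LANES = 32, FRAME_BYTES = 64 };   /* illustrative sizes only */

/* One private copy of the current function's frame per lane. */
struct lane_frames {
    unsigned char frame[N_LANES][FRAME_BYTES];
};

/* On entry to a partitioned region, copy bytes [offset, offset + len)
   of lane 0's frame into every other lane's frame.  Passing
   (0, FRAME_BYTES) copies the whole frame; a narrower range models
   copying only the part the region uses.  */
void broadcast_frame(struct lane_frames *lf, size_t offset, size_t len)
{
    assert(lf != NULL);
    assert(offset <= FRAME_BYTES && len <= FRAME_BYTES - offset);
    for (int lane = 1; lane < N_LANES; lane++)
        memcpy(lf->frame[lane] + offset, lf->frame[0] + offset, len);
}
```

Only the current function's frame appears in the sketch, matching the point
that the caller's frame is not visible at the copy.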
nathan