[PATCH 0/8] NVPTX offloading to NVPTX: backend patches

Bernd Schmidt bschmidt@redhat.com
Tue Oct 18 11:03:00 GMT 2016

On 10/17/2016 07:06 PM, Alexander Monakov wrote:

> I've just pushed two commits to the branch to fix this issue.  Before those, the
> last commit left the branch in a state where an incremental build seemed ok
> (because libgcc/libgomp weren't rebuilt with the new cc1), but a from-scratch
> build was broken like you've shown.  LULESH is known to work.  I also intend to
> perform a trunk merge soon.

OK, that did work, however...

>> I think before merging this work we'll need to have some idea of how well it
>> works on real-world code.
> This patchset and the branch lay the foundation, there's more work to be
> done, in particular on the performance improvements side. There should be
> an agreement on these fundamental bits first, before moving on to fine-tuning.

The performance I saw was lower by a factor of 80 or so compared to 
their CUDA version, and even lower than OpenMP on the host. Does this 
match what you are seeing? Do you have a clear plan how this can be
improved?

To me this kind of performance doesn't look like something that will be 
fixed by fine-tuning; it leaves me undecided whether the chosen approach 
(what you call the fundamentals) is viable at all. Performance is still 
better than the OpenACC version of the benchmark, but then I think we
shouldn't repeat the mistakes we made with OpenACC, and should avoid
merging anything until we're sure it's ready and of benefit to users.

