Anonymous - 2015-04-21
There is a layer with GPU support implemented in src/depends/GCuda. This folder is not built by default because it depends on three NVIDIA libraries that are included in the CUDA SDK: curand, cublas, and cudart. To build it, go into src/depends/GCuda and do "sudo make install". (Now that I think of it, I made some recent changes to the GNeuralNetLayer class, and I have not yet made corresponding changes in the GCuda folder, so it may not compile right now. I will try to fix that right away.)
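The build step described above can be sketched as a small shell snippet. This is a hedged illustration only: the directory and "make install" target come from the comment, but the explicit link flags for the three CUDA SDK libraries are an assumption added for clarity.

```shell
# The three NVIDIA libraries GCuda depends on (from the CUDA SDK).
# These would typically be linked with the flags below; the exact
# flags are an assumption, not taken from the GCuda Makefile.
CUDA_LIBS="-lcurand -lcublas -lcudart"
echo "GCuda links against: ${CUDA_LIBS}"

# The actual build, as described in the comment above (requires the
# CUDA SDK to be installed; uncomment to run in a waffles checkout):
# cd src/depends/GCuda
# sudo make install
```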
There is a demo in waffles/demos/cuda that measures the gains from parallelization. Last time I tried it, I measured about 20x speed-up on a Tesla K40. That's not as good as some others have reported, but my code is much easier to read. Unfortunately, my convolutional layers do not yet support GPU parallelization, but I am planning to add that as soon as I find some time.
(ok, it's up-to-date now.)