
CUDA-Z only runs tests on first GPU

  • Jason Keltz - 2013-02-25

    Hi.
    When I run CUDA-Z, it is able to detect all 4 GPUs in my system, but it is only able to run tests on the first GPU. Since it seems to get information on the other GPUs, and since I can run bandwidthTest on all of them, I suspect that it should work. Any ideas?
    I'm running the latest Linux x64 nvidia driver, 310.32, CentOS 6.3.
    I'm running CUDA 5.0.35.
    Any assistance would be great..
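
    In case it helps, this is the sort of minimal per-device check I would expect to pass on every card, given that bandwidthTest does (just a sketch against the plain CUDA runtime API; it's not CUDA-Z's code):

    /* Sketch: run a small host-to-device copy on every GPU the runtime
     * reports, printing the first CUDA error hit on each device. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int n = 0;
        cudaError_t err = cudaGetDeviceCount(&n);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int dev = 0; dev < n; dev++) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            cudaSetDevice(dev);              /* switch context to this GPU */

            const size_t size = 16 << 20;    /* 16 MiB test buffer */
            void *host = malloc(size);       /* plain pageable host memory */
            void *devbuf = NULL;
            err = cudaMalloc(&devbuf, size);
            if (err == cudaSuccess)
                err = cudaMemcpy(devbuf, host, size, cudaMemcpyHostToDevice);

            printf("device %d (%s): %s\n", dev, prop.name,
                   err == cudaSuccess ? "OK" : cudaGetErrorString(err));
            cudaFree(devbuf);
            free(host);
        }
        return 0;
    }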

     
  • Andriy Golovnya - 2013-02-25

    Thanks for the report!

    Can you see the other GPUs in the list box at the bottom of the CUDA-Z window?

    If you select another (non-first) GPU in this list, how does it react? Does it show info about the selected GPU? Does it run tests? What does the logging in the console say? CUDA-Z prints a lot of logging there.

    What version of CUDA-Z are you running? Check the latest beta from today.

     
  • Jason Keltz - 2013-02-25

    Yep.. in the list box, I see all the GPUs.
    CUDA-Z prints the Core and Memory details for the other cards, but when you go to the Performance tab for them, you just see "--" for all the numbers on anything but the first card.
    I'm trying the CUDA-Z build from today, and I tried the earlier one too, with the same result.
    I'm sure it's a simple bug. Obviously CUDA-Z can see the cards -- I'm just not sure why it's not getting the performance numbers.

     
  • Jason Keltz - 2013-02-25

    Sorry.. I checked the console log and I do see some errors, as follows...

    [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
    [Hardware Error]: APEI generic hardware error status
    [Hardware Error]: severity: 2, corrected
    [Hardware Error]: section: 0, severity: 2, corrected
    [Hardware Error]: flags: 0x01
    [Hardware Error]: section_type: PCIe error
    [Hardware Error]: port_type: 4, root port
    [Hardware Error]: version: 1.16
    [Hardware Error]: command: 0x0010, status: 0x0547
    [Hardware Error]: device_id: 0000:00:02.0
    [Hardware Error]: slot: 0
    [Hardware Error]: secondary_bus: 0x02
    [Hardware Error]: vendor_id: 0x8086, device_id: 0x3c04
    [Hardware Error]: class_code: 000406
    [Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
    [Hardware Error]: aer_status: 0x00000000, aer_mask: 0x00002000
    [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Receiver ID
    NVRM: GPU at 0000:02:00: GPU-0e71757b-9243-6f48-2742-8179496beb14
    [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
    [Hardware Error]: APEI generic hardware error status
    [Hardware Error]: severity: 2, corrected
    [Hardware Error]: section: 0, severity: 2, corrected
    [Hardware Error]: flags: 0x01
    [Hardware Error]: section_type: PCIe error
    [Hardware Error]: port_type: 4, root port
    [Hardware Error]: version: 1.16
    [Hardware Error]: command: 0x0010, status: 0x0547
    [Hardware Error]: device_id: 0000:00:03.0
    [Hardware Error]: slot: 0
    [Hardware Error]: secondary_bus: 0x03
    [Hardware Error]: vendor_id: 0x8086, device_id: 0x3c08
    [Hardware Error]: class_code: 000406
    [Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
    [Hardware Error]: aer_status: 0x00000000, aer_mask: 0x00002000
    [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Receiver ID
    NVRM: GPU at 0000:03:00: GPU-57f66cc2-a378-f0b9-88a8-e67c2d4251f3
    NVRM: GPU at 0000:83:00: GPU-e7104aa1-bb8b-90ae-eb7c-2f19139812c8
    NVRM: GPU at 0000:84:00: GPU-feeeedcb-5f3b-0fa7-96db-d1b2e33799f7
    __ratelimit: 2 callbacks suppressed
    [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
    [Hardware Error]: APEI generic hardware error status
    [Hardware Error]: severity: 2, corrected
    [Hardware Error]: section: 0, severity: 2, corrected
    [Hardware Error]: flags: 0x01
    [Hardware Error]: section_type: PCIe error
    [Hardware Error]: port_type: 4, root port
    [Hardware Error]: version: 1.16
    [Hardware Error]: command: 0x0010, status: 0x0547
    [Hardware Error]: device_id: 0000:00:02.0
    [Hardware Error]: slot: 0
    [Hardware Error]: secondary_bus: 0x02
    [Hardware Error]: vendor_id: 0x8086, device_id: 0x3c04
    [Hardware Error]: class_code: 000406
    [Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
    [Hardware Error]: aer_status: 0x00000000, aer_mask: 0x00002000
    [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Receiver ID
    [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
    [Hardware Error]: APEI generic hardware error status
    [Hardware Error]: severity: 2, corrected
    [Hardware Error]: section: 0, severity: 2, corrected
    [Hardware Error]: flags: 0x01
    [Hardware Error]: section_type: PCIe error
    [Hardware Error]: port_type: 4, root port
    [Hardware Error]: version: 1.16
    [Hardware Error]: command: 0x0010, status: 0x0547
    [Hardware Error]: device_id: 0000:80:03.0
    [Hardware Error]: slot: 0
    [Hardware Error]: secondary_bus: 0x84
    [Hardware Error]: vendor_id: 0x8086, device_id: 0x3c08
    [Hardware Error]: class_code: 000406
    [Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
    [Hardware Error]: aer_status: 0x00000000, aer_mask: 0x00002000
    [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Receiver ID

    but it only happened once, and I haven't seen it happen again either...

     
  • Jason Keltz - 2013-02-25

    By the way, I did a "dmesg -c" as root to clear the console log, then ran CUDA-Z again, and now in dmesg I only see:

    NVRM: GPU at 0000:02:00: GPU-0e71757b-9243-6f48-2742-8179496beb14
    NVRM: GPU at 0000:03:00: GPU-57f66cc2-a378-f0b9-88a8-e67c2d4251f3
    NVRM: GPU at 0000:83:00: GPU-e7104aa1-bb8b-90ae-eb7c-2f19139812c8
    NVRM: GPU at 0000:84:00: GPU-feeeedcb-5f3b-0fa7-96db-d1b2e33799f7

    ... but same result.

     
  • Andriy Golovnya - 2013-02-25

    Hmmm... Interesting.

    What GPUs are you using in this system? I need to know so I can reproduce conditions similar to yours.

     
  • Jason Keltz - 2013-02-25

    Actually, I have 4 x GTX 670 in the system, but earlier today, I tried one GTX 580 along with 3 GTX 670, and the results were the same. What I can do is give you access to the system if you're interested. Email me at jas at cse.yorku.ca, and we will set something up.

     
  • Andriy Golovnya - 2013-02-25

    Yo! Really heavy config! ;) I can only guess what you need this monster for.

    So, I've made a small test at home with
    0: GTX 285 (yes, I still sit on this one)
    1: 8400 GS (a backup card)
    on Windows, Linux and MacOSX.

    And the second card shows performance values, just like the first one. And it shows realistic values, not the same numbers twice.

    From the attached file:
    Version: 0.7.175 SVN Built Feb 25 2013 10:04:59 http://cuda-z.sf.net/
    OS Version: Linux 3.5.0-25-generic #38-Ubuntu SMP Mon Feb 18 23:27:42 UTC 2013 x86_64
    Driver Version: 310.14
    Driver Dll Version: 5.0 (310.14)
    Runtime Dll Version: 4.20 (4)

    I have Ubuntu 12.10 x64 desktop OS.

    What I can conclude:
    1. a dual-card configuration works well: there is no failure in the algorithm here.
    2. my configuration is not symmetric: the values are different, which suggests the algorithm is OK.
    3. my configuration is not as heavy as yours: the problem may lie there.
    3.a. we have a bug in CUDA-Z that only shows up in heavy configurations.
    3.b. there is a problem in the NVIDIA driver that prevents it from working well.
    3.c. there is a problem in the Linux kernel that prevents it from working well.
    3.d. etc..

    What I can suggest now:
    1. strip your monster down to a one-card configuration and check that CUDA-Z works well.
    2. extend your monster to a two-card configuration and check that CUDA-Z works well.
    3. ... three ...
    4. ... four ...

    Try to find out when it stops working. I have only tested CUDA-Z with a three-card config myself - I simply have only three PCIe slots in my PC. It might be that the four-card config is affected by the black magic of today's full moon, but I don't really believe in that. It most probably works for four cards too.

    If you can capture CUDA-Z's logging from the console, drop it here along with the exported information for your four-card configuration.

    WBR,
    AG

    PS: Your logging from before with [Hardware Error] looks unrelated at first glance, but...

    0x8086:0x3c04:Intel Corporation:Intel(R) Xeon(R) Processor E5 Product Family/Core i7 IIO PCI Express Root Port 2a - 3C04

    ... it could be related somehow. Bad contacts, dust in the case, full moon, etc.

     
  • Jason Keltz - 2013-02-25

    Hi Andriy,

    Actually, the machine is a GPU compute machine for a faculty member at our university. We're having some problems with the performance of the cards in the system, which is why I'm trying to use CUDA-Z to help diagnose it. In particular..

    GTX 580 on old Core i7 system from 2009: Host Pageable to Device: 3993.62 MiB/s
    GTX 670 on new system: Host Pageable to Device: 2615.68 MiB/s
    GTX 670 on old Core i7 system from 2009: Host Pageable to Device: 4253.14 MiB/s

    The new system has 64 GB of DDR3 1600 MHz memory with dual Xeon E5 processors.. so it should be fast! The CPUs aren't quite the same, but I don't think this result is due to the CPU.

    It's a lot of work to remove the cards from the system because it's rack-mounted, and I have to take it down to get into it. If I gave you access to the system, I wonder if you might be able to insert debugging code into CUDA-Z to help with the diagnosis...

    Jason.

     
  • Jason Keltz - 2013-02-25

    Interesting to note that the performance tests fail on GPUs 0, 1, and 3 - only GPU 2 works.
    Here's what I see on the console for the ones not working:

    Waiting for new loop...
    Timer shot -> update performance for device 3 in mode 0
    Rising update action for device 3
    Thread loop started
    Alloc local buffers for GeForce GTX 670.
    Alloc host pageable for GeForce GTX 670.
    Host pageable is at 0xF28FF008.
    Alloc host pinned for GeForce GTX 670.
    Waiting for new loop...
    Timer shot -> update performance for device 3 in mode 0
    Rising update action for device 3
    Thread loop started
    Alloc local buffers for GeForce GTX 670.
    Alloc host pageable for GeForce GTX 670.
    Host pageable is at 0xF28FF008.
    Alloc host pinned for GeForce GTX 670.
    Waiting for new loop...

    I included the log file from the one not working.

     
  • Andriy Golovnya - 2013-02-25

    From the logging you posted here, it looks like only one card can get a proper host pinned buffer - not enough kernel RAM???

    This is a normal start-up initialization:

    Thread started
    Selecting GeForce GTX 285.
    Alloc local buffers for GeForce GTX 285.
    Alloc host pageable for GeForce GTX 285.
    Host pageable is at 0xF4AFF008.
    Alloc host pinned for GeForce GTX 285. <<-- after this your logging breaks.
    Rising update action for device -1
    Host pinned is at 0xF25B5000.
    Alloc device buffer 1 for GeForce GTX 285.
    Device buffer 1 is at 0x00210000.
    Alloc device buffer 2 for GeForce GTX 285.
    Device buffer 2 is at 0x01210000.
    Waiting for new loop...

    This points somewhat to problems 3.b. and 3.c.
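
    If you want to reproduce that exact step outside of CUDA-Z, a minimal sketch of the same sequence with full error reporting could look like this (hypothetical test code, not the actual CUDA-Z source):

    /* Sketch: repeat the "alloc host pinned" step on each device and
     * print the exact CUDA error if it fails. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int dev = 0; dev < n; dev++) {
            cudaSetDevice(dev);
            const size_t size = 16 << 20;   /* 16 MiB, like a test buffer */
            void *pinned = NULL;
            /* This mirrors the point after which your logging breaks. */
            cudaError_t err = cudaHostAlloc(&pinned, size, cudaHostAllocDefault);
            printf("device %d: cudaHostAlloc -> %s\n",
                   dev, cudaGetErrorString(err));
            if (err == cudaSuccess)
                cudaFreeHost(pinned);
        }
        return 0;
    }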

    Helping you with debugging would be a good opportunity for me, but I can't take it up right now because it's midnight at my location (Europe)... :(

    I really do believe it's a HW issue. You'll have to open this PC and try the cards one by one, I believe... At least that's what I would do in such a case.

    See you!

     
  • Jason Keltz - 2013-02-26

    Thread loop started
    Alloc local buffers for GeForce GTX 670.
    Alloc host pageable for GeForce GTX 670.
    Host pageable is at 0xF2DFD008.
    Alloc host pinned for GeForce GTX 670.
    Waiting for new loop...
    Timer shot -> update performance for device 1 in mode 0
    Rising update action for device 1
    Thread loop started
    Alloc local buffers for GeForce GTX 670.
    Alloc host pageable for GeForce GTX 670.
    Host pageable is at 0xF2DFD008.
    Alloc host pinned for GeForce GTX 670.
    Waiting for new loop...

    So...

    With 2 cards in the system, same problem as before.
    It's not the specific card, because I had the same problem when I had one GTX 580 and one GTX 670 in the system as well.
    It's not the kernel or the CUDA version, because I have the same kernel running on another machine (a different motherboard, with a Core i7) that has a 580 and a Tesla C2050, and it recognizes both. In fact, I swapped the 580 from there with a 670, and it works there as well.
    The issue is probably the motherboard, but the motherboard passes Intel's full test suite successfully.
    I'll bet it's a hardware bug that needs to be fixed, but I have no idea how to present this problem in a way that they can identify it and correct it.

    sigh.

     
  • Andriy Golovnya - 2013-02-26

    Did you try playing with the BIOS settings?
    If my assumption is correct, there is not enough paged memory available even for 2 cards (and "heavy" cards at that), and you may be able to tweak this in the BIOS settings.
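
    One way to check how much page-locked memory the kernel will actually grant is to keep allocating pinned chunks until cudaHostAlloc refuses - a rough sketch (hypothetical probe code; the chunk size is arbitrary):

    /* Rough probe: allocate 64 MiB pinned chunks until the driver/kernel
     * refuses, then report the total that was granted. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        const size_t chunk = 64 << 20;       /* 64 MiB per allocation */
        enum { MAX_CHUNKS = 1024 };          /* cap the probe at 64 GiB */
        void *bufs[MAX_CHUNKS];
        int i;

        for (i = 0; i < MAX_CHUNKS; i++) {
            cudaError_t err = cudaHostAlloc(&bufs[i], chunk, cudaHostAllocDefault);
            if (err != cudaSuccess) {
                printf("failed at chunk %d: %s\n", i, cudaGetErrorString(err));
                break;
            }
        }
        printf("pinned memory granted: %zu MiB\n", ((size_t)i * chunk) >> 20);

        while (i-- > 0)
            cudaFreeHost(bufs[i]);
        return 0;
    }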

    The kernel may have different parameters/characteristics on another PC. If your HW is too new for your current kernel, the kernel may fall back to some kind of "safe" settings that are not really performance-optimized.

    Do you have problems with other CUDA programs, or only with CUDA-Z? And if so, what problems do you have with the other CUDA SW?

     
  • Jason Keltz - 2013-02-26

    The MB is a server motherboard (W2600CR) that supports 4 x16 slots, so I suspect it shouldn't be a problem..
    I couldn't find anything in the BIOS to adjust, though...
    We're finding that the cards all work fine from CUDA - there is an issue with SPEED, but not with the ability to access the devices.

    Yet bandwidthTest from CUDA works on all 4 cards... here are the results for the 580/670 from both systems..

    GTX 580
    Host to Device Bandwidth (PINNED Memory Transfers)
    Transfer (Bytes)   OLD system (MB/s)   NEW system (MB/s)
    1024               234.2               186.5
    67186688           5716.4              5746.8

    Device to Host Bandwidth
    Transfer (Bytes)   OLD system (MB/s)   NEW system (MB/s)
    1024               303.1               289.8
    67186688           6097.6              6323.6

    Device to Device Bandwidth
    Transfer (Bytes)   OLD system (MB/s)   NEW system (MB/s)
    1024               402.9               269.8
    67186688           169949.5            169761.5


    GTX 670
    Host to Device Bandwidth (PINNED Memory Transfers)
    Transfer (Bytes)   OLD system (MB/s)   NEW system (MB/s)
    1024               134.7               118.6
    67186688           5902.0              5981.4

    Device to Host Bandwidth
    Transfer (Bytes)   OLD system (MB/s)   NEW system (MB/s)
    1024               190.9               193.0
    67186688           6095.2              6380.1

    Device to Device Bandwidth
    Transfer (Bytes)   OLD system (MB/s)   NEW system (MB/s)
    1024               622.8               599.6
    67186688           151662.7            151589.2

    If I went purely on the data above, everything "sort of" seems okay, given that you get numbers for all the GTX 670s, but CUDA-Z gave the following results for host pageable to device.. (note -- the pinned tests above work, and pinned is exactly the step CUDA-Z is failing on!)

    GTX 580 on old system: Host Pageable to Device: 3993.62 MiB/s
    GTX 580 on new system: Host Pageable to Device: 2500 MiB/s
    GTX 670 on old system: Host Pageable to Device: 4253.14 MiB/s
    GTX 670 on new system: Host Pageable to Device: 2615.68 MiB/s

    The CUDA-Z result for this test was no different with only 1 card installed, either.

    Now, the old system has a Core i7-950 at 3.06 GHz, and the new one has an E5-2620 at 2.00 GHz... but I'm not sure how much of this test would depend on the slower clock of the processor...
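
    For what it's worth, the pageable-vs-pinned difference comes down to the kind of host buffer that feeds the copy, roughly like this (a sketch of the general technique, not CUDA-Z's or bandwidthTest's actual code):

    /* Sketch: time a host-to-device copy from a pageable (malloc) buffer
     * versus a pinned (cudaHostAlloc) buffer. Illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <cuda_runtime.h>

    static float h2d_ms(void *host, void *dev, size_t size)
    {
        cudaEvent_t t0, t1;
        float ms = 0.0f;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0, 0);
        cudaMemcpy(dev, host, size, cudaMemcpyHostToDevice); /* blocking copy */
        cudaEventRecord(t1, 0);
        cudaEventSynchronize(t1);
        cudaEventElapsedTime(&ms, t0, t1);
        cudaEventDestroy(t0);
        cudaEventDestroy(t1);
        return ms;
    }

    int main(void)
    {
        const size_t size = 64 << 20;     /* 64 MiB transfer */
        void *dev = NULL, *pinned = NULL;
        void *pageable = malloc(size);    /* ordinary, pageable host memory */

        cudaMalloc(&dev, size);
        cudaHostAlloc(&pinned, size, cudaHostAllocDefault); /* page-locked */
        memset(pageable, 1, size);        /* fault the pages in before timing */
        memset(pinned, 1, size);

        printf("pageable: %.1f MiB/s\n",
               (size >> 20) * 1000.0f / h2d_ms(pageable, dev, size));
        printf("pinned:   %.1f MiB/s\n",
               (size >> 20) * 1000.0f / h2d_ms(pinned, dev, size));

        cudaFree(dev);
        cudaFreeHost(pinned);
        free(pageable);
        return 0;
    }

    The bandwidthTest runs above used pinned transfers, which is why those numbers look fine on both systems while it's the pageable path that collapses on the new one.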

     
  • Jason Keltz - 2013-02-26

    By the way, for the record, bandwidthTest --memory=pageable (which I didn't realize existed) works for all the cards as well.

     
  • Andriy Golovnya - 2013-02-26

    It seems like there is a problem in CUDA-Z too that shows up in a corner case like yours.
    I'll try to analyze the code and maybe activate some extra logging too. Let's see how I can help here.

     
  • Andriy Golovnya - 2013-02-26

    You can try: bandwidthTest --memory=pinned -wc

     
  • Jason Keltz - 2013-02-26

    Running on...

    Device 0: GeForce GTX 670
    Quick Mode

    Host to Device Bandwidth, 1 Device(s)
    PINNED Memory Transfers, Write-Combined Memory Writes are Enabled
    Transfer Size (Bytes)   Bandwidth (MB/s)
    33554432                5971.0

    Device to Host Bandwidth, 1 Device(s)
    PINNED Memory Transfers, Write-Combined Memory Writes are Enabled
    Transfer Size (Bytes)   Bandwidth (MB/s)
    33554432                6372.7

    Device to Device Bandwidth, 1 Device(s)
    PINNED Memory Transfers, Write-Combined Memory Writes are Enabled
    Transfer Size (Bytes)   Bandwidth (MB/s)
    33554432                151089.3

    ... works for --device 0 through --device 3.

    jas.

     
  • Andriy Golovnya - 2013-02-26

    I have prepared a special version of CUDA-Z with a lot more logging than usual. At least this version will show all errors.

    http://sourceforge.net/projects/cuda-z/files/cuda-z/0.7-Beta/CUDA-Z-0.7.176-SVN-logging.run/download

    Please capture and send me the logging while selecting all the cards one after another:

    sh ..../CUDA-Z-0.7.176-SVN-logging.run 2> logfile.txt

    Thanks!

     
  • Jason Keltz - 2013-02-27

    Someone is running a job on the server... when it's done, I'll give it a try...

    Thanks!

     
  • Jason Keltz - 2013-02-27

    I ran it with 2 cards in the system (back to 2 at the moment)..

    card 0 works, of course...
    then, when switching to card 1, I only see:

    Waiting for new loop...
    Switch device -> update performance for device 1
    Rising update action for device 1
    Thread loop started
    Alloc local buffers for GeForce GTX 670.
    Alloc host pageable for GeForce GTX 670.
    Host pageable is at 0xF0CFF008.
    Alloc host pinned for GeForce GTX 670.
    CUDA Error: all CUDA-capable devices are busy or unavailable
    Waiting for new loop...
    Timer shot -> update performance for device 1 in mode 0
    Rising update action for device 1
    Thread loop started
    Alloc local buffers for GeForce GTX 670.
    Alloc host pageable for GeForce GTX 670.
    Host pageable is at 0xF0CFF008.
    Alloc host pinned for GeForce GTX 670.
    CUDA Error: all CUDA-capable devices are busy or unavailable

    By the way, it was just identified that while the W2600CR motherboard has 4 x16 slots, only 1 of them is x16 electrically.. the others are x8. I figured that was the reason for the performance issue with paged memory, but when I put only one 670 in the system, in the x16 slot, the numbers are the same.. argh.

    jas.

     
  • Andriy Golovnya - 2013-02-27

    Yeah.. Disappointing. Never believe in PCIe x16 for the second, third, or fourth slot.
    I've got a similar issue on my GA EX58-UD5 - the second slot works at x16 if the third is not populated, otherwise both run at x8. Sad but true. The funniest part is that if the second slot is not populated, the third is dead.
    A similar joke is implemented on the GA 870A-UD3 - if the second x16 (electrically x4) slot is populated with an x4 card, two extra x1 ports are disabled. All three of them can only work when the second slot is populated with an x1 card.
    I'm really surprised to see server motherboards have the same issue.

    Regarding the CUDA-Z output, I'll try to find out what the reason could be. I presume that in the quad-card configuration, three cards give the same error message.

    Thanks!

    WBR,
    AG

     
  • Jason Keltz - 2013-02-28

    Well, with this board, you do get 1 x16 and 3 x8 slots according to the technical spec.. It's not clear how much x8 affects the overall CUDA performance of the card. But with only 1 card in the system at the moment, in the x16 slot (slot 1), I still don't understand the performance differences in paged memory transfers... I don't think I ever will. argh.

     
  • Jason Keltz - 2013-03-01

    It turns out the problem with the reduced paged memory performance had to do with having a second CPU in the system. I removed the second CPU, and performance is back to normal! Imagine that!

    I had actually removed all but 1 GTX 670, ensured it was in SLOT 1 (which is managed by CPU 1), and used taskset to ensure that bandwidthTest was running on CPU 1... that didn't work... only removing the second CPU did. I wish someone could explain why it works that way.
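
    My best guess - and it's only a guess - is NUMA locality: with two sockets, a pageable copy is staged through intermediate buffers that can end up on the far node's memory, and taskset only pins the threads, not where already-allocated buffers live. Here's a rough probe that binds the host buffer to a chosen NUMA node via libnuma (hypothetical code; it assumes the libnuma headers are installed):

    /* Rough probe: bind the host buffer to a chosen NUMA node with libnuma,
     * then time the host-to-device copy.
     * Build (assumption): nvcc -o numa_h2d numa_h2d.cu -lnuma */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <numa.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support available\n");
            return 1;
        }
        int node = (argc > 1) ? atoi(argv[1]) : 0;  /* NUMA node to test */
        const size_t size = 64 << 20;               /* 64 MiB transfer */

        /* Place the host buffer on the requested node, not via first-touch. */
        void *host = numa_alloc_onnode(size, node);
        memset(host, 1, size);                      /* fault the pages in */

        void *dev = NULL;
        cudaMalloc(&dev, size);

        cudaEvent_t t0, t1;
        float ms = 0.0f;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0, 0);
        cudaMemcpy(dev, host, size, cudaMemcpyHostToDevice);
        cudaEventRecord(t1, 0);
        cudaEventSynchronize(t1);
        cudaEventElapsedTime(&ms, t0, t1);

        printf("node %d -> GPU 0: %.1f MiB/s\n", node,
               (size >> 20) * 1000.0f / ms);

        cudaFree(dev);
        numa_free(host, size);
        return 0;
    }

    Running it once per node (./numa_h2d 0, then ./numa_h2d 1) with both CPUs installed should show whether the far node is dramatically slower - the part that taskset alone couldn't fix.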

     