* What's this?

"occucal" is a C implementation of NVIDIA's "CUDA Occupancy Calculator". Most of the code and data in occucal are taken from NVIDIA's original Excel spreadsheet, CUDA_occupancy_calculator.xls.

* Compile

gcc -std=c99 -o occucal main.c -lm

* Run

./occucal (compute capability) (threads per block) (registers per thread) (shared memory per block in bytes)

Give the compute capability without the dot; for example, pass 20 for sm_20 (compute capability 2.0).

* Output 

- STDOUT: four values on one line: the number of active warps per SM, the warp occupancy per SM [%], the register utilisation per SM [%] and the shared memory utilisation per SM [%].
- STDERR: the information that the original spreadsheet provides, except for the graphs. These messages can be suppressed with the QUIET option.

* Example

$ ./occucal 20 64 32 2048  <-- Run

# CUDA occupancy calculator in C   <-- STDERR
#(1) Selected your parameters
# Compute Capability: 2.0
# Shared Memory Size Config (bytes): 49152
#(2) Entered your resource usage
# Threads Per Block: 64
# Registers Per Thread: 32
# Shared Memory Per Block (bytes): 2048
#(*) Physical Limits for GPU Compute Capability
# Threads per Warp: 32
# Warps per Multiprocessor: 48
# Threads per Multiprocessor: 1536
# Thread Blocks per Multiprocessor: 8
# Total # of 32-bit registers per Multiprocessor: 32768
# Register allocation unit size: 64
# Register allocation granularity: warp
# Registers per Thread: 63
# Shared Memory per Multiprocessor (bytes): 49152
# Shared Memory Allocation unit size: 128
# Warp allocation granularity: 2
# Maximum Thread Block Size: 1024
#(*) Allocatable thread blocks per multiprocessor
# due to Warps (2 48): 8
# due to Registers (2 32): 16
# due to Shared Memory (2048 49152): 24
#(*) Maximum Thread Blocks Per Multiprocessor
# Limited by Max Warps or Max Blocks per Multiprocessor (8 2): 16
# Limited by Registers per Multiprocessor (16 2): 0
# Limited by Shared Memory per Multiprocessor (24 2): 0
# (3) GPU Occupancy Data is displayed here
# Active Threads per Multiprocessor: 512
# Active Warps per Multiprocessor: 16
# Active Thread Blocks per Multiprocessor: 8
# Occupancy of each Multiprocessor: 33.0%
# (*) Utilization of registers and shared memory per multiprocessor
# Register: 50.0%
# Shared memory: 33.3%

16 33.0 50.0 33.3    <-- STDOUT

* Remark

This software comes with no warranty.

--
Toru Takahashi
Oct 6, 2013
Source: README, updated 2013-11-02