CUDA Programming/DeviceQuery

From CS486wiki
Revision as of 04:42, 21 May 2011 by Myuksek1 (talk | contribs)

Both CUDA APIs (the driver API and the runtime API) allow us to gather information such as the driver version, the available devices, and detailed device properties like total available memory, bandwidth, and compute capability. The example on this page uses the runtime API.

Implementation

To keep the example simple, we did not include any error checking and ignored the error values returned by the functions.

We need to know the number of CUDA-capable devices on the system to begin with:

cudaError_t cudaGetDeviceCount(int *count)
 Stores in count the number of devices with compute capability greater than or equal to 1.0
 that are available for execution.

Devices are numbered consecutively, starting from 0. For each device we then need to query its properties. The properties are encapsulated in the following structure:

 struct cudaDeviceProp
 {
   char name[256];
     Identifies the device
   size_t totalGlobalMem;
     Total amount of global memory available on the device in bytes
   size_t sharedMemPerBlock;
     Maximum amount of shared memory available to a thread block in bytes
   int regsPerBlock;
     Maximum number of 32-bit registers available to a thread block 
   int warpSize;
     Warp size in threads
   size_t memPitch;
     Maximum pitch in bytes allowed by the memory copy functions that involve
     memory regions allocated through cudaMallocPitch() call
   int maxThreadsPerBlock;
     Maximum number of threads per block
   int maxThreadsDim[3];
     Maximum size of each dimension of a block
   int maxGridSize[3];
     Maximum size of each dimension of a grid
   size_t totalConstMem;
     Total amount of constant memory available on the device in bytes
   int major;
     Represents the major revision number of the device's compute capability
   int minor;
     Represents the minor revision number of the device's compute capability
   int clockRate;
     Clock frequency in kilohertz
   size_t textureAlignment;
     Defines the alignment requirement; texture base addresses that are aligned 
     to textureAlignment number of bytes do not need an offset applied to texture fetches
   int deviceOverlap;
     Specifies whether the device can concurrently copy memory between host and device while executing a kernel
     A value of 1 indicates that the device supports this overlap
   int multiProcessorCount;
     Number of multiprocessors on the device
   int kernelExecTimeoutEnabled;
     Specifies whether there is a run time limit for kernels executed on the device
     A value of 1 indicates that a limit is in effect
   int integrated;
     Specifies whether the device is an integrated (motherboard) GPU or a discrete (card) component
     A value of 0 indicates a discrete device, a value of 1 an integrated one
   int canMapHostMemory;
     Specifies whether the device can map host memory into the CUDA address space
     A value of 1 indicates that it can
   int computeMode;
     Specifies the compute mode that the device is currently in
     cudaComputeModeDefault means that multiple threads can call cudaSetDevice() on the device
     cudaComputeModeExclusive means that only one thread can call cudaSetDevice() on the device
     cudaComputeModeProhibited means that no thread can call cudaSetDevice() on the device
   int concurrentKernels;
     Specifies whether the device supports executing multiple kernels within the same context simultaneously
     A value of 1 indicates that it does
   int ECCEnabled;
     Specifies whether ECC support is turned on for the device
     A value of 1 indicates that ECC is enabled
   int pciBusID;
     PCI bus identifier of the device
   int pciDeviceID;
     PCI device, or slot, identifier of the device
   int tccDriver;
     Specifies whether the device is using a TCC driver
     A value of 1 indicates that it is
 };
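
Because the compute capability is split across the major and minor fields, comparing it against a required version has to look at both numbers. The following is a minimal sketch of such a comparison. The name capability_at_least is a hypothetical helper and not part of the CUDA API. It takes plain integers so that it compiles without the CUDA headers.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of the CUDA API).
   Returns true if compute capability major.minor is at least
   req_major.req_minor. The major number decides first; the minor
   number only matters when the major numbers are equal. */
bool capability_at_least(int major, int minor,
                         int req_major, int req_minor)
{
    if (major != req_major)
        return major > req_major;
    return minor >= req_minor;
}
```

With a filled-in structure prop, a call such as capability_at_least(prop->major, prop->minor, 1, 3) would check for compute capability 1.3 or newer.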

We can obtain the device properties using the following call:

 cudaError_t cudaGetDeviceProperties(struct cudaDeviceProp *prop, int device)
 Stores the properties of the device whose ID is device in prop

So, assuming a function with the following declaration:

 void printDeviceProperties(const struct cudaDeviceProp *prop)
 Prints out the device properties stored in prop

the main section of the program looks like this:

 struct cudaDeviceProp **cudaDevices;
 int count = 0, i;   /* start from a defined value, since errors are ignored */
 
 /* Ask the runtime how many CUDA-capable devices are present. */
 cudaGetDeviceCount(&count);
 cudaDevices = (struct cudaDeviceProp **)malloc( sizeof(struct cudaDeviceProp *) * count );
 
 /* Devices are numbered 0 .. count-1; query and print each one. */
 for (i = 0; i < count; ++i)
 {
   cudaDevices[i] = (struct cudaDeviceProp *)malloc(sizeof(struct cudaDeviceProp));
   cudaGetDeviceProperties(cudaDevices[i], i);
   printDeviceProperties(cudaDevices[i]);
   free(cudaDevices[i]);
 }
 free(cudaDevices);
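
The body of printDeviceProperties is only assumed above. As a rough sketch of what it might do for a handful of fields, the helper below formats a one-line summary. The name format_device_summary and its parameter list are hypothetical; the parameters simply mirror the corresponding cudaDeviceProp fields and are passed as plain values so that the sketch compiles without the CUDA headers. It converts totalGlobalMem from bytes to MiB and clockRate from kilohertz to MHz using integer division.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes a one-line device summary into buf
   (at most size bytes, including the terminating null) and returns
   the value returned by snprintf. */
int format_device_summary(char *buf, size_t size, const char *name,
                          int major, int minor, size_t totalGlobalMem,
                          int clockRate, int multiProcessorCount)
{
    /* totalGlobalMem is given in bytes, clockRate in kilohertz. */
    unsigned long long mib =
        (unsigned long long)totalGlobalMem / (1024ULL * 1024ULL);
    int mhz = clockRate / 1000;

    return snprintf(buf, size,
                    "%s: compute %d.%d, %llu MiB global memory, "
                    "%d MHz, %d multiprocessors",
                    name, major, minor, mib, mhz, multiProcessorCount);
}
```

Inside printDeviceProperties, this could be called with prop->name, prop->major, prop->minor, prop->totalGlobalMem, prop->clockRate and prop->multiProcessorCount, followed by puts(buf) to print the result.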