CUDA Parallel Programming/ProjectDescription
Evaluating the Performance of GPGPUs and Their Use in Scientific Computing
Introduction
CUDA stands for Compute Unified Device Architecture. It is a hardware and software architecture for issuing and managing computations on the GPU as a data-parallel computing device, without the need to map them to a graphics API. CUDA includes a programming model along with hardware support that simplifies parallel implementation. CUDA is a platform and programming model rather than a standalone language: programs are typically written in C or C++ with CUDA extensions, and workloads that suit the GPU can run considerably faster than on a CPU. Using it effectively requires training in parallel programming. CUDA targets high-performance applications on heterogeneous platforms that contain both central and graphics processing units. Data-parallel processing maps data elements to parallel processing threads. Many applications that process large data sets, such as arrays, can use a data-parallel programming model to speed up their computations.
With this in mind, I aimed to use CUDA for an analysis in the medical domain (bad genes). As a first step, I searched for a string in a 1 MB text file using parallel programming. My aim was to observe how much parallel programming can improve the performance of this process.
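The data-parallel mapping described above can be sketched on the CPU as follows. This is a minimal Python illustration under stated assumptions, not the CUDA code used in the project: each candidate start position plays the role of one thread index, and the names `match_at` and `parallel_style_search` are hypothetical.

```python
def match_at(text, pattern, i):
    """Work of one logical 'thread': does pattern start at index i of text?"""
    return text[i:i + len(pattern)] == pattern


def parallel_style_search(text, pattern):
    """Return every start index at which pattern occurs in text.

    The check at each index i is independent of all other indices, which is
    what makes the search data-parallel. On a GPU, each i would typically be
    handled by its own thread; here the indices are simply visited in a loop.
    """
    if not pattern:
        return []
    last_start = len(text) - len(pattern)
    return [i for i in range(last_start + 1) if match_at(text, pattern, i)]


if __name__ == "__main__":
    # Overlapping and repeated matches are all reported.
    print(parallel_style_search("TAGGCTAATAGG", "TAGG"))  # [0, 8]
```

Because no position depends on the result of another, the loop body can be distributed over threads without synchronization, apart from collecting the matching indices at the end.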
Definition of SNP-Genes
Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. For example, a SNP (bad gene) may change the DNA sequence TAGGCTAA to TTGGCTAA. For a variation to be considered a SNP, it must occur in at least 1% of the population. In this project, the altered gene is referred to as a bad gene.
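The example above can be checked with a short sketch that compares two sequences of equal length and reports where they differ. The function name `snp_positions` is hypothetical, and the sketch only finds single-nucleotide differences between two sequences; it does not test the 1% population criterion.

```python
def snp_positions(reference, sample):
    """List (index, reference_base, sample_base) for every differing position.

    Indices are 0-based. Both sequences must have the same length, since a
    SNP is a substitution and not an insertion or deletion.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    # The sequences from the text differ only in the second nucleotide.
    print(snp_positions("TAGGCTAA", "TTGGCTAA"))  # [(1, 'A', 'T')]
```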
Research Description
Purpose
The purpose of this research project is to illustrate the performance that can be gained by using GPUs for general-purpose computing, compared with the performance of CPUs on the same task.
Problem
The problem I used to test the hardware is “cluster analysis of gene expressions”. Cluster analysis, or clustering, is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense [6]. A gene is a segment of DNA that contains the formula for the chemical composition of one particular protein. The large majority of abundantly expressed genes are associated with common functions, such as metabolism, and hence are expressed in all cells. However, the expression profiles of different cells differ, and even within a single cell, expression varies with time in a manner dictated by external and internal signals that reflect the state of the organism and of the cell itself [7]. A natural basis for organizing gene expression data is to group together genes with similar patterns of expression. For any series of measurements, a number of sensible measures of similarity in the behavior of two genes can be used [8]. Experts in the biological sciences can then use these groupings to gain further knowledge in the area. This makes cluster analysis a well-suited technique for extracting information from gene expression data.
Methodology
For testing purposes, I used three different clustering programs: one is a single-threaded C program, and the other two use the CUDA [9] and OpenCL [10] parallel programming APIs respectively. For the C program, I used the Cluster 3.0 [11] software. I wrote the CUDA and OpenCL implementations myself.
The clustering algorithm used in this project is hierarchical clustering with Euclidean distance [12] as the distance metric and single linkage [13] as the linkage method.
The gene data was taken from Gene Expression Omnibus Data Set Record 3345 [14]. From it, I generated data sets of the following sizes (row count x column count): 4096x16, 8192x16, 16384x16, 4096x32, 8192x32, 16384x32, 4096x64, 8192x64, 16384x64. Each of these sets was given as input to all three programs.
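To make the algorithm concrete, here is a minimal single-threaded sketch of hierarchical clustering with Euclidean distance and single linkage. It is not the CUDA or OpenCL code used in the project and makes no attempt at efficiency; the function names and the tiny input are hypothetical and serve only as illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long rows of measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(rows):
    """Naive agglomerative clustering with single linkage.

    Starts with one cluster per row and repeatedly merges the two clusters
    whose closest members are nearest to each other. Returns the merges in
    order as (members_a, members_b, distance), where members are row indices.
    """
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the minimum pairwise distance.
                d = min(
                    euclidean(rows[i], rows[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    # Hypothetical 4x2 input; the real data sets have 16 to 64 columns.
    rows = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0]]
    for step in single_linkage(rows):
        print(step)
```

The pairwise distance computations inside the loops are independent of one another, which is the kind of work a data-parallel device can spread across many threads.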
Evaluation
The evaluation is based on performance metrics used for evaluating processing units (CPUs and GPUs). See the Benchmarking Tools section of the wiki for more detail.
Results
The results show that the program written with the CUDA API performed significantly better than both the OpenCL program and Cluster 3.0. CUDA was between 2 and 8 times faster than OpenCL, and between 3 and 20 times faster than Cluster 3.0. One possible explanation for the gap between CUDA and OpenCL is that, on NVIDIA hardware, the OpenCL library is layered on top of the CUDA stack and so adds overhead; this is an argument rather than a measured result.
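The speedup figures above are ratios of running times. As a sketch of the arithmetic only, the helper below divides a baseline time by the time of the faster program; the function name `speedup` is hypothetical and the timings in the usage example are placeholders, not measurements from this project.

```python
def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("running times must be positive")
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Placeholder timings chosen only to illustrate the calculation.
    print(speedup(12.0, 1.5))  # 8.0
```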
References:
- [0] http://en.wikipedia.org/wiki/Computational_science
- [1] Rauber T., Rünger G., “Exploiting Multiple Levels of Parallelism in Scientific Computing”. IFIP International Federation for Information Processing, 2005, Volume 172/2005, 3-19, DOI: 10.1007/0-387-24049-7_1
- [2] NVIDIA Tesla GPU Computing Technical Brief. Version 1.0.0, 5/24/2007
- [3] Ackermann, J., Baecher, P., Franzel T., Goesele, M., Hamacher, K., “Massively-Parallel Simulation of Biochemical Systems”
- [4] Davis, J., Ozsoy, A., Patel, S., Taufer, M., “Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors”
- [5] Rodríguez, A., Trelles, O., Ujaldón, M., “Using Graphics Processors for a High Performance Normalization of Gene Expressions”
- [6] http://en.wikipedia.org/wiki/Cluster_analysis
- [7] Domany, Eytan. “Cluster Analysis of Gene Expression Data”
- [8] Eisen, M., Spellman, P., Brown, P., Botstein, D., “Cluster Analysis and Display of Genome-Wide Expression Patterns”. PNAS December 8, 1998 vol. 95 no. 25 14863-14868
- [9] http://www.nvidia.com/object/what_is_cuda_new.html
- [10] http://www.khronos.org/opencl/
- [11] http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm
- [12] http://en.wikipedia.org/wiki/Euclidean_distance
- [13] http://en.wikipedia.org/wiki/Single-linkage_clustering
- [14] http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS3345
- [15] http://developer.nvidia.com/object/visual-profiler.html