CUDA Parallel Programming/ProjectDescription

Revision as of 05:38, 12 May 2012


Evaluating the Performance of GPGPUs and Their Use in Scientific Computing

Introduction

CUDA stands for Compute Unified Device Architecture. It is a hardware and software architecture for issuing and managing computations on the GPU as a data-parallel computing device, without the need to map them to a graphics API. CUDA includes a programming model along with hardware support that simplifies parallel implementation. For workloads that parallelize well, CUDA programs can produce results much faster than equivalent single-threaded code. Programmers need training in parallel programming to be fully effective in computer science. CUDA is a platform for building high-performance applications on heterogeneous systems that contain both central and graphics processing units. Data-parallel processing maps data elements to parallel processing threads. Many applications that process large data sets such as arrays can use a data-parallel programming model to speed up their computations. With this in mind, I aimed to use CUDA for a useful analysis in the medical domain (bad-genes). As a first step, I searched for a string in a 1 MB text file using parallel programming. My aim was to observe how much parallel programming could improve the performance of the process.
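The idea of mapping data elements to threads can be illustrated with the string search mentioned above. The following is only a minimal CPU-side sketch in Python, not the CUDA program used in this project: the function names `match_at` and `parallel_find` are hypothetical, and a thread pool stands in for GPU threads. Each candidate start offset in the text is treated as one independent unit of work, which is roughly how a GPU kernel could assign one thread per offset. Python threads will not give a real speedup here; the point is the mapping, not the timing.

```python
from concurrent.futures import ThreadPoolExecutor


def match_at(text, pattern, i):
    """Work of one logical thread: does `pattern` start at offset i?"""
    return text[i:i + len(pattern)] == pattern


def parallel_find(text, pattern, workers=4):
    """Return every offset at which `pattern` occurs in `text`."""
    if not pattern or len(pattern) > len(text):
        return []
    # One independent task per candidate start offset.
    offsets = range(len(text) - len(pattern) + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda i: match_at(text, pattern, i), offsets)
        # pool.map yields results in the same order as `offsets`.
        return [i for i, hit in zip(offsets, hits) if hit]


print(parallel_find("TAGGCTAATAGG", "TAGG"))  # [0, 8]
```

Because no task depends on the result of another, the offsets can be checked in any order, which is the property that makes the problem data-parallel.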


Definition of SNP-Genes

Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. For example, a SNP (bad-gene) may change the DNA sequence TAGGCTAA to TTGGCTAA. For a variation to be considered a SNP, it must occur in at least 1% of the population. In this project, the changed gene is referred to as a bad-gene.
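As a small illustration of the example above, the sketch below compares two equal-length sequences and reports the positions where they differ. The function name `snp_positions` is hypothetical, and the code only finds single-nucleotide differences between two strings; it does not check the 1% population criterion.

```python
def snp_positions(reference, sample):
    """List (index, reference_base, sample_base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# The example from the text: the second nucleotide changes from A to T.
print(snp_positions("TAGGCTAA", "TTGGCTAA"))  # [(1, 'A', 'T')]
```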

Research Description

Purpose

The purpose of this research project is to compare the performance that can be gained by using GPUs in general-purpose computing with the performance that can be gained by using CPUs.

Problem

The problem I worked on to test the hardware is "cluster analysis of gene expressions". Cluster analysis, or clustering, is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense [6]. A gene is a segment of DNA that contains the formula for the chemical composition of one particular protein. The large majority of abundantly expressed genes are associated with common functions, such as metabolism, and hence are expressed in all cells. However, there will be differences between the expression profiles of different cells, and even in a single cell, expression will vary with time, in a manner dictated by external and internal signals that reflect the state of the organism and the cell itself [7]. A natural basis for organizing gene expression data is to group together genes with similar patterns of expression. For any series of measurements, a number of sensible measures of similarity in the behavior of two genes can be used [8]. Experts in the biological sciences can then use this grouping to gather further knowledge in the area. This makes cluster analysis a strong candidate for extracting information from gene expression data.

Methodology

For testing purposes, I used three different clustering programs: one is a single-threaded C program, and the other two use the CUDA [9] and OpenCL [10] parallel programming APIs respectively. For the C program, I used the Cluster 3.0 [11] software. I wrote the CUDA and OpenCL implementations myself.

The clustering algorithm used in this project is hierarchical clustering with Euclidean distance[12] as a distance metric and single linkage[13] as a linkage method.
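To make the algorithm concrete, here is a minimal single-threaded sketch of hierarchical clustering with Euclidean distance and single linkage. It is not the code of Cluster 3.0 or of the CUDA and OpenCL programs; the names `euclidean` and `single_linkage` are hypothetical, and stopping at a target of `k` clusters (rather than building the full merge tree) is a simplification made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(rows, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(rows))]  # start with one row each
    while len(clusters) > k:
        best = None  # (distance, cluster_a, cluster_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                d = min(
                    euclidean(rows[i], rows[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


rows = [[0, 0], [0, 1], [5, 5], [5, 6]]
print(single_linkage(rows, 2))  # [[0, 1], [2, 3]]
```

The pairwise distance computations inside the loops are independent of one another, which is the kind of work a data-parallel model is suited to.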

The gene data comes from Gene Expression Omnibus Data Set Record 3345 [14]. From it, I generated data sets with the following sizes (row count x column count): 4096x16, 8192x16, 16384x16, 4096x32, 8192x32, 16384x32, 4096x64, 8192x64, 16384x64. Each of these sets was given as input to the three programs.
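To give a feel for how the work grows across these nine data sets, the sketch below counts the row pairs in each one. It assumes that the clustering step needs the distance between every pair of rows, which is a common first step in hierarchical clustering but is an assumption about the implementations rather than something stated above. The names `pair_count` and `workloads` are hypothetical.

```python
ROW_COUNTS = (4096, 8192, 16384)
COLUMN_COUNTS = (16, 32, 64)


def pair_count(rows):
    """Number of unordered row pairs: n * (n - 1) / 2."""
    return rows * (rows - 1) // 2


def workloads():
    """(rows, columns, row_pairs) for each data set, in the order listed."""
    return [(r, c, pair_count(r)) for c in COLUMN_COUNTS for r in ROW_COUNTS]


for r, c, pairs in workloads():
    # Each pair needs one difference per column for a Euclidean distance.
    print(f"{r}x{c}: {pairs} row pairs, {pairs * c} coordinate differences")
```

Doubling the row count roughly quadruples the number of pairs, while doubling the column count only doubles the work per pair.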

Evaluation

Evaluation of the work is based on performance metrics used in the evaluation of processing units (CPUs and GPUs). See the Benchmarking Tools section of the wiki for more detail.

Results

The results showed that the program written using the CUDA API performed significantly better than both the OpenCL program and Cluster 3.0. The speedup of CUDA was between 2 and 8 times compared to OpenCL, and between 3 and 20 times compared to Cluster 3.0. One possible explanation for the gap between CUDA and OpenCL is that the OpenCL library acts as a wrapper around the CUDA library.
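For reference, a speedup figure of this kind is usually the run time of the baseline divided by the run time of the faster program. The helper below is a small sketch of that calculation; the function name `speedup` is hypothetical, and the timings in the usage line are placeholders, not measurements from this project.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / accelerated_seconds


# Placeholder timings for illustration only.
print(speedup(10.0, 2.0))  # 5.0
```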

References: