Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series)

Language: English

Pages: 280

ISBN: 0123814723

Format: PDF / Kindle (mobi) / ePub

Programming Massively Parallel Processors discusses basic concepts of parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated, parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance, floating-point formats, parallel patterns, and dynamic parallelism. The book serves as a teaching guide for courses in which parallel programming is the main topic. It builds on the basics of C programming for CUDA, a parallel programming environment supported on NVIDIA GPUs.
Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computing platform. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency in CUDA.
The target audience is graduate and undergraduate students from all science and engineering disciplines who need an introduction to computational thinking and parallel programming.

  • Teaches computational thinking and problem-solving techniques that facilitate high-performance parallel computing.
  • Utilizes CUDA (Compute Unified Device Architecture), NVIDIA's software development tool created specifically for massively parallel environments.
  • Shows you how to achieve both high performance and high reliability using the CUDA programming model as well as OpenCL.

Understanding and Applying Machine Vision (2nd Edition) (Manufacturing Engineering and Materials Processing)

Big Data: Principles and best practices of scalable realtime data systems

Engineering Long-Lasting Software: An Agile Approach Using SaaS and Cloud Computing (Beta Edition)

Haptics: Generating and Perceiving Tangible Sensations: International Conference, EuroHaptics 2010, Amsterdam, July 2010, Proceedings Part 1

GPU Pro 7: Advanced Rendering Techniques

…The function invoked has a single parameter that is initialized to the location of a thread within the compute domain. This is again represented by a class template, index, which represents a short vector of integer values. The rank of an index is the length of this vector and is the same as the rank of the extent. The index parameter conveys the same information as the explicitly computed value i in the CUDA code (see Figure 18.1, line 3). These index values can be used to select elements in an …
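
For contrast, here is a minimal sketch of the CUDA-style indexing the passage alludes to, where the global index i is computed explicitly from the built-in thread coordinates (the vector-add body is an assumed example, not the book's exact Figure 18.1):

```cuda
// Explicit index computation in CUDA, in contrast to C++ AMP's index<N>
// parameter: each thread derives its own position in the compute domain.
__global__ void vecAddKernel(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // the "explicitly computed value i"
    if (i < n)                                      // guard against extra threads
        C[i] = A[i] + B[i];
}
```

In C++ AMP, the equivalent coordinates are instead supplied by the runtime through the index argument of the function passed to parallel_for_each.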

…In Figure 3.4, passing the array name h_A as the first argument of the call to vecAdd makes the function's first parameter A point to the 0th element of h_A. We say that h_A is passed by reference to vecAdd. As a result, A[i] in the function body can be used to access h_A[i]. See Patt & Patel [Patt] for an easy-to-follow explanation of the detailed usage of pointers in C.
* * *
The vecAdd() function in Figure 3.4 uses a for loop to iterate through the vector elements. In the ith iteration, …
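
A minimal sketch of the pass-by-reference behavior described above, assuming the conventional host-side vecAdd signature from the book's running example (exact names beyond h_A are assumptions):

```cuda
#include <stdlib.h>

// The array name h_A decays to a pointer to h_A[0], so the parameter A
// aliases the caller's storage: A[i] reads h_A[i] directly.
void vecAdd(float* A, float* B, float* C, int n) {
    for (int i = 0; i < n; ++i)    // the for loop the passage refers to
        C[i] = A[i] + B[i];
}

int main(void) {
    int n = 1024;
    float* h_A = (float*)malloc(n * sizeof(float));
    float* h_B = (float*)malloc(n * sizeof(float));
    float* h_C = (float*)malloc(n * sizeof(float));
    // ... initialize h_A and h_B ...
    vecAdd(h_A, h_B, h_C, n);      // h_A is passed by reference
    free(h_A); free(h_B); free(h_C);
    return 0;
}
```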

…The SM executes all threads in a warp following the single-instruction, multiple-data (SIMD) model. That is, at any instant in time, one instruction is fetched and executed for all threads in the warp. This is illustrated in Figure 4.14 by a single instruction fetch/dispatch unit shared among the execution units in the SM. Note that these threads apply the same instruction to different portions of the data. As a result, all threads in a warp will always have the same execution timing. Figure 4.14 also shows a …
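
One practical consequence of this SIMD execution is control divergence. The sketch below (an illustrative kernel, not taken from the book) shows a branch that splits the lanes of a warp, so the single shared instruction stream must execute both paths one after the other with some lanes masked off:

```cuda
// Even and odd lanes take different paths, so the warp executes both
// branches serially, disabling the threads whose condition is false.
__global__ void divergentScale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (threadIdx.x % 2 == 0)
            data[i] *= 2.0f;   // executed while odd lanes are inactive
        else
            data[i] += 1.0f;   // executed while even lanes are inactive
    }
}
```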

…reward or a lecture? Why?
References
1. CUDA Occupancy Calculator.
2. CUDA C Best Practices Guide, v. 4.2, 2012.
3. Ryoo, S., Rodrigues, C., Stone, S., Baghsorkhi, S., Ueng, S., Stratton, J., & Hwu, W. Program optimization space pruning for a multithreaded GPU. Proceedings of the 6th ACM/IEEE International Symposium on Code Generation and Optimization, April 6–9, 2008.
4. Ryoo, S., Rodrigues, C. I., Baghsorkhi, S. S., Stone, S. S., Kirk, D. B., & Hwu, W. W. Optimization principles and …

…values within each group. Having the numbers sorted in ascending order allows a sequential addition to achieve higher accuracy. This is one reason why sorting is frequently used in massively parallel numerical algorithms. Interested readers should study more advanced techniques such as the compensated summation algorithm, also known as Kahan's summation algorithm, for an even more robust approach to accurately summing floating-point values [Kahan1965].
7.6 Numerical Stability
While the …
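
For concreteness, here is a minimal host-side sketch of the compensated (Kahan) summation mentioned above; the function name is our own:

```cuda
// Kahan's compensated summation: c accumulates the low-order bits that
// plain floating-point addition would discard, greatly reducing error
// when summing many values of similar magnitude.
float kahanSum(const float* x, int n) {
    float sum = 0.0f;
    float c   = 0.0f;            // running compensation for lost bits
    for (int i = 0; i < n; ++i) {
        float y = x[i] - c;      // fold the previous error into the next term
        float t = sum + y;       // low-order bits of y may be lost here
        c = (t - sum) - y;       // recover exactly what was lost
        sum = t;
    }
    return sum;
}
```

Note that value-unsafe floating-point optimizations (e.g., fast-math reassociation) can cancel the compensation term algebraically, so such code should be compiled with strict floating-point semantics.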
