Seminar Presentation - CUDA

Transcript of Seminar Presentation - CUDA

Slide 3

CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through variants of industry-standard programming languages.

Slide 4

The GPU is the chip in computer video cards, the PlayStation 3, the Xbox, etc.

Two major vendors: NVIDIA and ATI (now AMD).

Slide 5

- GPU computing with CUDA brings parallel computing to the masses.
- Data-parallel supercomputers are everywhere!
- CUDA makes this power accessible.

Slide 6

Applications:

- High arithmetic intensity: dense linear algebra, PDEs, n-body, finite difference
- High bandwidth: sequencing (virus scanning, genomics), sorting, databases
- Visual computing: graphics, image processing, tomography, machine vision

Slide 9

Compute Unified Device Architecture: for parallel computing, developed by NVIDIA. Co-designed hardware and software for direct GPU computing.

Hardware: a fully general data-parallel architecture

- General thread launch
- Global load-store
- Parallel data cache
- Scalar architecture
- Integers, bit operations
- Double precision (shortly)

Slide 10

- Thread: the smallest unit executing an instruction.
- Block: contains several threads.
- Warp: a group of threads physically executed in parallel (usually running the same application).
- Grid: contains several thread blocks.
- Kernel: an application or program that runs on the GPU.
- Device: the GPU.
- Host: the CPU.

A minimal kernel tying these terms together is sketched below.
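
A minimal sketch, assuming compilation with nvcc; the kernel name "scale" and the launch sizes are illustrative, not taken from the slides:

    // Kernel: a program that runs on the device (the GPU).
    __global__ void scale(float *data, float factor, int n)
    {
        // Each thread handles one element; blockIdx, blockDim and
        // threadIdx locate this thread within the grid of blocks.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }

    int main()
    {
        // Host (CPU) code launches the kernel on the device as a
        // grid of 4 blocks of 256 threads each; the hardware runs
        // those threads in warps of 32.
        int n = 1024;
        float *d_data;
        cudaMalloc(&d_data, n * sizeof(float));
        scale<<<4, 256>>>(d_data, 2.0f, n);
        cudaDeviceSynchronize();
        cudaFree(d_data);
        return 0;
    }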

Slide 12

- Expose as much parallelism as possible (one common pattern is sketched after this list)
- Optimize memory usage for maximum bandwidth
- Maximize occupancy to hide latency
- Optimize instruction usage for maximum throughput
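
One common pattern for exposing parallelism while keeping enough warps in flight to hide latency is the grid-stride loop; the SAXPY kernel below is a sketch of the pattern, not code from the presentation:

    // Grid-stride loop: each thread processes several elements, so
    // any grid size covers any problem size and the scheduler always
    // has runnable warps available to hide memory latency.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x)
            y[i] = a * x[i] + y[i];
    }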

Slide 13

Each thread can:

- Read/write per-block on-chip shared memory
- Read per-grid cached constant memory
- Read/write non-cached device memory: per-grid global memory and per-thread local memory (each space is annotated in the kernel below)
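
A sketch touching each of these spaces; the kernel, the single coefficient, and the 256-thread block size are assumptions for illustration:

    // Constant memory: per-grid, cached, read-only inside kernels
    // (the host sets it with cudaMemcpyToSymbol).
    __constant__ float coeff;

    __global__ void scale_shared(const float *in, float *out, int n)
    {
        // Shared memory: per-block, on-chip, read/write.
        __shared__ float tile[256];

        // Plain locals live in registers, spilling to per-thread
        // local memory when registers run out.
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        // Global memory: per-grid, read/write, not cached.
        if (i < n)
            tile[threadIdx.x] = in[i];
        __syncthreads();

        if (i < n)
            out[i] = tile[threadIdx.x] * coeff;
    }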

Slide 14

Basic Strategies

- Processing data is cheaper than moving it around
  - Especially for GPUs, as they devote many more transistors to ALUs than to memory
  - And this will be increasingly so: the less memory-bound a kernel is, the better it will scale with future GPUs
- So you want to:
  - Maximize use of low-latency, high-bandwidth memory
  - Optimize memory access patterns to maximize bandwidth (the two patterns are contrasted below)
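
The contrast between access patterns fits in two copy kernels; a sketch, with names assumed:

    // Coalesced: consecutive threads read consecutive addresses, so
    // each warp's accesses merge into a few wide memory transactions.
    __global__ void copy_coalesced(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i];
    }

    // Strided: consecutive threads touch addresses far apart, so most
    // of each memory transaction is wasted and bandwidth collapses.
    __global__ void copy_strided(const float *in, float *out,
                                 int n, int stride)
    {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
        if (i < n)
            out[i] = in[i];
    }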

Slide 16

1. Copy data from main memory to GPU memory.
2. The CPU instructs the GPU to start processing.
3. The GPU executes in parallel on each core.
4. Copy the result from GPU memory back to main memory (the full flow is sketched in code below).
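
These four steps map directly onto the CUDA runtime API. A minimal end-to-end sketch, with error handling omitted and the kernel and sizes assumed:

    #include <cstdio>
    #include <cstdlib>

    __global__ void square(float *d, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= d[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *h = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) h[i] = (float)i;

        float *d;
        cudaMalloc(&d, bytes);

        // 1. Copy data from main memory to GPU memory.
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

        // 2. The CPU instructs the GPU to run the kernel,
        // 3. which the GPU executes in parallel across its cores.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        square<<<blocks, threads>>>(d, n);

        // 4. Copy the result from GPU memory back to main memory.
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

        printf("h[3] = %f\n", h[3]);  // expect 9.0
        cudaFree(d);
        free(h);
        return 0;
    }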

Slide 17

- Provide the ability to run code on the GPU
- Manage resources
- Partition data to fit on the cores
- Schedule blocks to cores

Slide 18

- The programming interface of CUDA applications is based on the standard C language with extensions, which eases the learning curve of CUDA.
- CUDA provides access to 16 KB of memory per multiprocessor, shared between threads. This fast shared memory can be used as a user-managed cache, enabling higher bandwidth than is possible with texture lookups (see the reduction sketch below).
- More efficient data transfers between system and video memory: faster downloads and readbacks to and from the GPU.
- No need for graphics APIs, with their redundancy and overheads.
- Linear memory addressing with gather and scatter: scattered reads and writes to arbitrary addresses in memory.
- Full hardware support for integer and bitwise operations, including integer texture lookups.
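
The shared memory point is the one most worth a sketch: a block-wide sum reduction that uses the on-chip memory as a user-managed cache (assuming a power-of-two block size of at most 256; the names are illustrative):

    __global__ void block_sum(const float *in, float *out, int n)
    {
        // Stage this block's slice of the input on-chip once,
        // instead of re-reading global memory at every step.
        __shared__ float cache[256];

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // Tree reduction carried out entirely in shared memory.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s)
                cache[threadIdx.x] += cache[threadIdx.x + s];
            __syncthreads();
        }

        // Thread 0 writes one partial sum per block back out.
        if (threadIdx.x == 0)
            out[blockIdx.x] = cache[0];
    }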

Slide 19

- Up to 512 CUDA cores and 3.0 billion transistors
- NVIDIA Parallel Data Cache technology
- NVIDIA GigaThread engine
- ECC memory support
- Native support for Visual Studio

Slide 20

- Accelerated rendering of 3D graphics
- Real-time cloth simulation (OptiTex.com)
- Distributed calculations, such as predicting the native conformation of proteins
- Medical analysis simulations, for example virtual reality based on CT and MRI scan images
- Physical simulations, in particular in fluid dynamics
- Environment statistics
- Accelerated encryption, decryption, and compression
- Accelerated interconversion of video file formats
- Artificial intelligence

Slide 21

APPLICATIONS

Slide 22

Ultrasound Scanning

Slide 23

- GPU electromagnetic field simulation
- Cell phone irradiation
- MRI design / modeling
- Printed circuit boards
- Radar cross section (military)
- Seismic migration
- 8x faster than a quad-core CPU alone

Slide 27

- No recursive functions
- The minimum execution unit is a warp of 32 threads
- CUDA is a closed architecture; it belongs to NVIDIA

Slide 28

CUDA is a powerful parallel programming model:

- Heterogeneous: mixed serial-parallel programming
- Scalable: hierarchical thread execution model
- Accessible: minimal but expressive changes to C
- Interoperable: simple graphics interop mechanisms

CUDA is an attractive platform:

- Broad: OpenGL, DirectX, Windows XP, Vista, Linux, Mac OS
- Widespread: over 85M CUDA GPUs, 60K CUDA developers

CUDA provides tremendous scope for innovative graphics research beyond programmable shading.
