Seminar Presentation - CUDA

Transcript of Seminar Presentation - CUDA

Slide 3

CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through variants of industry-standard programming languages.

Slide 4

The GPU is the chip in computer video cards, the PlayStation 3, the Xbox, etc.

Two major vendors: NVIDIA and ATI (now AMD).

Slide 5

- GPU computing with CUDA brings parallel computing to the masses.
- Data-parallel supercomputers are everywhere!
- CUDA makes this power accessible.

Slide 6

Applications:

- High arithmetic intensity: dense linear algebra, PDEs, n-body, finite difference
- High bandwidth: sequencing (virus scanning, genomics), sorting, databases
- Visual computing: graphics, image processing, tomography, machine vision

Slide 9

Compute Unified Device Architecture: for parallel computing, developed by NVIDIA. Co-designed hardware and software for direct GPU computing.

Hardware: a fully general data-parallel architecture

- General thread launch
- Global load-store
- Parallel data cache
- Scalar architecture
- Integers, bit operations
- Double precision (shortly)

Slide 10

- Thread: the smallest unit executing an instruction.
- Block: contains several threads.
- Warp: a group of threads physically executed in parallel (usually running the same application).
- Grid: contains several thread blocks.
- Kernel: an application or program that runs on the GPU.
- Device: the GPU.
- Host: the CPU.

A minimal kernel tying these terms together is sketched below.
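
A minimal sketch, assuming compilation with nvcc; the kernel name "scale" and the launch sizes are illustrative, not taken from the slides:

    // Kernel: a program that runs on the device (the GPU).
    __global__ void scale(float *data, float factor, int n)
    {
        // Each thread handles one element; blockIdx, blockDim and
        // threadIdx locate this thread within the grid of blocks.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }

    int main()
    {
        // Host (CPU) code launches the kernel on the device as a
        // grid of 4 blocks of 256 threads each; the hardware runs
        // those threads in warps of 32.
        int n = 1024;
        float *d_data;
        cudaMalloc(&d_data, n * sizeof(float));
        scale<<<4, 256>>>(d_data, 2.0f, n);
        cudaDeviceSynchronize();
        cudaFree(d_data);
        return 0;
    }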

Slide 12

- Expose as much parallelism as possible (one common pattern is sketched after this list)
- Optimize memory usage for maximum bandwidth
- Maximize occupancy to hide latency
- Optimize instruction usage for maximum throughput
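
One common pattern for exposing parallelism while keeping enough warps in flight to hide latency is the grid-stride loop; the SAXPY kernel below is a sketch of the pattern, not code from the presentation:

    // Grid-stride loop: each thread processes several elements, so
    // any grid size covers any problem size and the scheduler always
    // has runnable warps available to hide memory latency.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x)
            y[i] = a * x[i] + y[i];
    }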

Slide 13

Each thread can:

- Read/write per-block on-chip shared memory
- Read per-grid cached constant memory
- Read/write non-cached device memory: per-grid global memory and per-thread local memory (each space is annotated in the kernel below)
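
A sketch touching each of these spaces; the kernel, the single coefficient, and the 256-thread block size are assumptions for illustration:

    // Constant memory: per-grid, cached, read-only inside kernels
    // (the host sets it with cudaMemcpyToSymbol).
    __constant__ float coeff;

    __global__ void scale_shared(const float *in, float *out, int n)
    {
        // Shared memory: per-block, on-chip, read/write.
        __shared__ float tile[256];

        // Plain locals live in registers, spilling to per-thread
        // local memory when registers run out.
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        // Global memory: per-grid, read/write, not cached.
        if (i < n)
            tile[threadIdx.x] = in[i];
        __syncthreads();

        if (i < n)
            out[i] = tile[threadIdx.x] * coeff;
    }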

Slide 14

Basic Strategies

- Processing data is cheaper than moving it around
  - Especially for GPUs, as they devote many more transistors to ALUs than to memory
  - And this will be increasingly so: the less memory-bound a kernel is, the better it will scale with future GPUs
- So you want to:
  - Maximize use of low-latency, high-bandwidth memory
  - Optimize memory access patterns to maximize bandwidth (the two patterns are contrasted below)
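
The contrast between access patterns fits in two copy kernels; a sketch, with names assumed:

    // Coalesced: consecutive threads read consecutive addresses, so
    // each warp's accesses merge into a few wide memory transactions.
    __global__ void copy_coalesced(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i];
    }

    // Strided: consecutive threads touch addresses far apart, so most
    // of each memory transaction is wasted and bandwidth collapses.
    __global__ void copy_strided(const float *in, float *out,
                                 int n, int stride)
    {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
        if (i < n)
            out[i] = in[i];
    }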

Slide 16

1. Copy data from main memory to GPU memory.
2. The CPU instructs the GPU to start processing.
3. The GPU executes in parallel on each core.
4. Copy the result from GPU memory back to main memory (the full flow is sketched in code below).
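
These four steps map directly onto the CUDA runtime API. A minimal end-to-end sketch, with error handling omitted and the kernel and sizes assumed:

    #include <cstdio>
    #include <cstdlib>

    __global__ void square(float *d, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= d[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *h = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) h[i] = (float)i;

        float *d;
        cudaMalloc(&d, bytes);

        // 1. Copy data from main memory to GPU memory.
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

        // 2. The CPU instructs the GPU to run the kernel,
        // 3. which the GPU executes in parallel across its cores.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        square<<<blocks, threads>>>(d, n);

        // 4. Copy the result from GPU memory back to main memory.
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

        printf("h[3] = %f\n", h[3]);  // expect 9.0
        cudaFree(d);
        free(h);
        return 0;
    }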

Slide 17

- Provide the ability to run code on the GPU
- Manage resources
- Partition data to fit on the cores
- Schedule blocks to cores

Slide 18

- The programming interface of CUDA applications is based on the standard C language with extensions, which eases the learning curve of CUDA.
- CUDA provides access to 16 KB of memory per multiprocessor, shared between threads. This fast shared memory can be used as a user-managed cache, enabling higher bandwidth than is possible with texture lookups (see the reduction sketch below).
- More efficient data transfers between system and video memory: faster downloads and readbacks to and from the GPU.
- No need for graphics APIs, with their redundancy and overheads.
- Linear memory addressing with gather and scatter: scattered reads and writes to arbitrary addresses in memory.
- Full hardware support for integer and bitwise operations, including integer texture lookups.
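
The shared memory point is the one most worth a sketch: a block-wide sum reduction that uses the on-chip memory as a user-managed cache (assuming a power-of-two block size of at most 256; the names are illustrative):

    __global__ void block_sum(const float *in, float *out, int n)
    {
        // Stage this block's slice of the input on-chip once,
        // instead of re-reading global memory at every step.
        __shared__ float cache[256];

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // Tree reduction carried out entirely in shared memory.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s)
                cache[threadIdx.x] += cache[threadIdx.x + s];
            __syncthreads();
        }

        // Thread 0 writes one partial sum per block back out.
        if (threadIdx.x == 0)
            out[blockIdx.x] = cache[0];
    }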

Slide 19

- Up to 512 CUDA cores and 3.0 billion transistors
- NVIDIA Parallel Data Cache technology
- NVIDIA GigaThread engine
- ECC memory support
- Native support for Visual Studio

Slide 20

- Accelerated rendering of 3D graphics
- Real-time cloth simulation (OptiTex.com)
- Distributed calculations, such as predicting the native conformation of proteins
- Medical analysis simulations, for example virtual reality based on CT and MRI scan images
- Physical simulations, in particular in fluid dynamics
- Environment statistics
- Accelerated encryption, decryption, and compression
- Accelerated interconversion of video file formats
- Artificial intelligence

Slide 21

APPLICATIONS

Slide 22

Ultrasound Scanning

Slide 23

- GPU electromagnetic field simulation
- Cell phone irradiation
- MRI design / modeling
- Printed circuit boards
- Radar cross section (military)
- Seismic migration
- 8x faster than a quad-core CPU alone

Slide 27

- No recursive functions
- The minimum execution unit is a warp of 32 threads
- CUDA is a closed architecture; it belongs to NVIDIA

Slide 28

CUDA is a powerful parallel programming model:

- Heterogeneous: mixed serial-parallel programming
- Scalable: hierarchical thread execution model
- Accessible: minimal but expressive changes to C
- Interoperable: simple graphics interop mechanisms

CUDA is an attractive platform:

- Broad: OpenGL, DirectX, Windows XP, Vista, Linux, Mac OS
- Widespread: over 85M CUDA GPUs, 60K CUDA developers

CUDA provides tremendous scope for innovative graphics research beyond programmable shading.
