Cjharris Gpu Computing Opencl


Page 1: Cjharris Gpu Computing Opencl

7/28/2019 Cjharris Gpu Computing Opencl

http://slidepdf.com/reader/full/cjharris-gpu-computing-opencl 1/61

 

Getting Started with OpenCL GPU Computing

iVEC Workshop

30th May - 1st June 2012


Open Compute Language (OpenCL)

OpenCL is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices.

OpenCL is being created by the Khronos Group. Participating companies and institutions:

3DLABS, Activision Blizzard, AMD, Apple, ARM, Broadcom, Codeplay, Electronic Arts, Ericsson, Freescale, Fujitsu, GE, Graphic Remedy, HI, IBM, Intel, Imagination Technologies, Los Alamos National Laboratory, Motorola, Movidius, Nokia, NVIDIA, Petapath, QNX, Qualcomm, RapidMind, Samsung, Seaweed, S3, ST Microelectronics, Takumi, Texas Instruments, Toshiba and Vivante.

http://www.khronos.org/opencl/


How is OpenCL different from CUDA?

CUDA:
- core GPU computing on NVIDIA hardware
- optimised libraries for NVIDIA hardware
- better marketing
- slightly simpler API
- more readily available documentation

OpenCL:
- AMD implementation on AMD CPU/GPU and Intel CPU
- Intel implementation on Intel CPU
- IBM implementation on Intel/AMD/NVIDIA/Power
- Intel implementation on Intel MIPS
- portable, but not necessarily optimised code


OpenCL Platforms

Platform:

A host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform.

(Figure: a Platform containing one Host and several Devices)


OpenCL Platforms : clGetPlatformIDs

The following routine is used to query the number of OpenCL platforms and their corresponding IDs:

cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id* platforms, cl_uint* num_platforms)

Arguments

num_entries : number of platform IDs that fit in the memory pointed to by platforms
platforms : pointer to memory in which to store the returned platform IDs
num_platforms : returns the actual number of platform IDs available

Returns

Either CL_SUCCESS or an error code.


OpenCL Platforms : clGetPlatformIDs

Code Example:

#include <stdio.h>
#include <stdlib.h>   // for exit()
#include <CL/cl.h>

// checkErr() (see OpenCL Errors) must be defined before main

int main(int argc, char** argv)
{
    // determine number of platforms
    cl_int clErr;
    cl_uint num_platforms;
    clErr = clGetPlatformIDs(0, NULL, &num_platforms);
    checkErr(clErr, __FILE__, __LINE__);
    printf("OpenCL Platforms found: %u\n", num_platforms);
    if (num_platforms < 1) { exit(0); }

    // get platform IDs
    cl_platform_id platforms[num_platforms];
    clErr = clGetPlatformIDs(num_platforms, platforms, NULL);
    checkErr(clErr, __FILE__, __LINE__);

    return 0;
}


OpenCL Errors

void checkErr(cl_int clErr, const char* filename, int line)
{
    if (clErr != CL_SUCCESS)
    {
        printf("OpenCL Error %i at line %i of %s\n", clErr, line, filename);
        exit(EXIT_FAILURE);
    }
}

You can find the error codes in cl.h:

/* Error Codes */
#define CL_SUCCESS 0
#define CL_DEVICE_NOT_FOUND -1
#define CL_DEVICE_NOT_AVAILABLE -2
#define CL_COMPILER_NOT_AVAILABLE -3
#define CL_MEM_OBJECT_ALLOCATION_FAILURE -4
#define CL_OUT_OF_RESOURCES -5
#define CL_OUT_OF_HOST_MEMORY -6
#define CL_PROFILING_INFO_NOT_AVAILABLE -7
#define CL_MEM_COPY_OVERLAP -8
#define CL_IMAGE_FORMAT_MISMATCH -9
#define CL_IMAGE_FORMAT_NOT_SUPPORTED -10
#define CL_BUILD_PROGRAM_FAILURE -11
#define CL_MAP_FAILURE -12
...
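A bare number such as -11 in an error message is hard to interpret. A small lookup function can translate the codes listed above into their names. This is a minimal sketch under stated assumptions. The name clErrorName is hypothetical. It takes a plain int so that it compiles without the OpenCL headers. It covers only codes 0 to -12 from this slide. In a real program you would pass it the cl_int returned by the API and extend the table from your own cl.h.

```c
/* Hypothetical helper: map the OpenCL error codes listed above
   (0 to -12, as defined in cl.h) to their macro names.
   Takes a plain int so this sketch builds without <CL/cl.h>. */
const char* clErrorName(int code)
{
    static const char* const names[] = {
        "CL_SUCCESS",                       /*   0 */
        "CL_DEVICE_NOT_FOUND",              /*  -1 */
        "CL_DEVICE_NOT_AVAILABLE",          /*  -2 */
        "CL_COMPILER_NOT_AVAILABLE",        /*  -3 */
        "CL_MEM_OBJECT_ALLOCATION_FAILURE", /*  -4 */
        "CL_OUT_OF_RESOURCES",              /*  -5 */
        "CL_OUT_OF_HOST_MEMORY",            /*  -6 */
        "CL_PROFILING_INFO_NOT_AVAILABLE",  /*  -7 */
        "CL_MEM_COPY_OVERLAP",              /*  -8 */
        "CL_IMAGE_FORMAT_MISMATCH",         /*  -9 */
        "CL_IMAGE_FORMAT_NOT_SUPPORTED",    /* -10 */
        "CL_BUILD_PROGRAM_FAILURE",         /* -11 */
        "CL_MAP_FAILURE"                    /* -12 */
    };
    const int count = (int)(sizeof(names) / sizeof(names[0]));

    /* codes are zero or negative, so the table is indexed by -code;
       comparing against -count avoids negating very large values */
    if (code <= 0 && code > -count) {
        return names[-code];
    }
    return "UNKNOWN_OPENCL_ERROR";
}
```

For example, the printf in checkErr could print clErrorName(clErr) next to the numeric code.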


Where is OpenCL on Fornax?

NVIDIA Implementation (for NVIDIA GPU):
module load cuda
/opt/centos6.1-modules/cuda/4.1.28/cuda/include/CL/cl.h
/opt/nodes.updates/login.cuda.lib/lib64/libOpenCL.so
/opt/nodes.updates/login.cuda.lib/lib/libOpenCL.so

AMD Implementation (for Intel CPU):
module load AMDAPP
/opt/centos6.1-modules/AMDAPP/2.5/include/CL/cl.h
/opt/centos6.1-modules/AMDAPP/2.5/lib/x86_64/libOpenCL.so
/opt/centos6.1-modules/AMDAPP/2.5/lib/x86/libOpenCL.so

Intel Implementation (for Intel CPU):
not installed - hasn't been requested


Compiling OpenCL on Fornax (NVIDIA)

Load required modules if necessary:

module load gcc
module load cuda

Command line compile:

gcc platform_id.c -o platform_id -lOpenCL

Better to use a Makefile:

default:
	gcc platform_id.c -o platform_id -lOpenCL

(make requires the gcc command line under the target to start with a tab character)

And compile with make:

make


Running OpenCL on Fornax (NVIDIA)

Change to scratch directory:

cd /scratch/projectname/username/programpath

One node PBS script subPlatformID:

#!/bin/bash
#PBS -W group_list=projectname
#PBS -q workq
#PBS -l walltime=00:10:00
#PBS -l select=1:ncpus=1:ngpus=1:mem=64gb
#PBS -l place=excl

module load cuda

cd /scratch/projectname/username/programpath
/home/username/programpath/platform_id

Submit with qsub:

qsub subPlatformID

Check queue, directory for output:

qstat
ls
cat subPlatformID.oXXXX subPlatformID.eXXXX


OpenCL Platforms : clGetPlatformInfo

The following OpenCL routine is used to query platforms:

cl_int clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)

Arguments

platform : the platform being queried
param_name : CL_PLATFORM_PROFILE, CL_PLATFORM_VERSION, CL_PLATFORM_NAME, CL_PLATFORM_VENDOR, CL_PLATFORM_EXTENSIONS
param_value_size : size of memory pointed to by param_value
param_value : pointer to memory to store return value
param_value_size_ret : returns the size in bytes of data being queried

Returns

Either CL_SUCCESS or an error code.


OpenCL Platforms : clGetPlatformInfo

Code Example:

...
// get platform info
int i;
for (i = 0; i < num_platforms; i++)
{
    size_t size;
    clErr = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, 0, NULL, &size);
    checkErr(clErr, __FILE__, __LINE__);
    char vendor[size];
    clErr = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, size, vendor, NULL);
    checkErr(clErr, __FILE__, __LINE__);
    printf("Platform %i: %s\n", i, vendor);
}
...
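The two clGetPlatformInfo calls above follow a pattern used throughout the OpenCL query API. The first call passes a NULL value pointer to learn the size. The caller then allocates that much memory and calls again to fetch the data. The sketch below shows the same pattern with heap allocation instead of a variable-length array, which keeps large results off the stack. mock_get_info is a hypothetical stand-in that mimics the clGetPlatformInfo argument convention so the sketch runs without an OpenCL runtime. In real code you would call clGetPlatformInfo or clGetDeviceInfo in its place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an OpenCL query such as clGetPlatformInfo:
   same convention (param_value may be NULL to ask only for the size),
   but it always "returns" a fixed vendor string. */
static int mock_get_info(size_t param_value_size, void* param_value,
                         size_t* param_value_size_ret)
{
    static const char vendor[] = "Example Vendor";
    if (param_value_size_ret != NULL) {
        *param_value_size_ret = sizeof(vendor); /* includes the '\0' */
    }
    if (param_value != NULL) {
        if (param_value_size < sizeof(vendor)) {
            return -30; /* value of CL_INVALID_VALUE in cl.h */
        }
        memcpy(param_value, vendor, sizeof(vendor));
    }
    return 0; /* CL_SUCCESS */
}

/* Two-call pattern: query the size, allocate, then query the value.
   Returns a heap string the caller must free(), or NULL on failure. */
char* query_string(void)
{
    size_t size = 0;
    if (mock_get_info(0, NULL, &size) != 0) {
        return NULL;
    }
    char* value = malloc(size);
    if (value == NULL) {
        return NULL;
    }
    if (mock_get_info(size, value, NULL) != 0) {
        free(value);
        return NULL;
    }
    return value;
}
```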


OpenCL Programming Task : Platform Query

Write a program that prints out:
- the number of OpenCL platforms
- the names of the OpenCL platforms

You can find a template in:

/scratch/courses01/templates/opencl_platform.c

You may find the following function definitions useful:

cl_int clGetPlatformInfo(cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)

cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id* platforms, cl_uint* num_platforms)

param_name : CL_PLATFORM_NAME, CL_PLATFORM_VENDOR, CL_PLATFORM_PROFILE, CL_PLATFORM_VERSION, etc.


OpenCL Devices

Device:

An OpenCL device consists of a global memory and a number of compute units, each in turn containing a number of processing elements and a local memory.

(Figure: a Device with Global Memory and three Compute Units, each containing eight processing elements (PE))


OpenCL Devices : clGetDeviceIDs

The following OpenCL routine is used to obtain the number of devices, and their IDs, available in a platform:

cl_int clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices)

Arguments

platform : platform ID of desired platform
device_type : CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, etc.
num_entries : number of device IDs that fit in the memory pointed to by devices
devices : pointer to memory in which to return device IDs
num_devices : pointer in which to return the number of devices

Returns

Either CL_SUCCESS or an error code.


OpenCL Devices : clGetDeviceIDs

Code Example:

...
// get number of devices
cl_uint num_devices;
clErr = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
checkErr(clErr, __FILE__, __LINE__);

printf("\nOpenCL GPU Devices found: %u\n", num_devices);

// get device IDs
cl_device_id devices[num_devices];
clErr = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, num_devices, devices, NULL);
checkErr(clErr, __FILE__, __LINE__);
...


OpenCL Devices : clGetDeviceInfo

The following OpenCL routine is used to query devices:

cl_int clGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)

Arguments

device : the device to query
param_name : CL_DEVICE_NAME, and many more
param_value_size : size of memory pointed to by param_value
param_value : pointer to memory to store return value
param_value_size_ret : returns the size in bytes of data being queried

Returns

Either CL_SUCCESS or an error code.


OpenCL Devices : clGetDeviceInfo

There is a long list of device properties; they are listed in the OpenCL specification document:

CL_DEVICE_TYPE
CL_DEVICE_VENDOR_ID
CL_DEVICE_MAX_COMPUTE_UNITS
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
CL_DEVICE_MAX_WORK_ITEM_SIZES
CL_DEVICE_MAX_WORK_GROUP_SIZE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF
...


OpenCL Devices : clGetDeviceInfo

Code Example:

...
// get device info
for (i = 0; i < num_devices; i++)
{
    size_t size;
    clErr = clGetDeviceInfo(devices[i], CL_DEVICE_NAME, 0, NULL, &size);
    checkErr(clErr, __FILE__, __LINE__);
    char name[size];
    clErr = clGetDeviceInfo(devices[i], CL_DEVICE_NAME, size, name, NULL);
    checkErr(clErr, __FILE__, __LINE__);
    printf("\tDevice %i: %s\n", i, name);
}
...


OpenCL Programming Task : Device Query

Write a program that prints out:
- the names of the devices in the platform

You can find a template in:

/scratch/courses01/templates/opencl_device.c

You may find the following function definitions useful:

cl_int clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id* devices, cl_uint* num_devices)

device_type : CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, etc.

cl_int clGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)

param_name : CL_DEVICE_NAME, etc.


OpenCL Context

Context:

An OpenCL context is a collection of OpenCL objects that are associated with a group of devices, including command queues, device buffers, programs and kernels.

(Figure: a Context containing two Devices, three Device Buffers, a Program with two Kernels, and a Command Queue)


OpenCL Context : clCreateContext

The following OpenCL routine is used to create contexts:

cl_context clCreateContext(const cl_context_properties* properties, cl_uint num_devices, const cl_device_id* devices, void (CL_CALLBACK *pfn_notify)(const char* errinfo, const void* private_info, size_t cb, void* user_data), void* user_data, cl_int* errcode_ret)

Arguments

properties : the desired properties of the context (more on the next slide)
num_devices : the number of devices in the context
devices : pointer to a list of IDs of the desired devices
pfn_notify : pointer to a callback function the implementation uses to report errors (may be NULL)
user_data : pointer to user-defined data passed to the callback
errcode_ret : pointer to a value in which to return an error code

Returns

The requested OpenCL context, assuming no errors were returned.


OpenCL Context Properties

The properties argument is a zero-terminated list of context properties (of type cl_context_properties), each followed by its desired value.

As a minimum, the corresponding platform should be provided:

...
// define desired context properties list
cl_context_properties properties[] = {
    CL_CONTEXT_PLATFORM, (cl_context_properties) platform,
    0
};

// create context
cl_context context = clCreateContext(properties, 1, &device, NULL, NULL, &clErr);
checkErr(clErr, __FILE__, __LINE__);
...
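The sketch below walks a zero-terminated key/value array to show how that properties list is laid out. It is illustrative only. find_property and the KEY_* constants are hypothetical stand-ins for constants such as CL_CONTEXT_PLATFORM. The element type mirrors cl.h, which defines cl_context_properties as intptr_t. Keys and values alternate, which is why the platform ID has to be cast to cl_context_properties. A single 0 in a key position ends the list.

```c
#include <stddef.h>
#include <stdint.h>

/* cl.h defines cl_context_properties as intptr_t; a local typedef
   is used here so the sketch builds without <CL/cl.h>. */
typedef intptr_t context_property;

/* Hypothetical keys standing in for constants like CL_CONTEXT_PLATFORM. */
enum { KEY_PLATFORM = 1, KEY_OTHER = 2 };

/* Walk a zero-terminated {key, value, key, value, ..., 0} list and
   return the value stored for key, or 0 if the key is absent. */
context_property find_property(const context_property* props,
                               context_property key)
{
    for (size_t i = 0; props[i] != 0; i += 2) {
        if (props[i] == key) {
            return props[i + 1];
        }
    }
    return 0;
}
```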


OpenCL Context : clReleaseContext

When a context is no longer required, it should be released:

cl_int clReleaseContext(cl_context context)

Arguments

context : the context to release

Returns

Either CL_SUCCESS or an error code

Example Code:

...
// release context
clErr = clReleaseContext(context);
checkErr(clErr, __FILE__, __LINE__);
...


OpenCL Programming Task : Context

Write a program that:
- creates an OpenCL context

You can find a template in:

/scratch/courses01/templates/opencl_context.c

You may find the following function definitions useful:

cl_context clCreateContext(const cl_context_properties* properties, cl_uint num_devices, const cl_device_id* devices, void (CL_CALLBACK *pfn_notify)(const char* errinfo, const void* private_info, size_t cb, void* user_data), void* user_data, cl_int* errcode_ret)

cl_context_properties properties[] = {CL_CONTEXT_PLATFORM, (cl_context_properties) platform, 0};


OpenCL Command Queue

Command Queue:

An OpenCL command queue provides a mechanism to queue commands that operate on the various objects of a context.

The command queue can either act as a simple First In, First Out (FIFO) queue, or use events to create dependencies between commands.

(Figure: a Context containing two Devices, three Device Buffers, a Program with two Kernels, and a Command Queue)


OpenCL Command Queue : clCreateCommandQueue

The following OpenCL routine is used to create queues:

cl_command_queue clCreateCommandQueue(cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int* errcode_ret)

Arguments

context : the context for the command queue
device : the device that is the target of the commands
properties : CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, CL_QUEUE_PROFILING_ENABLE, or 0 for a default in-order queue
errcode_ret : pointer to value to return an error code

Returns

The requested OpenCL command queue


OpenCL Command Queue : clReleaseCommandQueue

The following OpenCL routine is used to release queues:

cl_int clReleaseCommandQueue(cl_command_queue command_queue)

Arguments

command_queue : the queue to release

Returns

Either CL_SUCCESS or an error code


OpenCL Command Queue

Code Example:

...
// create command queue
cl_command_queue queue = clCreateCommandQueue(context, device, 0, &clErr);
checkErr(clErr, __FILE__, __LINE__);
...
// release command queue
clErr = clReleaseCommandQueue(queue);
checkErr(clErr, __FILE__, __LINE__);
...

OpenCL Programming Task : Command Queue


Write a program that:
- creates and releases a command queue

You can find a template in:

/scratch/courses01/templates/opencl_queue.c

You may find the following function definitions useful:

cl_command_queue clCreateCommandQueue(cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int* errcode_ret)

cl_int clReleaseCommandQueue(cl_command_queue command_queue)

OpenCL Buffers


Buffer:

An OpenCL buffer is a memory object that resides in device global memory. There are also other types of memory objects, such as images, that support other data structures.

Buffers are attached to contexts and are associated with devices.

(Figure: a Device with Global Memory and three Compute Units of processing elements (PE), with two Device Buffers residing in global memory)

OpenCL Buffers : clCreateBuffer


The following OpenCL routine is used to create buffers:

cl_mem clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void* host_ptr, cl_int* errcode_ret)

Arguments

context : the context for the buffer
flags : CL_MEM_READ_WRITE, CL_MEM_READ_ONLY, etc.
size : size of the buffer in bytes
host_ptr : pointer to host memory to populate the buffer (optional; used with CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR, otherwise NULL)
errcode_ret : pointer to value to return an error code

Returns

The requested OpenCL buffer, as a cl_mem object.

OpenCL Buffers : clReleaseMemObject


The following OpenCL routine is used to release buffers:

cl_int clReleaseMemObject(cl_mem memobject)

Arguments

memobject : the memory object to release

Returns

Either CL_SUCCESS or an error code

OpenCL Buffers : clEnqueueWriteBuffer


The following OpenCL routine is used to write data from host memory into a buffer:

cl_int clEnqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t size, const void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)

Arguments

command_queue : the queue to enqueue the write to
buffer : the buffer to write to
blocking_write : whether this function blocks until the transfer is complete (CL_TRUE/CL_FALSE)
offset : how far into the buffer, in bytes, to begin writing
size : the size of the transfer in bytes
ptr : the location in host memory of the data
num_events_in_wait_list : number of events the write is dependent on
event_wait_list : list of events the write is dependent on
event : returns an event corresponding to this write

Returns

Either CL_SUCCESS or an error code

OpenCL Buffers : clEnqueueReadBuffer


The following OpenCL routine is used to read data from a buffer into host memory:

cl_int clEnqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t size, void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)

Arguments

command_queue : the queue to enqueue the read to
buffer : the buffer to read from
blocking_read : whether this function blocks until the transfer is complete (CL_TRUE/CL_FALSE)
offset : how far into the buffer, in bytes, to begin reading
size : the size of the transfer in bytes
ptr : the location in host memory to put the data
num_events_in_wait_list : number of events the read is dependent on
event_wait_list : list of events the read is dependent on
event : returns an event corresponding to this read

Returns

Either CL_SUCCESS or an error code

OpenCL Buffers


Code Example:

...
// create device buffer
cl_mem device_values = clCreateBuffer(context, CL_MEM_READ_WRITE, bsize, NULL, &clErr);
checkErr(clErr, __FILE__, __LINE__);

// write host values to device buffer (blocking)
clErr = clEnqueueWriteBuffer(queue, device_values, CL_TRUE, 0, bsize, (void*)host_values, 0, NULL, NULL);
checkErr(clErr, __FILE__, __LINE__);

// read device buffer back into host values (blocking)
clErr = clEnqueueReadBuffer(queue, device_values, CL_TRUE, 0, bsize, (void*)host_values, 0, NULL, NULL);
checkErr(clErr, __FILE__, __LINE__);

// release device buffer
clErr = clReleaseMemObject(device_values);
checkErr(clErr, __FILE__, __LINE__);
...

OpenCL Programming Task : Buffers


Write a program that:
- creates two arrays on the host, and populates one
- writes the populated array to a device buffer
- reads the device buffer into the other array on the host

You can find a template in:

/scratch/courses01/templates/opencl_buffers.c

You may find the following function definitions useful:

cl_mem clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void* host_ptr, cl_int* errcode_ret)

cl_int clEnqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t size, const void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)

cl_int clEnqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t size, void* ptr, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)

OpenCL Programs


Program:

An OpenCL program is a set of kernel sources, written as functions defined with the __kernel qualifier, and binaries compiled for specific device architectures.

(Figure: a Context containing two Devices, three Device Buffers, a Program with two Kernels, and a Command Queue)

OpenCL Programs : clCreateProgramWithSource


The following OpenCL routine is used to create programs:

cl_program clCreateProgramWithSource(cl_context context, cl_uint count, const char** strings, const size_t* lengths, cl_int* errcode_ret)

Arguments

context : the context for the program
count : the number of strings containing the source
strings : pointer to array of pointers to the strings
lengths : pointer to array of the string lengths (NULL if the strings are \0 terminated)
errcode_ret : pointer to value to return an error code

Returns

The requested OpenCL program

OpenCL Programs : clCreateProgramWithSource


Options for program kernel source code:

1) Include the kernels as strings in the host source file
- have to code within quotes
- need to recompile to change kernel source
- guaranteed to have kernel source

2) Read in the kernels from files at runtime
- can code normally
- can change kernels without recompiling
- need to ensure path to files is correct
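For option 2, the host needs a routine that loads a kernel file into one \0-terminated string that can be passed to clCreateProgramWithSource. Below is a minimal sketch using only standard C I/O. The name read_source and the example path are hypothetical. It reads in binary mode so the byte count reported by ftell matches what fread returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole text file (e.g. a .cl kernel file) into a '\0'-terminated
   heap buffer. Stores the length (excluding '\0') in *length if length
   is not NULL. Returns NULL on failure; the caller must free() it. */
char* read_source(const char* path, size_t* length)
{
    FILE* file = fopen(path, "rb");
    if (file == NULL) {
        return NULL;
    }
    if (fseek(file, 0, SEEK_END) != 0) {
        fclose(file);
        return NULL;
    }
    long size = ftell(file);
    if (size < 0 || fseek(file, 0, SEEK_SET) != 0) {
        fclose(file);
        return NULL;
    }
    char* source = malloc((size_t)size + 1);
    if (source == NULL) {
        fclose(file);
        return NULL;
    }
    size_t nread = fread(source, 1, (size_t)size, file);
    fclose(file);
    if (nread != (size_t)size) {
        free(source);
        return NULL;
    }
    source[size] = '\0';
    if (length != NULL) {
        *length = (size_t)size;
    }
    return source;
}
```

A typical use would be `const char* src = read_source("kernels.cl", NULL);` followed by `clCreateProgramWithSource(context, 1, &src, NULL, &clErr);`, checking src for NULL first. The file path here is only illustrative.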

OpenCL Programs : clCreateProgramWithSource


Kernel Source String Example:

const char* source =
    "__kernel void zeroValues(__global int* values, int imax)\n"
    "{\n"
    "    // thread index and total\n"
    "    int idx = get_global_id(0);\n"
    "    int idtotal = get_global_size(0);\n"
    "\n"
    "    // zero values\n"
    "    int i;\n"
    "    for(i=idx;i<imax;i+=idtotal)\n"
    "    {\n"
    "        values[i] = 0;\n"
    "    }\n"
    "}\n";

OpenCL Programs : clCreateProgramWithSource


Kernel Source File Example:

__kernel void zeroValues(__global int* values, int imax)
{
    // thread index and total
    int idx = get_global_id(0);
    int idtotal = get_global_size(0);

    // zero values
    int i;
    for(i=idx;i<imax;i+=idtotal)
    {
        values[i] = 0;
    }
}
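The loop `for(i=idx;i<imax;i+=idtotal)` in zeroValues is a stride loop. Work-item idx handles elements idx, idx+idtotal, idx+2*idtotal, and so on. As a result, all imax values are zeroed whether the NDRange has more or fewer work-items than there are elements. The sketch below emulates this serially on the CPU to show the index coverage. run_zero_values and zero_values_item are hypothetical names. A real device runs the work-items concurrently rather than one after another.

```c
/* CPU emulation of the zeroValues kernel body for one work-item.
   idx and idtotal play the roles of get_global_id(0) and
   get_global_size(0). */
static void zero_values_item(int* values, int imax, int idx, int idtotal)
{
    int i;
    for (i = idx; i < imax; i += idtotal) {
        values[i] = 0;
    }
}

/* Run every work-item of a 1D NDRange of size global_size in turn.
   On a device they would run in parallel; the result is the same
   because each work-item writes a disjoint set of elements. */
void run_zero_values(int* values, int imax, int global_size)
{
    int idx;
    for (idx = 0; idx < global_size; idx++) {
        zero_values_item(values, imax, idx, global_size);
    }
}
```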

OpenCL Programs : clBuildProgram


Use clBuildProgram to compile and link the kernel source:

cl_int clBuildProgram(cl_program program, cl_uint num_devices, const cl_device_id* device_list, const char* options, void (CL_CALLBACK *pfn_notify)(cl_program program, void* user_data), void* user_data)

Arguments

program : the program to build
num_devices : the number of target devices
device_list : list of devices to target
options : compiler flags
pfn_notify : pointer to callback when done (blocking call if NULL)
user_data : data to provide in callback

Returns

CL_SUCCESS or an error code
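The options argument takes the same kind of flags as a C compiler command line. Two standard OpenCL build options are "-D NAME=value", which defines a preprocessor macro visible to the kernel source, and "-cl-fast-relaxed-math", which allows faster but less strict floating-point optimisations. The sketch below assembles such a string on the host. make_build_options and the IMAX macro name are hypothetical; whether relaxed math is acceptable depends on your kernel's accuracy needs.

```c
#include <stdio.h>

/* Build an options string for clBuildProgram, e.g. to pass a problem
   size into the kernel as a compile-time constant. Returns the length
   the full string needs (as snprintf does), so a result greater than
   or equal to buffer_size means the buffer was too small. */
int make_build_options(char* buffer, size_t buffer_size, int imax)
{
    return snprintf(buffer, buffer_size,
                    "-D IMAX=%d -cl-fast-relaxed-math", imax);
}
```

The resulting string would then be passed as `clBuildProgram(program, 1, &device, opts, NULL, NULL);`.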

OpenCL Programs : clGetProgramBuildInfo


Use clGetProgramBuildInfo to get the compiler log:

cl_int clGetProgramBuildInfo(cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret)

Arguments

program : the program that was built
device : the device that the kernels were compiled for
param_name : CL_PROGRAM_BUILD_LOG, etc.
param_value_size : size of memory pointed to by param_value
param_value : pointer to memory to store return value
param_value_size_ret : returns the size in bytes of data being queried

Returns

CL_SUCCESS or an error code

OpenCL Programs : clCreateKernel


Use clCreateKernel to define the kernel entry points:

cl_kernel clCreateKernel(cl_program program, const char* kernel_name, cl_int* errcode_ret)

Arguments

program : the program that was built
kernel_name : the name of the kernel function
errcode_ret : pointer to value to return an error code

Returns

The OpenCL kernel corresponding to the kernel name

OpenCL Programs : clReleaseProgram, clReleaseKernel


Use clReleaseKernel to release the kernel:

cl_int clReleaseKernel(cl_kernel kernel)

Use clReleaseProgram to release the program:

cl_int clReleaseProgram(cl_program program)

Arguments

kernel : the kernel to release
program : the program to release

Returns

Either CL_SUCCESS or an error code

OpenCL Programs and Kernels


Code Example:

// create program from source
cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, &clErr);
checkErr(clErr, __FILE__, __LINE__);

// compile program; keep the error and check it after printing the
// build log, so compiler messages are still shown if the build fails
cl_int buildErr = clBuildProgram(program, 1, &device, "", NULL, NULL);

// print build log
size_t size;
clErr = clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &size);
checkErr(clErr, __FILE__, __LINE__);
char build_log[size];
clErr = clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, size, build_log, NULL);
checkErr(clErr, __FILE__, __LINE__);
printf("\nBuild Log:\n\n%s\n\n", build_log);
checkErr(buildErr, __FILE__, __LINE__);

// create kernel (the name must match a __kernel function in source)
cl_kernel kernel = clCreateKernel(program, "zeroValues", &clErr);
checkErr(clErr, __FILE__, __LINE__);

// release kernel
clErr = clReleaseKernel(kernel);
checkErr(clErr, __FILE__, __LINE__);

// release program
clErr = clReleaseProgram(program);
checkErr(clErr, __FILE__, __LINE__);

OpenCL Programming Task : Programs and Kernels


Write and build a kernel that would:
- invert an array of integers valued 0-255

You can find a template in:

/scratch/courses01/templates/opencl_program.c

You may find the following function definitions useful:

cl_program clCreateProgramWithSource(cl_context context, cl_uint count, const char** strings, const size_t* lengths, cl_int* errcode_ret)

cl_kernel clCreateKernel(cl_program program, const char* kernel_name, cl_int* errcode_ret)

cl_int clBuildProgram(cl_program program, cl_uint num_devices, const cl_device_id* device_list, const char* options, void (CL_CALLBACK *pfn_notify)(cl_program program, void* user_data), void* user_data)

OpenCL Kernel Execution


To execute the kernel on the device, we must:

1) Set the Kernel Arguments
2) Determine the Thread Topology (NDRange)
3) Enqueue the Kernel Execution

(Figure: a Context containing two Devices, three Device Buffers, a Program with two Kernels, and a Command Queue)

OpenCL Kernels : clSetKernelArg


Use clSetKernelArg to specify the kernel arguments:

cl_int clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void* arg_value)

Arguments

kernel : the kernel the argument belongs to
arg_index : the index of the argument (0 for the leftmost kernel parameter)
arg_size : the size of the argument in bytes
arg_value : a pointer to the value of the argument

Returns

CL_SUCCESS or an error code

OpenCL Setting Kernel Arguments


Code Example:

...
int imax = 1024;
...

// create device buffer
cl_mem device_values = clCreateBuffer ...
checkErr(clErr, __FILE__, __LINE__);
...

// set kernel arguments
clErr = clSetKernelArg(kernel, 0, sizeof(cl_mem), &device_values);
checkErr(clErr, __FILE__, __LINE__);
clErr = clSetKernelArg(kernel, 1, sizeof(int), &imax);
checkErr(clErr, __FILE__, __LINE__);
...

OpenCL Thread Topology


OpenCL uses a scalable programming model: an NDRange is made up of multiple workgroups, which contain the workitems that will execute on the device.

[Figure: an NDRange containing Workgroups 0, 1 and 2, each with four workitems W0 to W3]

Each workitem has access to functions that return the dimensions of the NDRange and of its workgroup, as well as its index within them:

uint get_work_dim()
size_t get_global_size(uint d)
size_t get_global_id(uint d)
size_t get_local_size(uint d)
size_t get_local_id(uint d)

OpenCL Thread Topology


The NDRange is divided into workgroups so that they can be dynamically allocated to the compute units.

[Figure: a multithreaded OpenCL program with eight workgroups (WG 0 to WG 7). On a 2x compute unit device, CU 0 and CU 1 take them in four rows (WG 0 and WG 1, WG 2 and WG 3, WG 4 and WG 5, WG 6 and WG 7); on a 4x compute unit device, in two rows (WG 0 to WG 3, then WG 4 to WG 7).]

OpenCL Thread Topology Implications


The choice of workgroup size must take the multiprocessor architecture into account, with some consideration for future changes.

[Figure: workgroups WG 0 and WG 1 running on compute unit CU 0]

Just consider a few workgroups running on a single compute unit.

What does the workgroup size affect?

OpenCL Thread Topology Implications


The major consideration in choosing the NDRange size is the number of compute units, with some consideration for future changes.

Just consider all the workgroups running on all the compute units.

What does the number of workgroups in the NDRange affect?

[Figure: eight workgroups on a 4x compute unit device, with WG 0 to WG 3 in the first row and WG 4 to WG 7 in the second]

OpenCL Kernels : clEnqueueNDRangeKernel


Use clEnqueueNDRangeKernel to queue the kernel:

cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t* global_work_offset, const size_t* global_work_size, const size_t* local_work_size, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)

Arguments

command_queue : the queue to submit the kernel to
kernel : the kernel to submit
work_dim : the number of dimensions of the thread topology
global_work_offset : a pointer to an array of offsets to the global indices
global_work_size : a pointer to an array of sizes of the global NDRange
local_work_size : a pointer to an array of sizes of the local workgroup (may be NULL to let the implementation choose)
num_events_in_wait_list : the number of events the kernel execution depends on
event_wait_list : the list of events the kernel execution depends on
event : returns an event corresponding to this kernel execution

Returns

CL_SUCCESS or an error code

OpenCL Kernel Execution


Code Example:

...

// enqueue kernel
cl_uint dim = 1;
size_t offset = 0;
size_t local_size = 128;
size_t global_size = 4*14*local_size;
clErr = clEnqueueNDRangeKernel(queue, kernel, dim, &offset, &global_size, &local_size, 0, NULL, NULL);
checkErr(clErr, __FILE__, __LINE__);

...

OpenCL Programming Task : Invert Kernel


Write a program that:
- generates an array of at least a thousand values, between 0 and 255
- prints the first few values of the array
- inverts the array on the GPU (subtract values from 256)
- prints the first few new values of the array

You can find template files at:

/scratch/courses01/templates/opencl_inverse.c

You may find the following definitions useful:

cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t* global_work_offset, const size_t* global_work_size, const size_t* local_work_size, cl_uint num_events_in_wait_list, const cl_event* event_wait_list, cl_event* event)

OpenCL Programming Task : Sum Kernel


Write a program that:
- generates an array of at least a million values, between 0 and 255
- sums the array using a loop on the CPU
- sums the array using the GPU
- prints the two results

Copy your invert code as a starting point.

Hints:
- each workitem can add some numbers together
- you can synchronize all workitems by ending the kernel and launching another one
- you may need more than one device buffer allocation
- if your array is large enough, you may need to consider numerical precision.

Further OpenCL Concepts


- C extensions in the kernel language for vectors
- Local memory for workitem communication within workgroups
- Workgroup and device-level synchronisation
- Coalescing global memory access
- Branching issues
- Memory stalls and arithmetic intensity
- Overlapping kernels with host-device transfers
- Pinned memory host-device transfers
- Managing compute locality in algorithms
- Graphical data-types and hardware acceleration
- Graphics API interoperability
- Mode switching
- Using OpenCL events
