Course Outline


Introduction

  • What is ROCm?
  • What is HIP?
  • ROCm vs CUDA vs OpenCL
  • Overview of ROCm and HIP features and architecture
  • ROCm for Windows vs ROCm for Linux


Installation

  • Installing ROCm on Windows
  • Verifying the installation and checking device compatibility
  • Updating or uninstalling ROCm on Windows
  • Troubleshooting common installation issues

Getting Started

  • Creating a new ROCm project using Visual Studio Code on Windows
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf


ROCm API

  • Using the ROCm API in the host program
  • Querying device information and capabilities
  • Allocating and deallocating device memory
  • Copying data between host and device
  • Launching kernels and synchronizing threads
  • Handling errors and exceptions

HIP Language

  • Using the HIP language in the device program
  • Writing kernels that execute on the GPU and manipulate data
  • Using data types, qualifiers, operators, and expressions
  • Using built-in functions, variables, and libraries

ROCm and HIP Memory Model

  • Using different memory spaces, such as global, shared, constant, and local
  • Using different memory objects, such as pointers, arrays, textures, and surfaces
  • Using different memory access modes, such as read-only, write-only, and read-write
  • Using the memory consistency model and synchronization mechanisms

ROCm and HIP Execution Model

  • Using the levels of the execution hierarchy: threads, blocks, and grids
  • Using thread-indexing built-ins, such as hipThreadIdx_x, hipBlockIdx_x, and hipBlockDim_x
  • Using block-level functions, such as __syncthreads and __threadfence_block
  • Using grid-level features, such as hipGridDim_x and grid-wide synchronization with cooperative groups
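
As a rough illustration of how the thread, block, and grid coordinates listed above combine, here is a host-only C++ sketch that needs no GPU or ROCm installation. The names global_index_x, blocks_for, and emulate_vector_add are hypothetical helpers made up for this outline, not part of HIP. In a real kernel the same index is typically computed as hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x, and the two loops below are replaced by the hardware scheduling of blocks and threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a HIP API): the flat index that a device thread
// would typically compute as hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x.
std::size_t global_index_x(std::size_t block_idx, std::size_t block_dim,
                           std::size_t thread_idx) {
    assert(thread_idx < block_dim);  // a thread index always lies within its block
    return block_idx * block_dim + thread_idx;
}

// Number of blocks needed to cover n elements (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}

// CPU emulation of a 1-D vector-add launch: the outer loop stands in for the
// grid of blocks, the inner loop for the threads inside each block.
std::vector<float> emulate_vector_add(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t block_dim) {
    assert(a.size() == b.size() && block_dim > 0);
    const std::size_t n = a.size();
    std::vector<float> c(n, 0.0f);
    const std::size_t grid_dim = blocks_for(n, block_dim);
    for (std::size_t block = 0; block < grid_dim; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            const std::size_t i = global_index_x(block, block_dim, thread);
            if (i < n) {  // bounds guard: the last block may be partly idle
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

The bounds guard matters whenever the element count is not a multiple of the block size: with 1000 elements and 256 threads per block, four blocks are launched and the last 24 threads of the final block do no work.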


Debugging

  • Debugging ROCm and HIP programs on Windows
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables and the call stack
  • Using ROCm Debugger to debug ROCm and HIP programs on AMD devices
  • Using ROCm Profiler to analyze ROCm and HIP programs on AMD devices


Optimization

  • Optimizing ROCm and HIP programs on Windows
  • Using coalescing techniques to improve memory throughput
  • Using caching and prefetching techniques to reduce memory latency
  • Using shared memory and local memory techniques to optimize memory accesses and bandwidth
  • Using profiling tools to measure and improve execution time and resource utilization
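
To make the coalescing topic above more concrete, the following host-only C++ sketch counts how many aligned memory segments a group of consecutive threads touches for a given access stride. It is a simplified model, not a description of any particular AMD GPU: the function name segments_touched is hypothetical, and the group size, element size, and segment size are parameters the caller chooses for illustration.

```cpp
#include <cstddef>
#include <set>

// Hypothetical model (not a ROCm API): counts the distinct aligned memory
// segments touched when thread t reads the element at index t * stride_elems.
// Fewer segments for the same number of threads means better coalescing.
std::size_t segments_touched(std::size_t num_threads, std::size_t stride_elems,
                             std::size_t elem_bytes, std::size_t segment_bytes) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t byte_addr = t * stride_elems * elem_bytes;
        segments.insert(byte_addr / segment_bytes);  // index of the aligned segment
    }
    return segments.size();
}
```

Under the illustrative assumption of 64 threads, 4-byte elements, and 64-byte segments, a stride of 1 touches 4 segments while a stride of 16 touches 64, one per thread. That gap is the kind of difference coalesced access patterns aim to remove.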

Summary and Next Steps


Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors
  • Familiarity with the Windows operating system and PowerShell


Audience

  • Developers who wish to learn how to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism
  • Developers who wish to write high-performance and scalable code that can run on different AMD devices
  • Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance

Duration: 21 hours
