Learn GPU Programming in Your Browser

Authors: Sarah Pan, Austin Huang

Published: September 13, 2024

TL;DR

WebGPU has arrived, opening a direct pipeline from the web browser to your local GPU. We’ve built WebGPU Puzzles to help you try it and explore the possibilities. It’s a simple, interactive way to learn GPU programming using nothing but your browser:

gpupuzzles.answer.ai

We challenge you to do the puzzles, then share your ideas about the possibilities with us!

Introducing WebGPU Puzzles

WebGPU Puzzles is a web incarnation of Sasha Rush’s GPU Puzzles, a series of small, fun, self-contained coding challenges for learning GPU programming. The original GPU Puzzles was written for Numba/CUDA and meant to be run on a remote server with a dedicated GPU.

Using WebGPU, gpupuzzles.answer.ai lets you write code directly in the browser; the app then executes it and checks the results automatically. GPU computation runs entirely locally in your browser [1], whether you have a humble integrated GPU in your laptop or a high-end GPU installed in an expensive workstation.

About WebGPU

WebGPU is a low-level, high-performance API for web browsers that works with most modern GPUs (and can be repurposed for native applications outside the browser too). By connecting the browser to local GPU compute, WebGPU effectively turns the web into one massive distributed GPU cluster with every connected personal device as a compute node.

This means web applications now have the ability to frictionlessly bring GPU compute online by simply serving a web page. If we think of neural networks as a new building block for running learned computation using GPUs, WebGPU provides a complementary building block allowing frictionless contribution and coordination of local GPU compute over the web.

WebGPU is still new. It was rolled out to Chrome in spring 2023 (Chrome 113) for macOS and Windows, while other browsers (Firefox, Safari) and platforms (Linux) are in the process of enabling their implementations.

Why We Built WebGPU Puzzles

Why did we build this app?

Our first goal is to help beginners write GPU code without worrying about low-level, vendor-specific tech stacks. As generative AI is becoming a core building block of computation, the ability to write and reason about code running on GPUs is increasingly important. We want to make GPU programming as simple and accessible as regular programming. In fact, accessibility was one of the reasons we chose to make a web app, allowing those with little prior GPU programming experience (including one of the contributors to this project!) to learn the ropes.

Second, for those of you who already know GPU programming through CUDA, we also wanted to show how the techniques you are already familiar with can be operationalized in web applications through WebGPU.

Finally, we wanted to demonstrate new possibilities now that the web browser can directly access local GPU compute. You can think of this WebGPU Puzzles app as an initial experiment in this new category of web-based GPU compute applications.

WebGPU Programming Basics

Although GPU programming is a deep and extensive topic, here we’ll provide a brief introduction to help newcomers understand the basic concepts and get started with WebGPU Puzzles. If you are already familiar with CUDA, you can probably skim this section.

At its core, dispatching a GPU computation differs from everyday CPU code because the GPU performs its computations independently, on a device separate from the CPU. Think of a GPU invocation less as a function call and more as an asynchronous remote procedure call to an independent, high-throughput computing device.
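To make the "asynchronous remote procedure call" framing concrete, here is a minimal sketch of how a page might obtain a GPU device before dispatching anything. `navigator.gpu`, `requestAdapter()`, and `requestDevice()` are the standard WebGPU entry points; the helper names `findGpu` and `acquireDevice` and the two small interfaces are hypothetical, introduced only so the sketch type-checks without extra packages. It is not how WebGPU Puzzles itself is implemented.

```typescript
// Minimal structural types (assumptions) so the sketch compiles on its own.
interface AdapterLike {
  requestDevice(): Promise<unknown>;
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

// Returns the WebGPU entry point if the environment exposes one, else undefined.
function findGpu(scope: unknown = globalThis): GpuLike | undefined {
  const nav = (scope as { navigator?: { gpu?: GpuLike } }).navigator;
  return nav?.gpu;
}

// Every step that talks to the GPU returns a Promise: the CPU asks,
// then waits for the separate device to answer.
async function acquireDevice(gpu: GpuLike | undefined): Promise<unknown> {
  if (!gpu) return null;                 // browser has no WebGPU support
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null;             // no suitable GPU was found
  return adapter.requestDevice();
}
```

In a browser you would call `acquireDevice(findGpu())` and continue only if the result is not `null`.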

The tradeoff for high throughput is that the GPU is less flexible than the CPU. For this reason, code that runs on the GPU is written in a separate language, the WebGPU Shading Language (WGSL). WGSL is a small domain-specific language designed to expose the capabilities and limitations of the GPU's simpler computation units, mirroring the limited set of computations expressible on most GPUs. Because WGSL programs map closely onto GPU hardware, the WGSL you write gives you fine-grained control over GPU execution.
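For a feel of what WGSL looks like, here is a small sketch: a compute shader held in a string, the way a host page would pass it to WebGPU, next to a CPU reference of the same computation. The buffer names (`a`, `out`), the binding layout, the workgroup size of 4, and the "add 10" task are all assumptions chosen for illustration, not the exact template used by the puzzles.

```typescript
// WGSL source as a string. Names, bindings, and sizes are illustrative assumptions.
const mapShader: string = `
@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(4, 1, 1)
fn main(@builtin(local_invocation_id) lid: vec3<u32>) {
  // Each invocation handles exactly one element.
  out[lid.x] = a[lid.x] + 10.0;
}
`;

// CPU reference of the same computation: one array element per GPU invocation.
function mapReference(a: number[]): number[] {
  return a.map((x) => x + 10);
}
```

Note how the shader has no loop: the "loop" is the set of invocations, each of which runs `main` once with its own ID.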

Furthermore, GPUs have only a limited ability to coordinate locally: resource sharing and synchronization are restricted to local units. Threads are the individual processing units; each runs your WGSL code independently with a unique thread ID. Groups of threads that can share resources through shared memory and synchronize with one another are called workgroups in WebGPU (analogous to blocks in CUDA).
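The role of shared memory and synchronization inside a workgroup can be sketched with a CPU emulation. In WGSL, workgroup-shared storage is declared with `var<workgroup>` and threads synchronize with `workgroupBarrier()`; the function below only imitates that behavior with two sequential loops, and `neighborSum` is a hypothetical example task.

```typescript
// CPU emulation of a single workgroup. Each loop stands for all threads
// running one phase; the boundary between the loops stands for a barrier,
// after which every write to the shared tile is visible to every thread.
function neighborSum(input: number[]): number[] {
  const size = input.length; // number of threads in the workgroup
  const tile = new Array<number>(size).fill(0);   // stands in for var<workgroup> memory
  const result = new Array<number>(size).fill(0);

  // Phase 1: every thread copies its own element into shared memory.
  for (let lid = 0; lid < size; lid++) {
    tile[lid] = input[lid];
  }

  // (barrier: no thread starts phase 2 until all have finished phase 1)

  // Phase 2: every thread reads its own slot and its right-hand neighbor, wrapping around.
  for (let lid = 0; lid < size; lid++) {
    result[lid] = tile[lid] + tile[(lid + 1) % size];
  }
  return result;
}
```

Without the barrier, a thread in phase 2 could read a neighbor's slot before that neighbor had written it, which is exactly the kind of bug workgroup synchronization prevents.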

A compute kernel operation specifies the number of threads in a workgroup and the total number of workgroups in a dispatch. Both are specified as 3-dimensional vectors, with the dimensions named x, y, and z. This 3D spatial organization is a holdover from the graphics origins of the GPU. For compute kernel operations, we can map the x, y, and z dimensions to whatever is useful for our computation. If we don’t need all three dimensions, we can set the unused dimensions to a size of 1.
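The arithmetic behind these two vectors is simple enough to sketch. The helper names below (`totalInvocations`, `workgroupsNeeded`) are hypothetical; they just spell out how many threads a dispatch launches and how many workgroups are needed to cover a given number of elements along one axis.

```typescript
type Vec3 = [number, number, number];

// Total threads launched = (threads per workgroup) x (workgroups in the dispatch).
function totalInvocations(workgroupSize: Vec3, workgroupCount: Vec3): number {
  const perGroup = workgroupSize[0] * workgroupSize[1] * workgroupSize[2];
  const groups = workgroupCount[0] * workgroupCount[1] * workgroupCount[2];
  return perGroup * groups;
}

// Workgroups needed along one axis to cover n elements, rounding up so that
// a final, partially filled workgroup still gets launched.
function workgroupsNeeded(n: number, sizeAlongAxis: number): number {
  return Math.ceil(n / sizeAlongAxis);
}
```

For example, a 1D problem uses sizes like `[4, 1, 1]`, leaving y and z at 1, while an image-like problem might use `[8, 8, 1]`.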

For those that are familiar with CUDA, here’s a mapping of CUDA terminology to WebGPU (if you are not familiar with CUDA, you can skip this table):

| CUDA | WebGPU | Meaning |
| --- | --- | --- |
| Device Code | WGSL Code | The code executed on the GPU for parallel computation. |
| Thread | Invocation | The smallest unit of execution within a block/workgroup. |
| Block | Workgroup | A collection of threads/invocations that execute concurrently and share memory. The threads/invocations within it are organized as a 1D, 2D, or 3D grid. |
| Grid | Dispatch | A collection of blocks/workgroups that execute the kernel/WGSL code. These are also organized as a 1D, 2D, or 3D grid. |
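The same correspondence carries over to the index variables each thread sees. The sketch below lists the WGSL built-in values next to their usual CUDA counterparts and computes the global index the way both models define it; the function name `globalInvocationId` and the `Index3` type are hypothetical, written here only to show the formula.

```typescript
type Index3 = [number, number, number];

// WGSL built-in            CUDA counterpart
// local_invocation_id   ~  threadIdx
// workgroup_id          ~  blockIdx
// @workgroup_size(...)  ~  blockDim
// num_workgroups        ~  gridDim
// global_invocation_id  ~  blockIdx * blockDim + threadIdx
function globalInvocationId(
  workgroupId: Index3,
  workgroupSize: Index3,
  localId: Index3
): Index3 {
  return [
    workgroupId[0] * workgroupSize[0] + localId[0],
    workgroupId[1] * workgroupSize[1] + localId[1],
    workgroupId[2] * workgroupSize[2] + localId[2],
  ];
}
```

In WGSL you rarely compute this by hand, since `global_invocation_id` is available directly as a built-in input to the entry point.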

In WebGPU Puzzles, each puzzle has test cases that fix the number of threads in a workgroup and the number of workgroups in a dispatch. Your task is to fill in the WGSL code that each thread in the workgroup will execute, so that it carries out the computation specified by the test case and passes the tests validating your solution.
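To illustrate the shape of such a puzzle, here is a CPU emulation of a fixed 1D dispatch running a per-thread "kernel" and checking the output. Everything here is a hypothetical stand-in: the real app runs WGSL on the GPU, and `emulateDispatch`, `addTen`, `passes`, and the chosen sizes are assumptions made for this sketch only.

```typescript
// A "kernel" here is a plain function called once per emulated thread.
type Kernel = (localId: number, workgroupId: number, a: number[], out: number[]) => void;

// Emulates a 1D dispatch: workgroupCount groups of workgroupSize threads each.
function emulateDispatch(
  kernel: Kernel,
  workgroupSize: number,
  workgroupCount: number,
  a: number[]
): number[] {
  const out = new Array<number>(a.length).fill(0);
  for (let wid = 0; wid < workgroupCount; wid++) {
    for (let lid = 0; lid < workgroupSize; lid++) {
      kernel(lid, wid, a, out);
    }
  }
  return out;
}

// Candidate solution: add 10 to each element. The guard matters because the
// dispatch may launch more threads than there are elements.
const WORKGROUP_SIZE = 4;
const addTen: Kernel = (lid, wid, a, out) => {
  const i = wid * WORKGROUP_SIZE + lid;
  if (i < a.length) {
    out[i] = a[i] + 10;
  }
};

// Compares the produced output against the expected output element by element.
function passes(actual: number[], expected: number[]): boolean {
  return actual.length === expected.length && actual.every((v, i) => v === expected[i]);
}
```

With 6 input elements and 2 workgroups of 4 threads, 8 threads run but only 6 have work to do; the guard keeps the last two from writing out of bounds.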

Building WebGPU Puzzles

WebGPU Puzzles has been a fun experiment in combining FastHTML and gpu.cpp to target the browser.

Give gpupuzzles.answer.ai a try! You can join us on Discord and let us know how it goes!

Footnotes

  1. Browser WebGPU support required. Chrome on Mac/Windows should work automatically; Safari and Linux users will need to enable WebGPU in their browser settings.