GPU Programming from Scratch

My experience learning GPU programming, and implementing a new GPU education app in the process
Author

Sarah Pan

Published

March 17, 2025

Jeremy Howard says: I’m really excited to introduce you all to Sarah Pan, an extraordinary and inspiring AI researcher who began working with Answer.AI whilst still at high school (and she had a first-author paper accepted at NeurIPS too)!

Sarah’s first project with us is WebGPU Puzzles, which is the best way I know of to get started with GPU programming fundamentals today. With it, you can begin learning GPU programming right in your browser. I was astonished at how Sarah was able to learn, from scratch, GPU programming, WebGPU, and gpu.cpp in a matter of weeks, to a level where she could pull this off.

I’ve asked Sarah to share a bit about her story, which she has done in the post below. She was also kind enough to spend some time doing an interview with me, which I’m sure you’ll agree is a fascinating insight into the life of a very special person.

Hey! My name is Sarah Pan and you might’ve seen my name attached to the WebGPU Puzzles project (based on Answer.AI’s gpu.cpp). A little about me: I’m a research fellow at Answer.AI as well as a first-year student at MIT! This means that outside of classes and all the other fun chaos of MIT, I work with the Answer.AI team on various projects, as well as on my own research.

The Origin Story

You might be wondering how I got here. (Sometimes, I do too.) But my AI journey began towards the end of middle school when my older brother introduced me to fast.ai. At the time, having R2D2 as my favorite Star Wars character was enough to propel me into taking the course.

Practical Deep Learning took a top-down approach to teaching about neural networks. This meant that the important high-level ideas weren’t gatekept by the nitty-gritty. Being able to understand the inner workings of complex systems without having taken a math class past Algebra I, much less holding a college degree, was very refreshing.

Fast forward to my junior year of high school: I had a few more AI experiences under my belt and was ready for more. I joined MIT PRIMES, a research program that connects high schoolers with researchers in mathematics, computer science, and computational biology. There, my mentor, Vlad Lialin, showed me the ropes, from effectively reading academic papers to adopting the “iterate fast” ethos.

Together, we worked on the project that would become my first publication. I don’t want to bore you with the details, but we essentially used a process reward model1 in RL to improve the reasoning abilities of LLMs.

Though this sounded pretty straightforward at the start, I was quickly proven wrong. There were many moments where learning auxiliary skills was essential to implementing the ideas I really cared about. If anything, a summer of trying to fit billion-parameter LLMs onto dual 3090s taught me the importance of good engineering habits. But soon enough, October rolled around and my fingers were crossed for a NeurIPS paper.

NeurIPS

I don’t really know of any other way to describe the experience but surreal. The poster halls were huge and, almost out of nowhere, there were so many people with the same interests as me. All those ideas I saw on Twitter and read about on various blogs materialized in front of me.

I remember bumping into Jeremy entirely by chance2, and we stayed in touch after the conference. Little did I know, those minute engineering problems I encountered over the summer would resurface in conversations with him and the people who would become my mentors and collaborators at Answer.AI.

As of late

Last summer, I collaborated with Austin Huang on creating WebGPU Puzzles. And fun fact, that was my second encounter with GPU programming, so I was a little intimidated going into it. I had a general understanding of what CUDA was and had stumbled upon Sasha Rush’s GPU Puzzles at some point, too. But soon enough I realized that the ideas those experiences taught me would be pretty useful.

One thing I appreciated about Sasha’s puzzles was that my main focus was on solving the puzzles themselves. For one, they were hosted in a Google Colab notebook, which has a beginner-friendly interface. And when it came to syntax, the CUDA puzzles used Numba, which doesn’t require much knowledge beyond Python and NumPy. The accessibility and user-friendliness of these puzzles stripped away unnecessary complexity and distilled parallel computing into a set of clear principles. That way, instead of worrying about all things C++, I could focus on something more akin to a coding challenge.
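To give a flavor of what that thread-per-element model looks like, here’s a sketch in plain Python (not code from the puzzles themselves; the names `kernel` and `launch` are illustrative). In a real Numba CUDA or WGSL kernel, each thread runs the kernel body once for its own index; here a loop stands in for the GPU scheduler:

```python
# Sketch of the thread-per-element model behind GPU "map"-style puzzles.
# On a real GPU, many threads run the kernel body concurrently; here each
# call to `kernel` plays the role of one thread, and the launcher loop
# plays the role of the GPU scheduler. (Illustrative names, not from the
# actual puzzles.)

def kernel(tid, a, out):
    """One 'thread': handle exactly one element."""
    if tid < len(a):  # bounds guard, needed when more threads than elements
        out[tid] = a[tid] + 10

def launch(num_threads, a):
    out = [0] * len(a)
    for tid in range(num_threads):  # a GPU would run these in parallel
        kernel(tid, a, out)
    return out

print(launch(8, [0, 1, 2, 3]))  # → [10, 11, 12, 13]; extra threads are masked off
```

The key habit the puzzles build is exactly the one shown here: the kernel body never loops over the data, it only decides what its single thread should do.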

I wanted to replicate this for those who wanted to try out WebGPU/gpu.cpp, or even those just “breaking into” GPU programming. From there, I set out to develop a WebGPU version of Sasha’s CUDA puzzles with a detailed set of solutions for ultimate beginner-friendliness. Since then, I’ve returned to my research roots: I’m currently working on a reward model project3.

Beyond research, I’m a first-year at MIT studying math and computer science. My favorite class thus far is probably discrete math (it’s very well taught!), but I regret not signing up for more math classes.4 Outside of school, I love watching the sun rise while rowing on the Charles River, reading AI Twitter, and FaceTiming my dog.

Footnotes

  1. A process reward model (PRM) provides feedback at each step of a reasoning process, unlike outcome reward models (ORMs) which evaluate the entire response, offering more granular and structured guidance for improving complex tasks.↩︎

  2. Ultimate full circle moment for me!↩︎

  3. preprint soon!↩︎

  4. Have to knock out those General Institute Requirements↩︎