
GPU Raytracer with Optimized Parallel BVHs

An Wu (anwu@andrew.cmu.edu)

Hingon Miu (hmiu@andrew.cmu.edu)


Proposal (Project Topic Changed Afterwards)

Checkpoint

Final Writeup

Source Code

> Project Summary

Goals

We aim to implement a state-of-the-art GPU ray tracer with optimized parallel bounding volume hierarchies (BVHs). Our first goal is to build a ray tracer that runs in parallel on the GPU. Our second goal is to construct BVHs in parallel on the GPU, following Tero Karras's "Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees". Our third goal is to optimize those BVHs, following Tero Karras and Timo Aila's "Fast Parallel Construction of High-Quality Bounding Volume Hierarchies".
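Parallel BVH builders in the style of Karras's first paper typically begin by sorting primitives along a space-filling curve, using a Morton code computed from each primitive's centroid. The following is a minimal C++ sketch of the widely used 30-bit Morton code, not our actual kernel. The names `expandBits` and `morton3D` are illustrative, and the sketch assumes the centroid has already been normalized to the unit cube.

```cpp
#include <algorithm>
#include <cstdint>

// Spread the lower 10 bits of v so that each original bit is followed
// by two zero bits (bit i moves to bit 3*i).
inline std::uint32_t expandBits(std::uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point assumed to lie in the unit cube [0,1]^3.
// Each coordinate is quantized to 10 bits, then the three bit strings
// are interleaved as x, y, z from most to least significant.
inline std::uint32_t morton3D(float x, float y, float z) {
    x = std::min(std::max(x * 1024.0f, 0.0f), 1023.0f);
    y = std::min(std::max(y * 1024.0f, 0.0f), 1023.0f);
    z = std::min(std::max(z * 1024.0f, 0.0f), 1023.0f);
    const std::uint32_t xx = expandBits(static_cast<std::uint32_t>(x));
    const std::uint32_t yy = expandBits(static_cast<std::uint32_t>(y));
    const std::uint32_t zz = expandBits(static_cast<std::uint32_t>(z));
    return xx * 4u + yy * 2u + zz;
}
```

Sorting sphere centers by this code places spatially nearby spheres next to each other in the array, which is the property a bottom-up parallel builder relies on when grouping primitives into nodes.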

Challenges



> Plan To Show



> Specifications

Starter Code

The focus of our project is implementing fast parallel BVHs on the GPU, so we did not implement a CPU ray tracer from scratch. Instead, we use smallpt as our ray tracer starter code. It is a simple CPU ray tracer that renders spheres and uses OpenMP for parallelism on the CPU.
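Because the starter code renders only spheres, the core geometric operation is a ray-sphere intersection test. The sketch below shows the usual quadratic-solve formulation for a unit-length ray direction. The types `Vec3`, `Ray`, and `Sphere` and the function `intersect` are hypothetical names chosen for illustration, and the code is not copied from smallpt or from our GPU port.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

inline Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct Ray { Vec3 origin; Vec3 dir; };        // dir is assumed to be unit length
struct Sphere { Vec3 center; double radius; };

// Returns the distance along the ray to the nearest hit in front of the
// origin, or 0 if the ray misses the sphere.
inline double intersect(const Sphere& s, const Ray& r) {
    const double eps = 1e-4;                  // rejects hits at the ray origin
    const Vec3 oc = sub(s.center, r.origin);
    const double b = dot(oc, r.dir);
    double det = b * b - dot(oc, oc) + s.radius * s.radius;
    if (det < 0.0) return 0.0;                // no real root: the ray misses
    det = std::sqrt(det);
    double t = b - det;                       // nearer root
    if (t > eps) return t;
    t = b + det;                              // farther root (origin inside sphere)
    return t > eps ? t : 0.0;
}
```

Without an acceleration structure, every ray runs this test against every sphere in the scene. A BVH reduces that cost by letting a ray skip whole groups of spheres whose bounding volume it misses.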

Hardware

CPU

We use the benchmarks published on the smallpt website as our CPU runtime reference. The website describes its hardware as follows: "... different numbers of samples per pixel (spp) on a 2.4 GHz Intel Core 2 Quad CPU using 4 threads ...".

GPU

We benchmark our GPU code on an AWS GPU instance equipped with an NVIDIA GRID K520. The GPU we use has 1536 cores (800 MHz each) and 4 GB of video memory. The GRID K520 board has two GPUs, but we use only one.



> Results

Graph 1: GPU Ray Tracing vs. CPU Ray Tracing (sample scene contains 9 spheres)


Graph 2: BVHs vs. no BVHs on GPU Ray Tracing (4 samples per pixel)


Graph 2.5: BVHs GPU Construction Time vs. # of Spheres


Graph 3: BVHs vs. Optimized BVHs on GPU Ray Tracing (4 samples per pixel)


Graph 3.5: BVHs GPU Optimization Time vs. # of Spheres