Dunshire Doom Perf: Part 1
When I announced Dunshire Doom, I made it clear there were gaps in performance:
The renderer is terribly inefficient.
How inefficient? It could handle most DOOM maps at 60fps on my computer but not much more. DOOM 2’s MAP15 would get around 70fps, but on some of the larger Final DOOM maps, like Plutonia’s MAP28 or MAP29, the framerate would drop to 30 or 40fps. Community wads with larger maps were even worse. Sunlust MAP01 was around 30fps, MAP02 was 15-20fps, MAP29 was around 10fps, and MAP30 crawled along at 1fps (or less). The infamous nuts.wad was around 2-3fps. My browser tab would give up and crash on even larger maps like Profane Promiseland or Cosmogenesis. These numbers were captured with the enemy AI turned off - it’s just poor rendering performance.
I made several attempts to fix it:
- Adding and removing geometry on the fly was too slow. I tested it by randomly selecting walls/floors to hide and show and my framerates would stutter along as the browser created and destroyed those objects.
- Instead of adding and removing, I would add everything then toggle visibility. This was better, but it wasn’t enough and initializing the map was still slow and used lots of memory.
- I tried instance geometry but I wasn’t sure how to apply textures to instanced geometry. Even if I could get segments of a texture, how would things like scrolling textures work?
I figured I must be doing something wrong, but I wasn’t sure what, so I took a break and let the project rest. I moved on to other things. Every now and then I would do some googling on the topic, but I was no closer to solving the puzzle.
Glimmer of Hope
I’m not exactly sure how I got there, but somehow after 6 months a thought occurred to me: my browser can render 2M triangles at 120fps in WebGL demos, so why can’t I handle 30K-40K from DOOM maps? I had read that reducing draw calls would help but to get there, I would need a texture atlas. Perhaps I knew this solution months ago but felt it would be too much work. Whatever the reason, I wanted to give it a try so I built a little prototype that:
- put all DOOM graphics (walls and floors) into a single texture
- wrote a shader to read sections of that texture and apply it to walls
- added thousands and eventually hundreds of thousands of walls to see how it would perform
- added a thread to periodically move those walls around to see how performance held up during updates
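The atlas-packing step in the prototype could be sketched like this (a minimal sketch with hypothetical names, not the actual Dunshire Doom code): a simple row ("shelf") packer that assigns each graphic a rectangle in one big texture and records its normalized UV bounds.

```javascript
// Minimal row ("shelf") packer: place each texture left-to-right,
// starting a new row when the current one fills up. Hypothetical
// sketch; the real code packs actual DOOM wall/flat graphics.
function packAtlas(textures, atlasSize) {
  let x = 0, y = 0, rowHeight = 0;
  const bounds = new Map();
  for (const t of textures) {
    if (x + t.width > atlasSize) {      // start a new row
      x = 0;
      y += rowHeight;
      rowHeight = 0;
    }
    if (y + t.height > atlasSize) throw new Error('atlas full');
    // Store normalized UV bounds [u0, v0, u1, v1] for the shader.
    bounds.set(t.name, [
      x / atlasSize, y / atlasSize,
      (x + t.width) / atlasSize, (y + t.height) / atlasSize,
    ]);
    x += t.width;
    rowHeight = Math.max(rowHeight, t.height);
  }
  return bounds;
}
```

With a table like this, each wall only needs an index into the atlas instead of its own material, which is what makes a single draw call possible.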
How did it perform? Amazingly well, even with 400,000 walls.
With the success of the prototype, I was motivated to get this working. It took a few days of work to translate the existing Threlte Wall/Flat components into something that created a single geometry and updated vertex and texture attributes when the walls moved (like a platform raising or door opening). It also took a few days to figure out GLSL shaders and how ThreeJS shaders are built so I could load sections of a texture and tile or scroll them as needed. Here is the code that fetches the section of a texture atlas:
// vertex shader:
// locate this texture's entry in the atlas lookup texture
float invAtlasWidth = 1.0 / float(tAtlasWidth);
vec2 atlasUV = vec2(
    mod( float(texN), float(tAtlasWidth) ),
    floor( float(texN) * invAtlasWidth ) );
atlasUV = (atlasUV + 0.5) * invAtlasWidth;
vUV = texture2D( tAtlas, atlasUV );             // [u0, v0, u1, v1] bounds
vDim = vec2( vUV.z - vUV.x, vUV.w - vUV.y );    // texture size in atlas space
...
// fragment shader:
// tile within this texture's rectangle, never into a neighbour
vec2 mapUV = mod( vMapUv * vDim, vDim ) + vUV.xy;
vec4 sampledDiffuseColor = texture2D( map, mapUV );
How does the above code work? It relies on two things: a single texture (`map`) that contains all the textures for the DOOM map, and a second texture (`tAtlas`) holding the coordinates of each texture within it. To fetch a particular texture (`texN`), we sample `tAtlas`, which gives us `vUV` (the texture’s bounds) and `vDim` (its dimensions) within `map`. Lastly, we take the mod by `vDim` so we don’t scroll into the next texture and voilà! We can extract individual textures from the atlas and apply them to walls and floors and ceilings.
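The same lookup can be expressed on the CPU, which may make the shader easier to follow. This is a sketch with hypothetical names that mirrors the GLSL above: `atlas` stands in for the `tAtlas` lookup texture.

```javascript
// CPU reference of the shader's atlas lookup and tiling.
// atlas: array of [u0, v0, u1, v1] bounds, indexed by texN.
function atlasLookup(atlas, texN, mapUv) {
  const [u0, v0, u1, v1] = atlas[texN];     // vUV in the shader
  const dim = [u1 - u0, v1 - v0];           // vDim: size in atlas space
  // GLSL-style mod (result always positive) keeps repeated (tiled)
  // coordinates inside this texture's rectangle, so we never sample
  // a neighbouring texture in the atlas.
  const wrap = (x, m) => ((x % m) + m) % m;
  return [
    wrap(mapUv[0] * dim[0], dim[0]) + u0,
    wrap(mapUv[1] * dim[1], dim[1]) + v0,
  ];
}
```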
In the end, Dunshire Doom now renders the whole map as a single mesh with a single texture and therefore 1 draw call. It’s a little mind-boggling for me that I can go from rendering 30K triangles at 30fps to 5.6M triangles at 120fps just by changing my approach.
Caveat for benchmarks: this is an incomplete change. While the DOOM map is rendered as a single mesh, each monster is still a separate draw call and, for maps with lots of monsters, that makes a big difference. For these comparisons between the old renderer (R1) and the new (R2), I’ve turned off monsters. I’ll revisit this later as future work.
Okay, here are the numbers so far:
| Map | R1 Average FPS | R1 Draw Calls | R2 Average FPS | R2 Draw Calls |
|---|---|---|---|---|
| Sunlust MAP01 | 35fps | 5794 | 110fps | 6 |
| Sunlust MAP02 | 20fps | 9008 | 104fps | 6 |
| Sunlust MAP29 | 7fps | 7865 | 100fps | 6 |
| Sunlust MAP30 | 3fps | 38,542 | 104fps | 6 |
| nuts MAP01 | 120fps | 181 | 120fps | 6 |
| Cosmogenesis MAP05 | 5fps | 13,200 | 113fps | 6 |
| Profane Promiseland MAP01 | Crash! | Crash! | 120fps | 6 |
You can try it yourself by loading your favourite wad into Dunshire Doom and toggling the renderer from R1 to R2. NOTE for v0.8: don’t forget to check “No items/monsters” before loading a large map!
Performance Details
Obviously the reduction in draw calls is having a huge impact here but that’s not the whole story. With this change, the renderer has moved away from each wall being a svelte component that subscribes to floor/ceiling height, texture, and light changes. This makes a huge difference. Instead of tens of walls each subscribing to changes in the room floor height, for the rare event that the floor moves, we now have only 1 subscription that updates those 10 walls. Across the whole map, that could be thousands or tens of thousands fewer subscriptions. In fact, I think we can get even more performance improvement with a kind of map-changed event because most of a DOOM map is static. We don’t need subscribers listening for events that very rarely happen. I’ll tackle this in the future.
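A sketch of that difference (hypothetical names; not the actual component code): instead of every wall holding its own store subscription, the sector holds one subscription that fans out to all of its walls.

```javascript
// Minimal svelte-like writable store, just enough to count subscribers.
function makeStore(value) {
  const subs = new Set();
  return {
    subscribe(fn) { subs.add(fn); fn(value); return () => subs.delete(fn); },
    set(v) { value = v; subs.forEach(fn => fn(v)); },
    get subscriberCount() { return subs.size; },
  };
}

// R2 approach: one subscription per sector, updating every wall's
// geometry, instead of one subscription per wall. Hypothetical sketch.
function subscribeSector(floorHeight, walls) {
  return floorHeight.subscribe(h => {
    for (const wall of walls) wall.bottom = h; // update vertex data
  });
}
```

With 10 walls there is still exactly one subscription, so a moving floor triggers one callback instead of ten.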
Another improvement comes from lighting. In R1, each floor, ceiling, wall, and monster in a room subscribes to the light level of the room. There is also a global extra light that overrides the light level and is used when the player fires their weapon or picks up the light visor powerup. While R1 was easy to write, it didn’t perform well: when the player fired their weapon, every floor, ceiling, wall, and monster in the map had to update its own light level (ouch)! In R2, we pass the extra light to the shader, which means there is only 1 subscription (instead of thousands). Further, because floors, ceilings, and walls all share the light value from the room, we create a texture where each pixel represents the light level of one room and, when we render a wall, floor, or ceiling, we read the value from that texture.
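The sector-light texture can be sketched like this (hypothetical names; the real code would write into something like a ThreeJS DataTexture): one byte per sector, updated only when that sector’s light changes, while the muzzle-flash extra light is a single shader uniform.

```javascript
// One pixel (byte) per sector holds that room's light level. Walls,
// floors, and ceilings look their light up by sector index in the
// shader, so changing a room's light is a single byte write and a
// muzzle flash is a single uniform update, not thousands of
// per-component store notifications. Hypothetical sketch.
function createLightMap(sectorCount) {
  const data = new Uint8Array(sectorCount);   // stands in for a DataTexture
  return {
    data,
    setLight(sector, level) { data[sector] = level; },
    // What the fragment shader effectively computes per pixel:
    shade(sector, extraLight) { return Math.min(255, data[sector] + extraLight); },
  };
}
```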
I was able to get other wins by moving more computation into the shader. For example, the computation for fake contrast in R1 was in the wall component:
$: fakeContrastValue =
    $fakeContrast === 'classic' ? (
        linedef.v[1].x === linedef.v[0].x ? 16 :
        linedef.v[1].y === linedef.v[0].y ? -16 :
        0
    ) :
    $fakeContrast === 'gradual' ? Math.cos(angle * 2 + Math.PI) * 16 :
    0;
And now the code has moved to the shader (branch-free! although branching may be better in this case):
const float fakeContrastStep = 16.0 / 256.0;
float fakeContrast(vec3 normal) {
    vec3 absNormal = abs(normal);
    float dfc = float(doomFakeContrast);
    // select the mode without branching: 1.0 when active, 0.0 otherwise
    float gradual = step(2.0, dfc);
    float classic = step(1.0, dfc) * (1.0 - gradual);
    return (
        (classic * (
            step(1.0, absNormal.y) * -fakeContrastStep +
            step(1.0, absNormal.x) * fakeContrastStep
        )) +
        (gradual * (
            (smoothstep(0.0, 1.0, absNormal.y) * -fakeContrastStep) +
            (smoothstep(0.0, 1.0, absNormal.x) * fakeContrastStep)
        ))
    );
}
I’m not sure how to measure shader performance, so I can’t assess the cost or benefit of this change. It feels like the right direction though, because fake contrast is purely a rendering concern and GPUs are good at that. We want to free up the CPU and JS thread for other work when possible.
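One way to sanity-check the branch-free version, short of GPU profiling, is to port it to JS and compare it against the straightforward branchy form. This is a sketch assuming `doomFakeContrast` is 1 for classic and 2 for gradual, matching the shader; `step` and `smoothstep` are reimplemented with their GLSL semantics.

```javascript
const FAKE_CONTRAST_STEP = 16 / 256;
const step = (edge, x) => (x >= edge ? 1 : 0);  // GLSL step()
const smoothstep = (e0, e1, x) => {              // GLSL smoothstep()
  const t = Math.min(Math.max((x - e0) / (e1 - e0), 0), 1);
  return t * t * (3 - 2 * t);
};

// Branch-free port of the shader function (mode: 1 = classic, 2 = gradual).
function fakeContrast(normal, mode) {
  const ax = Math.abs(normal[0]), ay = Math.abs(normal[1]);
  const gradual = step(2, mode);
  const classic = step(1, mode) * (1 - gradual);
  return (
    classic * (step(1, ay) * -FAKE_CONTRAST_STEP + step(1, ax) * FAKE_CONTRAST_STEP) +
    gradual * (smoothstep(0, 1, ay) * -FAKE_CONTRAST_STEP + smoothstep(0, 1, ax) * FAKE_CONTRAST_STEP)
  );
}

// The branchy classic version it should agree with.
function fakeContrastClassic(normal) {
  const ax = Math.abs(normal[0]), ay = Math.abs(normal[1]);
  return ay === 1 ? -FAKE_CONTRAST_STEP : ax === 1 ? FAKE_CONTRAST_STEP : 0;
}
```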
I’ll miss svelte components and stores though. This project was a chance to play with svelte and Threlte. Perhaps I could have structured the data to make better use of stores and reduce subscribers, but I still wonder if this project shows those solutions weren’t the right fit. Threlte’s mission is to “Rapidly build interactive 3D apps for the web”, and that is true: I could and did build this app quickly, but the performance wouldn’t scale. To really perform well, I needed to move away from property change events and thousands of components, because each component brings overhead and, for large maps, the cost was too high. ThreeJS has always felt a little daunting to me, but Threlte was approachable and fun and helped me understand how ThreeJS works. Threlte is fantastic and I would recommend it to anyone who isn’t already familiar with ThreeJS. Perhaps ThreeJS is a stepping stone for me to dust off my OpenGL knowledge. I’ll have more detailed thoughts on this after completing the future work.
Map Load Time
As I started playing with larger maps (like Sunder), I quickly grew impatient because they could take 25-35s to load. With browser profiling tools and a few console.time() calls, it wasn’t hard to spot the places to fix.
- When visiting sectors to render, we would filter linedefs on each loop iteration. Sorting linedefs by sector before the loop saved 5-8s during map load.
- Texture animations were stored in a list and we were searching that list pretty frequently. Switching to a map saved almost 2s of load time.
- Stopped using svelte components for walls/floors/ceilings of maps. Not only does this reduce draw calls as discussed above, it also seems about 10x faster (2500ms to 250ms), and the browser appears to use 50% less memory.
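The first fix above is a classic filter-in-a-loop to group-by-key change. A minimal sketch (hypothetical object shapes; real linedefs carry much more data):

```javascript
// Before: O(sectors × linedefs) — re-filter the whole list per sector.
function wallsPerSectorSlow(sectors, linedefs) {
  return sectors.map(s => linedefs.filter(ld => ld.sector === s.id));
}

// After: O(linedefs) — group once up front, then each sector is a
// constant-time map lookup inside the render loop.
function groupLinedefsBySector(linedefs) {
  const bySector = new Map();
  for (const ld of linedefs) {
    const list = bySector.get(ld.sector);
    if (list) list.push(ld); else bySector.set(ld.sector, [ld]);
  }
  return bySector;
}
```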
It still takes way too long to load a map. Cosmogenesis MAP05, for example, still takes about 15s to load but 11s of that is spent figuring out implicit vertices for subsectors. I don’t have a good idea on how to optimize that code. I’ve been experimenting with community DOOM maps and I’ve learned that many seem to have incomplete BSP nodes. I’ve also learned that many DOOM ports simply regenerate the BSP on load using zdbsp so perhaps I’ll end up with something similar.
Bonus: Lights!
With the extra performance, I now had a chance to play with some more advanced features like lights.
I think there is a neat opportunity for a set of maps designed around an orthographic camera which are dark and moody and take advantage of lighting and shadowing. Hard shadows look pretty cool in classic DOOM!
Future Work
I’m excited that map geometry renders much faster but it’s not enough. Now that I’ve got a little taste of optimization, I’d like to try and get large maps playable (at least on powerful computers). To get there, I’ll need to:
- Use instanced geometry for monsters. Now that I’ve built a texture atlas, sprites should be similar, although there are additional complexities: geometry size based on texture size, rotations, interpolation of movement, animation, “fullbright” states. It’ll be more shader work than the map geometry was, but I think it’s doable.
- Remove svelte stores. The core game logic is full of svelte/store. I love svelte, and stores are pretty cool and may get even better with Svelte 5, but it’s probably not efficient enough. Why subscribe for texture or lighting changes when 90% of walls and rooms won’t ever change? We can be much more efficient with an onMapChanged event.
- Now that I’ve got a little experience with shaders, I think I could move scrolling texture logic into the shader. The benefit is that the JS thread isn’t occupied updating some variables and copying data to the GPU. Instead, all we do is copy the game time and let the shader handle the scrolling.
- The current texture atlas is pretty inefficient. It creates a giant texture that is mostly empty and doesn’t fit content very well. To reduce memory usage, especially for mobile, we should do better.
- Map sections. It doesn’t seem expensive to have one geometry for the whole map but perhaps it would be more efficient to cut the map into sections and only render the visible sections.
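For the scrolling-texture item above, the shader-side version could be as simple as deriving each wall’s UV offset from a single time uniform. A CPU sketch of what the shader would compute (hypothetical names; scroll speeds are in texture-repeats per second):

```javascript
// Per frame, JS uploads only one uniform (the game time); the shader
// derives every scrolling wall's offset itself. CPU reference sketch.
function scrollOffset(timeSeconds, speedU, speedV, dim) {
  const wrap = (x, m) => ((x % m) + m) % m;   // GLSL-style mod
  // Wrap by the texture's atlas dimensions (dim, i.e. vDim) so the
  // scroll never samples outside this texture's atlas rectangle.
  return [
    wrap(timeSeconds * speedU * dim[0], dim[0]),
    wrap(timeSeconds * speedV * dim[1], dim[1]),
  ];
}
```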
Of course, I’d also like to run zdbsp (or equivalent) instead of the subsector vertex stuff I’m doing now but that’s maybe a future future work.