Dunshire Doom Perf: Part 1


When I announced Dunshire Doom, I made it clear there were gaps in performance:

The renderer is terribly inefficient.

How inefficient? It could handle most DOOM maps at 60fps on my computer, but not much more. DOOM 2’s MAP15 would get around 70fps, but on some of the larger Final DOOM maps, like Plutonia’s MAP28 or MAP29, the framerate would drop to 30 or 40fps. Community wads with larger maps were even worse. Sunlust MAP01 was around 30fps, MAP02 was 15-20fps, MAP29 was around 10fps, and MAP30 crawled along at 1fps (or less). The infamous nuts.wad was around 2-3fps. My browser tab would give up and crash on even larger maps like Profane Promiseland or Cosmogenesis. These numbers were captured with the enemy AI turned off - it’s just poor rendering performance.

I made several attempts to fix it:

I figured I must be doing something wrong, but I wasn’t sure what, so I took a break and let the project rest. I moved on to other things. Every now and then I would do some googling on the topic, but I was no closer to solving the puzzle.

Glimmer of Hope

I’m not exactly sure how I got there, but somehow, after 6 months, a thought occurred to me: my browser can render 2M triangles at 120fps in WebGL demos, so why can’t I handle the 30K-40K triangles from DOOM maps? I had read that reducing draw calls would help but to get there, I would need a texture atlas. Perhaps I knew this solution months ago but felt it would be too much work. Whatever the reason, I wanted to give it a try so I built a little prototype that:

How did it perform? Amazingly well. Even with 400,000 walls:

DOOM textures applied to 400,000 individual walls

With the success of the prototype, I was motivated to get this working. It took a few days of work to translate the existing Threlte Wall/Flat components into something that created a single geometry and updated vertex and texture attributes when the walls moved (like a platform rising or a door opening). It also took a few days to figure out GLSL shaders and how ThreeJS shaders are built so I could load a section of a texture and tile or scroll it as needed. Here is the shader code that fetches a section of the texture atlas:

// vertex shader: convert texture index texN into a pixel coordinate in tAtlas
float invAtlasWidth = 1.0 / float(tAtlasWidth);
vec2 atlasUV = vec2(
        mod(float(texN), float(tAtlasWidth)),
        floor(float(texN) * invAtlasWidth));
// offset by half a texel so we sample the centre of the lookup pixel
atlasUV = (atlasUV + .5) * invAtlasWidth;
// vUV holds the texture's bounds within the atlas: (x1, y1, x2, y2)
vUV = texture2D( tAtlas, atlasUV );

// vDim is the width and height of the texture within the atlas
vDim = vec2( vUV.z - vUV.x, vUV.w - vUV.y );
...
// fragment shader: wrap UVs inside the texture's bounds, then sample the atlas
vec2 mapUV = mod( vMapUv * vDim, vDim ) + vUV.xy;
vec4 sampledDiffuseColor = texture2D( map, mapUV );

How does the above code work? It relies on two things: a single texture (map) that contains every texture used by the DOOM map, and a second texture (tAtlas) holding the coordinates of each texture within it. To fetch a particular texture (texN), we sample tAtlas, which gives us vUV (the texture’s bounds in the atlas) and, from that, vDim (its size). Lastly, we wrap the UVs with mod by vDim so tiling doesn’t scroll into the neighbouring texture, and voilà! We can extract individual textures from the atlas and apply them to walls and floors and ceilings.
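For the curious, here is roughly how the tAtlas lookup texture could be built on the CPU side with ThreeJS. This is only a sketch, not the actual Dunshire Doom code: the AtlasEntry shape and buildAtlasLookup name are mine, and I’m assuming each texture’s bounds are already normalized to 0..1 atlas UVs.

import { DataTexture, RGBAFormat, FloatType, NearestFilter } from 'three';

// Assumed shape: one entry per DOOM texture with its bounds inside the big atlas,
// already normalized to 0..1 UV space.
interface AtlasEntry { u1: number; v1: number; u2: number; v2: number; }

function buildAtlasLookup(entries: AtlasEntry[], tAtlasWidth: number): DataTexture {
    const height = Math.ceil(entries.length / tAtlasWidth);
    const data = new Float32Array(tAtlasWidth * height * 4);
    entries.forEach((e, texN) => {
        // pixel texN stores (x1, y1, x2, y2) and is read as vUV in the vertex shader
        data.set([e.u1, e.v1, e.u2, e.v2], texN * 4);
    });
    const tAtlas = new DataTexture(data, tAtlasWidth, height, RGBAFormat, FloatType);
    tAtlas.minFilter = NearestFilter;
    tAtlas.magFilter = NearestFilter;
    tAtlas.needsUpdate = true;
    return tAtlas;
}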

In the end, Dunshire Doom now renders the whole map as a single mesh with a single texture and therefore 1 draw call. It’s a little mind-boggling to me that the same browser that struggled with 30K triangles at 30fps can render 5.6M triangles at 120fps just by changing my approach.
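To illustrate the single-draw-call idea (and only as an illustration: this is not necessarily how Dunshire Doom assembles its geometry, and the import path below depends on the ThreeJS version), merging every wall and flat into one geometry that shares one material is what gets the whole map down to a single draw call:

import { BufferGeometry, Mesh, ShaderMaterial } from 'three';
// older ThreeJS versions export this as mergeBufferGeometries from
// three/examples/jsm/utils/BufferGeometryUtils.js
import { mergeGeometries } from 'three/addons/utils/BufferGeometryUtils.js';

// Each per-wall/flat geometry is assumed to carry a per-vertex texN attribute
// so the shared shader material can look up its texture in the atlas.
function buildMapMesh(pieces: BufferGeometry[], material: ShaderMaterial): Mesh {
    const mapGeometry = mergeGeometries(pieces, false);
    return new Mesh(mapGeometry, material); // the whole map: one mesh, one draw call
}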

Caveat for benchmarks: this is an incomplete change. While the DOOM map is rendered as a single mesh, each monster is still a separate draw call, and for maps with lots of monsters, that makes a big difference. For these comparisons between the old renderer (R1) and the new one (R2), I’ve turned off monsters. I’ll revisit this later as future work.

Okay, here are the numbers so far:

| Map | R1 Average FPS | R1 Draw Calls | R2 Average FPS | R2 Draw Calls |
| --- | --- | --- | --- | --- |
| Sunlust MAP01 | 35fps | 5794 | 110fps | 6 |
| Sunlust MAP02 | 20fps | 9008 | 104fps | 6 |
| Sunlust MAP29 | 7fps | 7865 | 100fps | 6 |
| Sunlust MAP30 | 3fps | 38,542 | 104fps | 6 |
| nuts MAP01 | 120fps | 181 | 120fps | 6 |
| Cosmogenesis MAP05 | 5fps | 13,200 | 113fps | 6 |
| Profane Promiseland MAP01 | Crash! | Crash! | 120fps | 6 |

You can try it yourself by loading your favourite wad into Dunshire Doom and toggling the renderer from R1 to R2. NOTE for v0.8: don’t forget to check “No items/monsters” before loading a large map!

Performance Details

Obviously the reduction in draw calls is having a huge impact here, but that’s not the whole story. With this change, the renderer has moved away from each wall being a Svelte component that subscribes to floor/ceiling height, texture, and light changes. This makes a huge difference. Instead of tens of walls each subscribing to changes in a room’s floor height (for the rare event that the floor actually moves), we now have one subscription that updates those 10 walls. Across the whole map, that could be thousands or tens of thousands fewer subscriptions. In fact, I think we can get even more performance with a kind of map-changed event, because most of a DOOM map is static and we don’t need subscribers listening for events that very rarely happen. I’ll tackle this in the future.
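As a sketch of what that looks like (the Sector shape, attribute layout, and function name are illustrative, not the actual Dunshire Doom code): one store subscription per sector writes straight into the merged geometry’s position attribute instead of tens of wall components each holding their own subscription.

import type { BufferGeometry, BufferAttribute } from 'three';
import type { Writable } from 'svelte/store';

// Assumed shapes for illustration only.
interface Sector {
    floorHeight: Writable<number>;
    // vertex index ranges in the merged geometry for the bottom edges of this sector's walls
    floorVertexRanges: [start: number, end: number][];
}

function watchSectorFloor(sector: Sector, geometry: BufferGeometry) {
    const position = geometry.getAttribute('position') as BufferAttribute;
    // one subscription for the whole sector: when the floor moves (lifts, doors),
    // update every affected wall vertex in a single pass
    return sector.floorHeight.subscribe(height => {
        for (const [start, end] of sector.floorVertexRanges) {
            for (let i = start; i < end; i++) {
                position.setY(i, height); // assuming height maps to the Y axis
            }
        }
        position.needsUpdate = true;
    });
}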

Another improvement comes from lighting. In R1, each floor, ceiling, wall, and monster in a room subscribes to the light level of the room. There is also a global extra light that overrides the light level, used when the player fires their weapon or picks up the light amplification visor powerup. While R1 was easy to write, it didn’t perform well: every time the player fired their weapon, every floor, ceiling, wall, and monster in the map had to update its own light level (ouch!). In R2, we pass the extra light to the shader, which means there is only 1 subscription (instead of thousands). Further, because floors, ceilings, and walls all share the light value of their room, we create a texture where each pixel represents the light level of one room, and when we render a wall, floor, or ceiling, we read the value from that texture.

Lightmap texture from DOOM’s E1M1. Notice the blinking and pulsing pixels for rooms where the light blinks or pulses. Sorry it’s blurry; the texture is only 16x16 so it is scaled up.
DOOM E1M1 rendered without textures to show the light map being applied. Notice the blinking and pulsing lights.
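Roughly, the lightmap can be built as a tiny DataTexture with one pixel per room (sector). The sketch below uses assumed names and the 16x16 size mentioned above, which covers up to 256 rooms; it is not the actual Dunshire Doom code.

import { DataTexture, RedFormat, UnsignedByteType, NearestFilter } from 'three';

// one pixel per room; 16x16 = 256 rooms is enough for E1M1
const size = 16;
const lightData = new Uint8Array(size * size);
const sectorLightMap = new DataTexture(lightData, size, size, RedFormat, UnsignedByteType);
sectorLightMap.minFilter = NearestFilter;
sectorLightMap.magFilter = NearestFilter;
sectorLightMap.needsUpdate = true;

// blinking/pulsing rooms only touch their own pixel; the shader reads a room's
// light level from this texture instead of each wall subscribing to it
function setSectorLight(sectorNum: number, lightLevel: number) {
    lightData[sectorNum] = lightLevel; // DOOM light levels are already 0..255
    sectorLightMap.needsUpdate = true; // re-upload the tiny texture
}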

I was able to get other wins by moving more computation into the shader. For example, the computation for fake contrast in R1 was in the wall component:

$: fakeContrastValue =
    $fakeContrast === 'classic' ? (
        linedef.v[1].x === linedef.v[0].x ? 16 :
        linedef.v[1].y === linedef.v[0].y ? -16 :
        0
    ) :
    $fakeContrast === 'gradual' ? Math.cos(angle * 2 + Math.PI) * 16 :
    0;

And now the code has moved to the shader (branch-free! although branching may be better in this case):

const float fakeContrastStep = 16.0 / 256.0; // DOOM's ±16 light units, normalized
float fakeContrast(vec3 normal) {
    vec3 absNormal = abs(normal);
    // pick the mode without branching: dfc >= 2.0 means 'gradual',
    // 1.0 <= dfc < 2.0 means 'classic', anything lower disables the effect
    float dfc = float(doomFakeContrast);
    float gradual = step(2.0, dfc);
    float classic = step(1.0, dfc) * (1.0 - gradual);
    return (
        // classic: hard ±16 step based on which axis the wall faces
        (classic * (
            step(1.0, absNormal.y) * -fakeContrastStep +
            step(1.0, absNormal.x) * fakeContrastStep
        )) +
        // gradual: blend the same adjustment by how much the wall faces each axis
        (gradual * (
            (smoothstep(0.0, 1.0, absNormal.y) * -fakeContrastStep) +
            (smoothstep(0.0, 1.0, absNormal.x) * fakeContrastStep)
        ))
    );
}

I’m not sure how to measure the performance of shaders, so I can’t really assess the cost or benefit of this change. It feels like the right direction though: fake contrast is purely a rendering concern, and GPUs are good at that. We want to free up the CPU and JS thread for other work when possible.

I’ll miss Svelte components and stores though. This project was a chance to play with Svelte and Threlte. Perhaps I could have structured the data to make better use of stores and reduce subscribers, but I still wonder if this project shows those solutions weren’t the right fit. Threlte’s mission is to “Rapidly build interactive 3D apps for the web”, and that part is true: I could and did build this app quickly. The performance just wouldn’t scale. To really perform well, I needed to move away from property change events and thousands of components, because each component brings overhead and, for large maps, the cost was too high. ThreeJS has always felt a little daunting to me, but Threlte was approachable and fun and helped me understand how ThreeJS works. Threlte is fantastic and I would recommend it to anyone who isn’t already familiar with ThreeJS. Perhaps ThreeJS is a stepping stone for me to dust off my OpenGL knowledge. I’ll have more detailed thoughts on this at a later time, after completing the future work.

Map Load Time

As I started playing with larger maps (like Sunder) I quickly grew impatient because it could take 25-35s to load the map. With browser profiling tools and a few console.time() messages it was not hard to spot the places to fix.
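The timing itself was nothing fancy; something along these lines, where the phase functions are placeholders for the real map-loading steps, not the project’s actual function names:

// placeholder declarations standing in for the real map-loading phases
declare function buildMapGeometry(map: unknown): void;
declare function computeSubsectorVertices(map: unknown): void;

function loadMapWithTimings(map: unknown) {
    console.time('build geometry');
    buildMapGeometry(map);
    console.timeEnd('build geometry'); // prints "build geometry: 1234ms" to the console

    console.time('subsector vertices');
    computeSubsectorVertices(map);
    console.timeEnd('subsector vertices');
}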

It still takes way too long to load a map. Cosmogenesis MAP05, for example, still takes about 15s to load, but 11s of that is spent figuring out implicit vertices for subsectors. I don’t have a good idea of how to optimize that code. I’ve been experimenting with community DOOM maps and I’ve learned that many seem to have incomplete BSP nodes. I’ve also learned that many DOOM ports simply regenerate the BSP on load using zdbsp, so perhaps I’ll end up with something similar.

Bonus: Lights!

With the extra performance, I now had a chance to play with some more advanced features, like lights.

With a more powerful rendering approach, we can do more things like adding lights and shadows

I think there is a neat opportunity for a set of dark, moody maps designed around an orthographic camera that take advantage of lighting and shadowing. Hard shadows look pretty cool in classic DOOM!

As the player moves, the light follows and casts shadows on monsters and through transparent walls.

Future Work

I’m excited that map geometry renders much faster but it’s not enough. Now that I’ve got a little taste of optimization, I’d like to try and get large maps playable (at least on powerful computers). To get there, I’ll need to:

Of course, I’d also like to run zdbsp (or equivalent) instead of the subsector vertex stuff I’m doing now but that’s maybe a future future work.