### Optimizing GPU memory consumption

Hi,

A few weeks ago I decided to compute how much GPU memory the terrain was using, and it turned out to be a pretty huge amount. I've spent the past 4 days dealing with this issue; here is what I've come up with.

First things first, let's compute how much memory the terrain was using. I'm using geometry shaders to generate, well, the terrain's geometry: vertices and vertex indices. Each vertex holds a position and a normal, i.e. two 3-component float vectors: 24 bytes per vertex. Indices are plain 4-byte integers. With this tetrahedral decomposition there are 19 edges, so up to 19 vertices can be generated per cell. Thanks to vertex indices we can share vertices with adjacent cells, so we only need to generate a maximum of 7 vertices per cell. Each tetrahedron can generate up to 2 triangles, that is 6 indices; with 6 tetrahedra per cell, up to 36 indices can be generated per cell. Each cell can therefore occupy a maximum of 24 * 7 + 4 * 36 = 312 bytes. Let's consider a single level of detail of 105^3 cells (pretty decent quality). That's 1,157,625 (roughly 1M) cells, for a total of 312 * 1.16M ≈ 344MB for each LOD! Honestly I'm still wondering how I've been able to host up to 4 levels of detail at this resolution on my 512MB card; the driver must either be pretty smart or just allocate some space in CPU memory (which would explain the huge performance drop when using more than 2 LODs).
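The worst-case arithmetic above is easy to double-check with a few lines of code (a minimal sketch; the constant names are mine, not from the engine):

```python
# Back-of-the-envelope check of the worst-case figures above
# (marching tetrahedra, one LOD of 105^3 cells).

VERTEX_SIZE = 2 * 3 * 4        # position + normal, 3 floats each: 24 bytes
INDEX_SIZE = 4                 # plain 32-bit integer
MAX_VERTS_PER_CELL = 7         # unique vertices a cell owns (the rest are shared)
MAX_INDICES_PER_CELL = 6 * 6   # 6 tetrahedra * 2 triangles * 3 indices

cells = 105 ** 3               # 1,157,625 cells
bytes_per_cell = (VERTEX_SIZE * MAX_VERTS_PER_CELL
                  + INDEX_SIZE * MAX_INDICES_PER_CELL)

total = cells * bytes_per_cell
print(bytes_per_cell)          # 312
print(round(total / 2**20))    # 344 (MB per LOD)
```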

Let's have a closer look at what consumes memory. These 344MB are split into 185MB for vertices and 159MB for indices. Each level is actually split into regions that are usually 16 cells wide. If we store vertex positions relative to the region origin, we need 4 bits for the cell offset within the region, plus some bits for the position within the cell. Another 4 bits give us 16 positions along a cell edge, which rounds each component up nicely to a single byte. We therefore reduce the position vector from 12 bytes down to 3. Since normals are computed from gradients whose components are differences between two single-byte values, the same compression scheme applies to them. A single vertex now only weighs 8 bytes (2 * (3 + 1), for alignment purposes) instead of 24. Ultimately the normals could be computed on the fly at render time, taking the vertex size down to 4 bytes (the 8 extra bits could then be used to store other data). Vertices being six times smaller, they only use 31MB instead of 185MB. Indices are harder to compress: we can't afford any loss of "precision" as we did with the vertices. Since a region holds at most 7 * 16^3 = 28,672 vertices, each index would fit in a 2-byte integer. However, due to hardware limitations, this doesn't seem feasible: the OpenGL transform feedback feature doesn't (yet?) allow outputting data smaller than 4 bytes. Since each vertex is only 4 bytes we could drop indices entirely and save 31MB, but I didn't actually try that. Instead I thought of using marching cubes, which generates far fewer triangles, and thus fewer vertices and indices.
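The per-component packing can be sketched like this (a hypothetical layout of my own choosing, assuming the high 4 bits hold the cell offset within the region and the low 4 bits hold the position within the cell):

```python
def pack_component(cell: int, offset: int) -> int:
    """Pack one coordinate component into a single byte:
    cell offset in the 16-cell region (4 bits) + position in the cell (4 bits)."""
    assert 0 <= cell < 16 and 0 <= offset < 16
    return (cell << 4) | offset

def unpack_component(byte: int) -> tuple[int, int]:
    """Recover (cell, offset) from the packed byte."""
    return byte >> 4, byte & 0xF

packed = pack_component(13, 5)
print(packed)                    # 213
print(unpack_component(packed))  # (13, 5)
```

A full position is then three such bytes plus one padding byte, which is the 2 * (3 + 1) = 8-byte vertex mentioned above.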

With marching cubes each cell can generate up to 3 vertices and 15 indices, which brings the worst-case LOD size down to 80MB. Not bad, but that's still 640MB for 8 LODs (16km^3 of visible terrain). I haven't done anything else so far, but I've got some ideas. First, we must keep in mind that we only considered worst cases. In practice we will never generate that much geometry: first because it is actually impossible, as there can't be adjacent worst-case cells (at least for marching tetrahedra), and second because if the user is crazy enough to sculpt scattered geometry we can simply restrict their freedom. So far I haven't managed to reach even 20% of the maximum memory consumption with either marching tetrahedra or marching cubes, so a simple way to reduce memory usage would be to just cut down the vertex and index buffer sizes, and pray.
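The same check for the marching cubes figures, assuming the compressed 4-byte vertices from the previous section (again a sketch with made-up constant names; the exact rounding lands just under the 80MB quoted above):

```python
# Worst-case arithmetic for marching cubes with compressed vertices.

VERTEX_SIZE = 4                # compressed vertex, normals computed at render time
INDEX_SIZE = 4                 # still forced to 32 bits by transform feedback
MAX_VERTS_PER_CELL = 3         # unique vertices a cell owns
MAX_INDICES_PER_CELL = 15      # up to 5 triangles per cell

cells = 105 ** 3
bytes_per_cell = VERTEX_SIZE * MAX_VERTS_PER_CELL + INDEX_SIZE * MAX_INDICES_PER_CELL
lod_size = cells * bytes_per_cell
print(round(lod_size / 2**20))      # 79, i.e. ~80 MB per LOD
print(round(8 * lod_size / 2**20))  # 636, i.e. ~640 MB for 8 LODs
```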

Below are some screenshots showing the differences (almost unnoticeable) between marching cubes and marching tetrahedra, with and without compression.