Monday, April 23, 2012

Mass Instancing + OIT

A couple of weeks ago I started a project to render a scene of millions of objects quickly, using OpenGL for rendering and OpenCL for frustum culling. This screenshot shows a fraction of the world: only 169,112 of the 20,000,000 instances are drawn.




Tonight I decided to expand this project to include Order Independent Transparency. The program is the same as before, but every object now has an alpha of 0.5.


As I expected, the frame rate went WAY down. Before transparency I had roughly 118 fps; now, with transparency, it's at 33 fps. I may never reach 60 fps, but I believe it is possible and I will try.

(Edit - added a better picture)

Sunday, April 22, 2012

Long time, no blog

I've been a combination of very busy and very lazy lately, but now I'm back in work mode with some new results:


Finally, something a little more interesting than the cube. This alien model has 14,192 triangles, and the frame rate is still well above 60 fps (it hovers around 650, though I shouldn't fool myself yet; there are lots of improvements I can still make).

I have fixed some of the artifacts from my previous posts. The depth issue, where the transparent objects always appeared on top of the rest of the scene, is gone now that I render the opaque scene before the transparent objects. Along with this, I declare layout(early_fragment_tests) in; in the transparent pass's fragment shader. This very handy GLSL layout qualifier forces the depth test to run before the fragment shader executes. As a result, transparent fragments that fail the depth test are discarded before they ever reach the shader, and transparent fragments behind opaque ones are never processed or stored. Finally, I disabled depth writes for the transparent pass, because otherwise transparent fragments would fail the depth test against other transparent fragments.

There are still a couple of things I would like to do in the next two days. First, I still have not coded the linearized approach to OIT (I'm using the linked-list version). Second, I want to add transparency to a mass-instancing program I wrote a couple of weeks ago. This is where OIT would really shine.

Monday, April 9, 2012

Fixed Blending

After a bit of a hiatus, I'm back to working on transparency.

The first thing I did was fix the blending issues I was having. You can see the difference below:


No blending:



Blending:

Before resolving the transparency with the full-screen quad, I enable blending. My blend equation and blend function are:


glBlendEquation(GL_FUNC_ADD);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

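This gives final color = srcAlpha * source + (1 - srcAlpha) * destination, where source is the color output by the resolve shader, srcAlpha is its alpha, and destination is the color already in the framebuffer.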

Monday, March 26, 2012

Details of my first result post

So I think it's time to explain the process behind the image below:

First Pass: "render" the transparent triangles. The word render is in quotation marks because nothing is actually drawn to the screen in this step; each fragment is only stored in a global array to be processed later. I use the discard keyword to prevent any writes to the framebuffer.

Initially I made the mistake of backing the global data with a 1D texture. Bad idea. It turns out my graphics card only supports textures up to 16,384 texels wide, which is not enough when potentially millions of fragments need to be stored, and yes, even with the simple purple cube. For example, when the front face of the cube covers the entire 800x800 screen, you need to store 640,000 fragments. Then add the back face: now it's 1,280,000 fragments. So I switched to a texture buffer instead. Texture buffers are a nice hybrid between buffer objects and textures: I can still use imageLoad and imageStore, but I have 134,217,728 texels of space.

As an implementation detail, my global data array is made of uvec4s, packed as follows:


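uvec4 globalData;
globalData.x = prevHeadIndex;                 // index of the previous fragment in this pixel's list
globalData.y = floatBitsToUint(depth);        // depth, bit-cast to uint
globalData.z = packUnorm2x16(finalColor.xy);  // red and green, 16 bits each
globalData.w = packUnorm2x16(finalColor.zw);  // blue and alpha, 16 bits each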

If anything, prevHeadIndex is the field that most deserves its full 32 bits, because the global data array is so large (remember, millions of entries; an unsigned short tops out at 65,535, so it would not cut it).



Second Pass: resolve the transparency data. At this point there is a filled global array and a 2D screen-sized array of linked-list heads. The goal is for each pixel on the screen to fetch its linked-list head and then follow the prevHeadIndex links back through the global data, collecting every fragment that was rasterized to this pixel, i.e. all of the pixel's transparent fragments. Once you have that list, you sort it (I used a bubble sort, but I will try others) from farthest depth to closest depth. Then you loop over the sorted array and blend each fragment over the color accumulated from the fragments before it, producing the final color.

So if the screen is 800x800, there are 640,000 pixels that need to resolve their transparency (well, in the naive approach at least). This is done by rendering two triangles that cover the entire screen. Because they cover the entire screen, exactly one fragment is rasterized per pixel. This is just a means to an end: running the transparency resolve shader once on every pixel.

Yes, the approach is a bit hacky, but it's relatively simple when these triangles are defined directly in clip-space coordinates, which makes the vertex shader a pure passthrough:


#version 420 core
layout(location = 0) in vec4 position; // already in clip space
void main()
{
    gl_Position = position; // no transform needed
}

Known bugs: the transparent shapes do not blend with the background, and they always appear on top of the rest of the scene. I figure these are pretty simple fixes.


First Results!



It's 7:39 in the morning... and no, I did not wake up early. But I finally have results for order-independent transparency, even if it's just a simple purple cube.

I will sum up what I did once I am less tired.