First Pass: "render" the transparent triangles. The word render is in quotation marks because nothing is actually being drawn to the screen in this step, only stored in a global array to be processed later. I use the keyword discard to discard any writes to the framebuffer.
Initially I made the mistake of backing the global data with a 1D texture. Bad Idea. I found out that my graphics card only supports texture sizes of 16384 which is not enough when potentially millions of fragments need to be stored - and yes even with the simple purple cube. For example, when the front face of the cube covers the entire screen 800x800 screen, you need to store 640,000 fragment. Then add the back face. Now its 1,280,000 fragments. So I switched to a texture buffer instead. These are a nice hybrid between buffer objects and textures where I can still use imageLoad and imageStore, but I have 134,217,728 texels of space.
As an implementation detail, my global data array is composed of uvec4's and packs data accordingly:
globalData.x = prevHeadIndex;
globalData.y = floatBitsToUint(depth);
globalData.z = packUnorm2x16(finalColor.xy);
globalData.w = packUnorm2x16(finalColor.zw);
If anything, prevHeadIndex is the most deserving of its 32 bits because of how large the global data array is (remember, millions.. a short would not cut it)
Second Pass: resolve the transparency data. At this point there is a filled global array and a 2D screen-sized array of linked list heads. The goal here is for each pixel on the screen to get its linked list head, and then backtrack through the global data to form a list of all the fragments that had been rasterized to this pixel, i.e. all the transparent fragments of this pixel. Then once you have a list of fragments that occupy this pixel cell, you sort (I used a bubble sort, but I will try others) from farthest depth to closest depth. Then loop over the sorted array and blend each fragment with the ones before it to produce a final color.
So if the screen is 800x800, there are 640,000 pixels that need to resolve their transparency (well, in the naive approach at least). This is done by rendering two triangles that cover the entire screen. Because they cover the entire screen, one fragment is rasterized per pixel - exactly enough. This is just a means to an end of running the transparency resolver shader program on every pixel.
Yes, the approach is a bit hacky, but it's relatively simple when these triangles are defined straight in clip space coordinates, making the Vertex Shader purely passthrough:
#version 420 core
layout(location = 0) in vec4 position;
gl_Position = position;
Known bugs: the transparent shapes are not blending with the background and always appear above the background. I figure these are pretty simple fixes.