While porting the ocean code I also noticed that the FFT transforms could be done in a compute shader, which should be a lot faster than the pixel shader approach that Bruneton used in his code and which I also used in the XNA version of the engine. By googling around, I stumbled upon the NVIDIA code provided in their FFT ocean demo from the NVIDIA SDK 11, which is a 2D radix-8 FFT algorithm. That means it can only transform 2D maps whose width and height are powers of 8, for example 64x64, 512x512, 4096x4096 and so on. The problem was that I was using a 256x256 wave spectrum, which could not be transformed with the NVIDIA code. So I had the option to either move to a 512x512 spectrum or use a radix-4 or radix-2 FFT transform. I searched the web for a compute shader implementation of a radix-2 or radix-4 transform but couldn't find anything.

In conclusion, if I wanted to stick to the 256x256 spectrum, I had to write my own FFT code, and I was in no mood to do that. I tried it once and it gave me many days of headaches, during which I managed to write a 1D radix-2 FFT, and even that was not easy. The complexity of implementing FFT algorithms grows considerably when you go from one dimension to two, so I decided to move to a 512x512 map and use the NVIDIA code. I figured that if it proved to be too slow, I would move to a 256x256 map later.

There was actually another option. I noticed there is a new interface in the DX11 SDK called ID3DX11FFT. However, it seems that it can only transform one spectrum at a time, and I have 6 of them. This means I would need to issue 6 transform commands, whereas the NVIDIA FFT code can easily be modified to transform all 6 of them in one step. The NVIDIA FFT also has the advantage of using a radix-8 algorithm, which means it only needs 6 Dispatch calls for a 512x512 spectrum (3 butterfly passes per dimension, since 8^3 = 512), whereas a radix-2 FFT (like the one Bruneton used and which, I suspect, the ID3DX11FFT interface also uses) would require 18 Dispatch calls of the same size (9 passes per dimension, since 2^9 = 512). I could also be wrong, and the DX11 interface could be smarter than that and use a different radix for different spectrum sizes, but I couldn't find anything on the web that describes how it works internally. It also appears that no one has ever used it, which is just weird.
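For reference, here is a rough, untested sketch of what driving the ID3DX11FFT interface for the six spectra might look like, written in C++ against the d3dcsx.h declarations as I understand them from the SDK docs. CreateRawBufferUAV is a hypothetical helper that creates a raw-buffer UAV holding the requested number of floats, and spectrumUAV[] stands in for the six spectrum UAVs created elsewhere in the engine:

#include <d3d11.h>
#include <d3dcsx.h>   // ID3DX11FFT, D3DX11CreateFFT2DComplex

// Hypothetical helper: creates a raw ByteAddressBuffer UAV of numFloats floats.
ID3D11UnorderedAccessView* CreateRawBufferUAV(ID3D11Device* device, UINT numFloats);

HRESULT TransformSixSpectra(ID3D11Device* device,
                            ID3D11DeviceContext* context,
                            ID3D11UnorderedAccessView* spectrumUAV[6])
{
    // Create a 2D complex FFT object for a 512x512 map. The call also reports
    // how many scratch buffers it wants and how big they have to be.
    D3DX11_FFT_BUFFER_INFO bufferInfo = {};
    ID3DX11FFT* fft = nullptr;
    HRESULT hr = D3DX11CreateFFT2DComplex(context, 512, 512, 0, &bufferInfo, &fft);
    if (FAILED(hr))
        return hr;

    // Create and attach the scratch buffers the FFT object asked for
    // (releasing them afterwards is omitted for brevity).
    ID3D11UnorderedAccessView* temp[D3DX11_FFT_MAX_TEMP_BUFFERS] = {};
    ID3D11UnorderedAccessView* precomp[D3DX11_FFT_MAX_PRECOMPUTE_BUFFERS] = {};
    for (UINT i = 0; i < bufferInfo.NumTempBufferSizes; ++i)
        temp[i] = CreateRawBufferUAV(device, bufferInfo.TempBufferFloatSizes[i]);
    for (UINT i = 0; i < bufferInfo.NumPrecomputeBufferSizes; ++i)
        precomp[i] = CreateRawBufferUAV(device, bufferInfo.PrecomputeBufferFloatSizes[i]);
    hr = fft->AttachBuffersAndPrecompute(bufferInfo.NumTempBufferSizes, temp,
                                         bufferInfo.NumPrecomputeBufferSizes, precomp);

    // One transform command per spectrum: six separate calls, where the NVIDIA
    // radix-8 compute shader can be modified to batch all six into one pass chain.
    for (int i = 0; i < 6 && SUCCEEDED(hr); ++i)
    {
        ID3D11UnorderedAccessView* result = nullptr;  // let D3DX return its own output UAV
        hr = fft->InverseTransform(spectrumUAV[i], &result);
        // 'result' now points at the UAV holding spectrum i in the spatial domain.
    }

    fft->Release();
    return hr;
}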
Bottom line is, my new version uses a 512x512 spectrum transformed with a radix-8 FFT compute shader instead of a 256x256 spectrum transformed with radix-2 pixel shader code, and the new one is a lot faster. In the future it would be interesting to experiment a bit with the DX11 FFT interface to see if it computes a 256x256 transform faster than the NVIDIA code computes a 512x512 one. I don't really need a 512x512 map, since the gain in visual quality is negligible, so I would prefer a 256x256 transform even if it's only 10% faster. I would also like to write my own radix-4 FFT code one day, just for the sake of it and to prove to myself that I can do it. On the other hand, I fear I might waste too much valuable time doing it.
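To make the radix numbers above a bit more concrete, here is a minimal, illustrative 1D radix-2 FFT in C++ (not the engine's code, just the standard iterative Cooley-Tukey scheme for a power-of-two input):

#include <cmath>
#include <complex>
#include <utility>
#include <vector>

// In-place forward FFT of a complex array whose length is a power of two.
void fft_radix2(std::vector<std::complex<double>>& a)
{
    const size_t n = a.size();
    const double pi = 3.14159265358979323846;

    // Bit-reversal permutation so the butterflies can run in place.
    for (size_t i = 1, j = 0; i < n; ++i)
    {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j)
            std::swap(a[i], a[j]);
    }

    // log2(n) butterfly stages; a GPU version issues one pass per stage.
    for (size_t len = 2; len <= n; len <<= 1)
    {
        const double angle = -2.0 * pi / static_cast<double>(len);
        const std::complex<double> wlen(std::cos(angle), std::sin(angle));
        for (size_t i = 0; i < n; i += len)
        {
            std::complex<double> w(1.0, 0.0);
            for (size_t k = 0; k < len / 2; ++k)
            {
                const std::complex<double> u = a[i + k];
                const std::complex<double> v = a[i + k + len / 2] * w;
                a[i + k]           = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

A radix-4 version merges two of these butterfly stages into one pass and radix-8 merges three, which is why the NVIDIA shader needs only 3 passes per dimension on a 512x512 map; for the 2D case you simply run the 1D transform over every row and then over every column.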
Don't know if you guys bothered reading his blog on how he's done the ocean... pasted it over here as a quick reference for SE coders...
I wonder how difficult it'd be to implement the water from the Direct3D video in OpenGL? O_o It isn't something I have any experience with, but I would think it would take a ton of coding. Yes? No?
The promised water video. Some artifacts under water are due to interaction with the atmosphere. FPS during video capture dropped to 3-5, so my camera piloting is not smooth :)