

I have verified that the compute shader version returns the right average, while the pixel shader version starts to introduce errors when downscaling from an odd height or odd width (e.g. downscaling from 5×3, 5×2, or 4×3). If I use a resolution that is not a power of 2, I end up with an incorrect average (when compared to the compute shader). Though I believe AMD and NVIDIA cards handle caching differently.

It'll become the GPU equivalent of our FunctionLibrary class, so name it FunctionLibrary as well. Compute shaders give us the option to do shading-like operations, but we should not expect to reliably exceed the performance of fragment shaders (even if, occasionally, we spectacularly can), because our implementations are not based on insider knowledge of how to optimize for the particular GPU, even when there is a single …

One general compute shader works in all cases: it performs one task per output texel without synchronization and uses 256 threads per group, which is a multiple of NVIDIA's warp size of 32 and AMD's wavefront size of 64.
» Check for support using caps bit: ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x
» Adds new compute shader inputs: vThreadID, vThreadIDInGroup

I dispatch a thread per cell I want to update, and that thread samples the SSBO 8 times to gather the cell's neighbors.

NVIDIA GPUs, such as those from our Pascal generation, are composed of different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers (… local memory size, whether it writes global memory).

In this post we'll be looking at compute shaders, profiling tools, and failed experiments to improve performance.
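The odd-dimension error described above can be reproduced on the CPU. The sketch below (pure Python, hypothetical helper names, not the actual shader) models a pixel-shader style reduction: repeatedly halve the image by averaging 2×2 blocks, clamping out-of-range samples to the edge. For a 5×3 image the clamped edge texels are counted more than once, so the final value drifts from the true average — the assumption here is that the pixel-shader path samples with clamp addressing, which matches the failure pattern reported.

```python
def box_downscale(img):
    """Halve each dimension (rounding up), averaging 2x2 blocks.
    Out-of-range samples are clamped to the edge texel."""
    h, w = len(img), len(img[0])
    nh, nw = (h + 1) // 2, (w + 1) // 2
    out = []
    for y in range(nh):
        row = []
        for x in range(nw):
            s = 0.0
            for dy in (0, 1):
                for dx in (0, 1):
                    sy = min(2 * y + dy, h - 1)  # clamp at bottom edge
                    sx = min(2 * x + dx, w - 1)  # clamp at right edge
                    s += img[sy][sx]
            row.append(s / 4.0)
        out.append(row)
    return out

def reduce_to_average(img):
    """Repeat the 2x2 downscale until a single texel remains."""
    while len(img) > 1 or len(img[0]) > 1:
        img = box_downscale(img)
    return img[0][0]

img = [[float(y * 5 + x) for x in range(5)] for y in range(3)]  # 5x3 test image
true_avg = sum(map(sum, img)) / 15.0
gpu_like = reduce_to_average(img)
print(true_avg, gpu_like)  # prints 7.0 9.0 — the clamped reduction over-weights edges
```

A power-of-two resolution divides evenly at every step, so no clamped (double-counted) samples occur and both paths agree, which is consistent with the observation that only non-power-of-2 resolutions produce a wrong average.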
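The 256-threads-per-group figure above implies some dispatch arithmetic. This sketch assumes a 16×16 group layout (the text only states 256 total, so the 16×16 split is an assumption) and rounds the group count up for resolutions that are not a multiple of 16; the usual companion to this is an early-out in the shader for out-of-range thread IDs.

```python
GROUP_X, GROUP_Y = 16, 16  # 16*16 = 256 threads per group: a multiple of
                           # NVIDIA's warp size (32) and AMD's wavefront size (64)

def groups_for(width, height):
    """Number of thread groups to dispatch for a width x height image,
    using ceiling division so partial tiles at the edges are covered."""
    gx = (width + GROUP_X - 1) // GROUP_X
    gy = (height + GROUP_Y - 1) // GROUP_Y
    return gx, gy

print(groups_for(1920, 1080))  # (120, 68) — 1080 is not a multiple of 16
print(groups_for(5, 3))        # (1, 1) — a single partial group suffices
```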
