How to prevent shader optimizations?
I was testing shader instruction performance, and to make the results clearer I wrote a loop and unrolled it.
[unroll] for(int j = 0; j < TEST_COUNT; j++)
{
    col = tex2D(_MainTex, i.uv.xy);
}
But the compiler optimized it and got rid of the whole loop. Here is the ASM:
8: mov o1.xyzw, v1.xyxy
9: sample_l o2.xyzw, v1.xyxx, t0.xyzw, s0, v1.y
10: mov o3.xyzw, l(0.500000,0.500000,0.500000,0.500000)
So I did this:
[unroll] for(int j = 0; j < TEST_COUNT; j++)
{
    [isolate]
    {
        col = tex2D(_MainTex, i.uv.xy);
    }
}
And the compiler said: "unknown attribute isolate, or attribute invalid for this statement". Even if I add a small value to uv, it doesn't help.
Answer by Bunny83 · Feb 23, 2018 at 01:46 PM
Well, your code is pretty pointless in the loop as it doesn't rely on the loop variable and it doesn't apply any additive changes to a variable. So it's obvious that this gets removed completely. You may want to have some code that actually needs to be unrolled.
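As a rough sketch of what "additive changes" could look like: if every sample is accumulated into the output and the next lookup depends on the previous result, the compiler has much less room to throw iterations away. TEST_COUNT, _MainTex and i.uv are taken from the question; the local uv variable, the float4 type of col and the 0.001 offset are assumptions made up for illustration, and whether a given compiler or driver still simplifies this is not guaranteed.

```hlsl
// Sketch only: col is assumed to be a float4, and the 0.001 scale is arbitrary.
float4 col = 0;
float2 uv = i.uv.xy;

[unroll] for (int j = 0; j < TEST_COUNT; j++)
{
    float4 s = tex2D(_MainTex, uv);
    col += s;            // additive change: every sample contributes to the result
    uv += s.xy * 0.001;  // the next lookup depends on the previous sample
}

col /= TEST_COUNT;       // keep the output in a sensible range
```

Keep in mind that this changes what is being measured: each read now has to wait for the previous one, so the timing reflects a chain of dependent texture reads rather than independent samples.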
isolate seems to be HLSL only and only applies to XBox 360
What exactly do you want to achieve?
I want to test shader instruction performance. That's the whole point. As I said, even if I add a small value to uv it doesn't help; the compiler over-optimizes it anyway. I don't want to add too much code between the tested instructions, to keep the results clear.
You can't test the shader performance like this. GPUs run highly in parallel. The GPU sets up an instruction pipeline. As long as all required instructions fit into the pipeline it doesn't really affect the performance, just the latency. The performance would depend on the number of input data. You can't use the same logic as you would on a CPU. Things do not happen in a linear / sequential fashion and you can not simply add up the numbers to estimate how a shader will perform. It highly depends on the hardware, how many T&L units the GPU has, etc.
A shader pipeline has a quite limited amount of instructions (which also highly depends on the hardware and shader model). That's why you can't necessarily unroll any sort of loop as it may result in too many instructions.
Such "single instruction" tests are pretty pointless. There are special shader analyzers for the major GPU vendors out there. Though as I said, knowing the "cost" of a single instruction doesn't help much.
You sound like there's no difference between light and heavy shaders.
High parallelism isn't a problem if you understand the architecture and can isolate the load.
It worked quite well with random texture reads when I tested the bandwidth between the L1 and L2 texture caches. I know the limits, and I didn't reach them. There was no problem when I tested other instructions.
Thanks for mentioning shader analyzers, I didn't know they existed. But they won't help me much, because I need to test a lot of different mobile devices, and their results vary a lot.