GPU instancing works strangely on Android
Hi,
I'm currently trying to render planets with the help of a custom 2D shader. For this I usually have to render more than 10,000 tiles per planet. After rendering them once, I read the result back into a texture and save it as a file.
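(For context, the readback step is roughly this; planetRT and savePath are simplified placeholders for my actual fields.)
// read the finished RenderTexture back into a Texture2D and write it to disk
RenderTexture.active = planetRT;
Texture2D tex = new Texture2D(planetRT.width, planetRT.height, TextureFormat.RGBA32, false);
tex.ReadPixels(new Rect(0, 0, planetRT.width, planetRT.height), 0, 0);
tex.Apply();
System.IO.File.WriteAllBytes(savePath, tex.EncodeToPNG());
RenderTexture.active = null;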
To draw so many tiles, I use GPU instancing to draw 511 tiles per call.
When testing this on PC, it works fine. Every texture looks normal, regardless of how many planets get generated. Here's an example:
On Android however (Samsung S10), the first 2-3 planets work fine before the rendering process seems to break down. It looks like this:
Besides the wrong textures being used, vertices are sometimes missing (some tiles get drawn as triangles; this is easier to see in high-res images, which I can't upload because they are too big), and shadows/atmospheres don't get their proper colors. The trees and grass have no issues, but that is probably because they get drawn with Graphics.DrawMesh and are not instanced.
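(For comparison, the foliage draw is just a plain per-mesh call, roughly like this; treeMesh, treeMatrix and treeMaterial are placeholders.)
// trees/grass: one non-instanced draw per mesh, no MaterialPropertyBlock involved
Graphics.DrawMesh(treeMesh, treeMatrix, treeMaterial, LayerMask.NameToLayer("Planet Render"));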
I have three shaders that get used here; they all look approximately like this:
Tags { "Queue"="Transparent" "RenderType"="Transparent" }
Cull Off
Blend SrcAlpha OneMinusSrcAlpha
Pass
{
CGPROGRAM
#pragma vertex vert
#pragma fragment frag
#pragma multi_compile_instancing
#pragma target 3.5
#include "UnityCG.cginc"
struct appdata
{
float4 vertex : POSITION;
float2 uv : TEXCOORD0;
UNITY_VERTEX_INPUT_INSTANCE_ID
};
struct v2f
{
float4 vertex : SV_POSITION;
float2 uv : TEXCOORD0;
UNITY_VERTEX_INPUT_INSTANCE_ID
};
// define Props in instancing buffer to enable use of MaterialPropertyBlock for per-instance data
UNITY_INSTANCING_BUFFER_START(Props)
UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
UNITY_DEFINE_INSTANCED_PROP(float4, _Info) // x holds the per-instance texture index, see frag below
UNITY_INSTANCING_BUFFER_END(Props)
// global variables, set with Shader.SetGlobalFloat("_MapWidth", X) before drawing
float _MapWidth;
float _MapHeight;
sampler2D _MainTex;
float4 _MainTex_ST;
UNITY_DECLARE_TEX2DARRAY(_Textures); // Texture2DArray sampled in frag
v2f vert (appdata v)
{
v2f o;
UNITY_SETUP_INSTANCE_ID(v);
/* not using "UNITY_TRANSFER_INSTANCE_ID(v, o);" here because it
leads to only the first instance being able to access the
MaterialPropertyBlock for some reason
*/
/* vertex shenanigans here (this is where "newV", used below, gets computed); Props is not accessed here */
o.vertex = mul(UNITY_MATRIX_VP, newV);
o.uv = TRANSFORM_TEX(v.uv, _MainTex);
return o;
}
fixed4 frag (v2f i) : SV_Target
{
UNITY_SETUP_INSTANCE_ID(i);
// in the atmosphere shader the body is just:
// return UNITY_ACCESS_INSTANCED_PROP(Props, _Color);
// in the shader that draws the textures it is:
fixed4 c = UNITY_SAMPLE_TEX2DARRAY(
_Textures,
float3(i.uv, UNITY_ACCESS_INSTANCED_PROP(Props, _Info)[0]));
return c;
}
ENDCG
}
I use a Texture2DArray to store the textures in the shader. I don't think this is the cause, though, because when I hard-code the texture indices that are normally obtained from Props, the correct textures are used. For instance:
fixed4 c = UNITY_SAMPLE_TEX2DARRAY( _Textures, float3(i.uv, 2.0));
always leads to the texture at index 2 being used by the shader. This leads me to conclude that something is going on with the MaterialPropertyBlock that I use to pass per-instance data. After some number of calls the GPU seems to get confused, failing to pick up the values from the MaterialPropertyBlock and randomly leaving out vertices, while other draw calls work normally.
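(For completeness, the Texture2DArray itself gets built once on the C# side, roughly like this; tileTextures is a placeholder for my actual texture list, and all textures share the same size and format.)
// copy every tile texture into one Texture2DArray and assign it to the material
Texture2DArray texArray = new Texture2DArray(tileTextures[0].width, tileTextures[0].height,
tileTextures.Length, TextureFormat.RGBA32, false);
for (int i = 0; i < tileTextures.Length; i++)
Graphics.CopyTexture(tileTextures[i], 0, 0, texArray, i, 0);
MeshMaterial.SetTexture("_Textures", texArray);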
The drawing code roughly looks like this:
private static Matrix4x4[] CollectMatrices()
{
Matrix4x4[] matrices = new Matrix4x4[tileCount]; // tileCount = total number of tiles for this planet
Vector3 scale = new Vector3(1, 1, 1);
Vector3 pos = new Vector3();
//loops through all tiles and creates their matrix, used in Graphics.DrawMeshInstanced
for (int i = 0; i < tileCount; i++)
{
Matrix4x4 newMatrix = Matrix4x4.identity;
pos.x = x; // x, y come from the tile's grid position (calculation omitted here)
pos.y = y;
newMatrix.SetTRS(pos, Quaternion.identity, scale);
matrices[i] = newMatrix;
}
return matrices;
}
private static Matrix4x4[] GetBatchedMatrices(int offset, int batchCount, Matrix4x4[] matrices)
{
Matrix4x4[] batchedMatrices = new Matrix4x4[batchCount];
for (int i = 0; i < batchCount; ++i)
{
batchedMatrices[i] = matrices[i + offset];
}
return batchedMatrices;
}
private static void RenderWithInstancing()
{
Matrix4x4[] matrices = CollectMatrices();
Shader.SetGlobalFloat("_MapWidth", TileMap.width);
Shader.SetGlobalFloat("_MapHeight", TileMap.height);
int total = matrices.Length;
// get the amount of necessary draw calls, BATCH_MAX = 511 and BATCH_MAX_FLOAT = 511f
int batches = Mathf.CeilToInt(total / BATCH_MAX_FLOAT);
for (int i = 0; i < batches; ++i)
{
int batchCount = Mathf.Min(BATCH_MAX, total - (BATCH_MAX * i));
int start = Mathf.Max(0, i * BATCH_MAX);
Matrix4x4[] batchedMatrices = GetBatchedMatrices(start, batchCount, matrices);
// mpb is external static MaterialPropertyBlock variable
// mpb gets cleared every drawcall
mpb.Clear();
mpb.SetVectorArray("_Info", new Vector4[] { new Vector4(tile.index, 0, 0, 0) }); // simplified, see the sketch after this code; every tile in this call has the same texture index
Graphics.DrawMeshInstanced(MeshFilter.sharedMesh,
0,
MeshMaterial, // external static Material variable
batchedMatrices,
batchCount,
mpb,
ShadowCastingMode.On,
true,
LayerMask.NameToLayer("Planet Render")); // only draw to "planet render" layer
// procedure is only slightly different for drawing atmosphere or shadows
}
}
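(The _Info line above is simplified; in the real code the array has one Vector4 per instance in the batch, filled roughly like this. tiles and textureIndex are placeholders for my actual per-tile data; start and batchCount are the same variables as above.)
// one Vector4 per instance; x holds that tile's index into the Texture2DArray
Vector4[] info = new Vector4[batchCount];
for (int j = 0; j < batchCount; j++)
info[j] = new Vector4(tiles[start + j].textureIndex, 0, 0, 0);
mpb.SetVectorArray("_Info", info);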
If you have any questions about the details, any advice or detailed knowledge about rendering on Android, or a suggestion on how to fix this issue or how to better render 10,000 tiles, perhaps with an entirely different method (it doesn't have to be THAT fast, it just has to work), please go ahead.
Thanks in advance!