Jobs performance slower than single thread.
Hello. I created a 3D scene with 1000 spheres randomly walking back and forth. There is a ground plane and the spheres, nothing else. There are 2 versions of the walking script: one that uses transform directly and one that uses transform with jobs. I assumed that jobs would obviously be faster, but I did something wrong and I need help finding the mistake. For now the version without jobs performs far better than the one with jobs. The screenshots were taken without the Burst compiler. Burst works and improves ms and fps, but single thread is still faster.
SingleThread Code
void Update()
{
    if (!Coroutine_delay) {
        StartCoroutine(NewDestination());
    }
    if (1f < Vector3.Distance(myTransform.position, destionationPoint3D)) {
        transform.position = Vector3.MoveTowards(myTransform.position, destionationPoint3D, speed * Time.deltaTime);
    }
}
IEnumerator NewDestination()
{
    Coroutine_delay = true;
    lowerLeftAngle = new Vector3(myTransform.position.x + 20, myTransform.position.y, myTransform.position.z + 20);
    upperRightAngle = new Vector3(myTransform.position.x - 20, myTransform.position.y, myTransform.position.z - 20);
    destionationPoint3D = new Vector3(UnityEngine.Random.Range(lowerLeftAngle.x, upperRightAngle.x), myTransform.position.y, UnityEngine.Random.Range(lowerLeftAngle.z, upperRightAngle.z));
    yield return delayC;
    Coroutine_delay = false;
}
Code with Jobs
public class Spawner : MonoBehaviour
{
    [SerializeField] GameObject unit;
    public int amount;
    public bool jobber;
    GameObject go;
    private List<GameObject> listok = new List<GameObject>();
    void Start()
    {
        for (int i = 0; i < amount; i++)
        {
            go = Instantiate(unit, new Vector3(UnityEngine.Random.Range(-100f, 100f), unit.transform.position.y, UnityEngine.Random.Range(-100f, 100f)), Quaternion.identity);
            listok.Add(go);
            //Debug.Log(listok[0]);
        }
    }
    void Update()
    {
        if (jobber)
        {
            NativeArray<float3> startPoint = new NativeArray<float3>(listok.Count, Allocator.TempJob);
            NativeArray<float3> destinationPoint = new NativeArray<float3>(listok.Count, Allocator.TempJob);
            NativeArray<float> speed = new NativeArray<float>(listok.Count, Allocator.TempJob);
            NativeArray<float3> endPoint = new NativeArray<float3>(listok.Count, Allocator.TempJob);
            TransformAccessArray transformAccessArray = new TransformAccessArray(listok.Count);
            for (int i = 0; i < listok.Count; i++)
            {
                startPoint[i] = listok[i].transform.position;
                destinationPoint[i] = listok[i].gameObject.GetComponent<CircleUnit>().destionationPoint3D;
                speed[i] = listok[i].gameObject.GetComponent<CircleUnit>().speed;
                transformAccessArray.Add(listok[i].transform);
            }
            UnitMoveJobParallel unitMoveJobParallel = new UnitMoveJobParallel
            {
                deltaTime = Time.deltaTime,
                speed = speed,
                startPoint = startPoint,
                destinationPoint = destinationPoint,
                endPoint = endPoint
            };
            JobHandle jobHandle = unitMoveJobParallel.Schedule(transformAccessArray);
            jobHandle.Complete();
            for (int i = 0; i < listok.Count; i++)
            {
                listok[i].transform.position = endPoint[i];
            }
            startPoint.Dispose();
            destinationPoint.Dispose();
            speed.Dispose();
            endPoint.Dispose();
            transformAccessArray.Dispose();
        }
    }
}
[BurstCompile]
public struct UnitMoveJobParallel : IJobParallelForTransform
{
    public NativeArray<float3> startPoint;
    public NativeArray<float3> destinationPoint;
    public NativeArray<float> speed;
    public NativeArray<float3> endPoint;
    public float deltaTime;
    public void Execute(int index, TransformAccess transform)
    {
        endPoint[index] = Vector3.MoveTowards(startPoint[index], destinationPoint[index], speed[index] * deltaTime);
    }
}
The only thing I found is that on the timeline there is a lot of work on the main thread before the jobs start (screenshot 2). I suppose it has something to do with allocating 5 containers of 1000 elements each frame...
So, my questions are: 1) Is there a way to pre-cache native containers to avoid populating them each frame, provided the number of units stays the same? Declaring them outside Update or at class level throws an error, either about not being allowed to be called from MonoBehaviour or about a problem with disposing them.
2) Suppose native collections are supposed to work like that (to be created each frame). Then where did I make mistakes? Why does the multithreaded version run slower than the single-threaded one?
3) Is there a guide or documentation to learn from? The Unity documentation on jobs is a bit short.
4) On the second screenshot the tooltip says something about 10k and 17k instances on the threads, but my scene has only ~1000 GameObjects. What are those? Is there a clear way to inspect them?
Any help is greatly appreciated.
UPDATE: I removed GetComponent from the Update method, but that still doesn't solve the problem.
Answer by kidi0892 · Mar 03 at 04:05 AM
I found the solution: preallocate the arrays with Allocator.Persistent. Turns out transformAccessArray doesn't need to be updated manually. It is updated automatically. So in the for loop in the Update method I only fill the destinationPoint array. It could be optimised with events, but that is unnecessary because at 25k objects script execution is low-cost compared to rendering.
After removing the allocations in Update() it finally started to work as expected. The jobs version performs slightly faster. I wonder how jobs would work with a varying number of GameObjects. You'd have to create a new NativeArray each time the amount changes, which would nullify all the benefits of jobs. Or I am missing something again. I'll figure it out... somehow.
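For anyone reading later, here is a minimal sketch of the allocate-once pattern described above, assuming the number of units never changes. The class name PersistentUnitMover, the MoveJob struct and the Inspector-assigned units array are my own placeholders for illustration, not the code from the project in question.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical component: allocates native containers once, disposes them once.
public class PersistentUnitMover : MonoBehaviour
{
    // Assumption: units are assigned in the Inspector and the count stays fixed.
    [SerializeField] Transform[] units;
    [SerializeField] float unitSpeed = 10f;

    NativeArray<float3> destinationPoint;
    NativeArray<float> speed;
    TransformAccessArray transformAccessArray;
    JobHandle jobMoveHandle;

    void Start()
    {
        // Native containers cannot be created in field initializers, so allocate here.
        destinationPoint = new NativeArray<float3>(units.Length, Allocator.Persistent);
        speed = new NativeArray<float>(units.Length, Allocator.Persistent);
        transformAccessArray = new TransformAccessArray(units);
        for (int i = 0; i < units.Length; i++)
        {
            destinationPoint[i] = units[i].position; // start with "already arrived"
            speed[i] = unitSpeed;
        }
    }

    // Call from gameplay code; waits for the running job before writing.
    public void SetDestination(int index, float3 value)
    {
        jobMoveHandle.Complete();
        destinationPoint[index] = value;
    }

    void Update()
    {
        var job = new MoveJob
        {
            destinationPoint = destinationPoint,
            speed = speed,
            deltaTime = Time.deltaTime
        };
        jobMoveHandle = job.Schedule(transformAccessArray);
    }

    // Finish the job later in the frame instead of blocking right after Schedule.
    void LateUpdate() => jobMoveHandle.Complete();

    void OnDestroy()
    {
        jobMoveHandle.Complete();
        if (destinationPoint.IsCreated) destinationPoint.Dispose();
        if (speed.IsCreated) speed.Dispose();
        if (transformAccessArray.isCreated) transformAccessArray.Dispose();
    }

    [BurstCompile]
    struct MoveJob : IJobParallelForTransform
    {
        [ReadOnly] public NativeArray<float3> destinationPoint;
        [ReadOnly] public NativeArray<float> speed;
        public float deltaTime;

        public void Execute(int index, TransformAccess transform)
        {
            float3 pos = transform.position;
            float3 toTarget = destinationPoint[index] - pos;
            float dist = math.length(toTarget);
            float step = speed[index] * deltaTime;
            // Snap to the target when it is within one step, otherwise move towards it.
            transform.position = (dist == 0f || dist <= step)
                ? destinationPoint[index]
                : pos + toTarget / dist * step;
        }
    }
}
```

The point of the sketch is only the lifetime: allocation happens in Start, disposal in OnDestroy, and Update does nothing but schedule. Completing in LateUpdate is one possible choice; any point before the data is next written would do.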
Turns out transformAccessArray doesn't need to be updated manually. It is updated automatically.
It does need to be updated, just not every frame. It isn't updated automatically; it simply isn't cleared without an explicit call.
After removing allocations in Update() it finally started to work as expected.
Let me tell you that the allocation call itself is not the issue here. Allocations are O(1) and rarely noticeable in the profiler graphs. The real issue is how you fill those allocations with data: that is the slow part, and it scales poorly, O(n).
Other than that, here is a costly mistake that is also easy to correct. You are doing this on the main thread:
for (int i = 0; i < listok.Count; i++)
    listok[i].transform.position = endPoint[i];
where it should be done inside this IJobParallelForTransform job:
// public NativeArray<float3> startPoint;// useless!
public NativeArray<float3> destinationPoint;
public NativeArray<float> speed;
// public NativeArray<float3> endPoint;// useless!
public float deltaTime;
// it's called "transform" for a reason
public void Execute ( int index , TransformAccess transform )
{
    // endPoint[index] = Vector3.MoveTowards(startPoint[index], destinationPoint[index], speed[index] * deltaTime);// no
    transform.position = Vector3.MoveTowards( transform.position , destinationPoint[index] , speed[index] * deltaTime );
}
Yes, I figured that was inefficient as well. My refreshed Update():
void Update()
{
    if (jobber)
    {
        for (int i = 0; i < listok.Count; i++)
        {
            destinationPoint[i] = circleUnit[i].destionationPoint3D;
        }
        UnitMoveJobParallel unitMoveJobParallel = new UnitMoveJobParallel()
        {
            endPoint = endPoint,
            destinationPoint = destinationPoint,
            deltaTime = Time.deltaTime,
            speed = speed
        };
        jobMoveHandle = unitMoveJobParallel.Schedule(transformAccessArray);
    }
}
Now I need to find a better way to update per-object info such as the destination point, and work out how to manage a variable number of GameObjects. Your code helps with that, but for now I want to stick to Jobs+Burst without using entities. Thanks again for helping out.
(...) I want to stick to Jobs+Burst without using entities.
I don't follow, my code uses no entities but gameObjects that store their data in NativeLists.
Answer by andrew-lukasik · Mar 05 at 01:26 AM
You have 2.6 [ms] from vanilla code @ unspecified CPU.
And that job-ified code contains many mistakes, some very costly (TransformAccess transform totally ignored).
I've got 0.3 [ms] total, spread across worker threads @ i3-4170 (0.1 [ms] to complete).
MyUnitComponent.cs
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Jobs;
using Unity.Mathematics;
using Unity.Collections;
using Unity.Jobs;
using Unity.Entities;
using Unity.Rendering;
using BurstCompile = Unity.Burst.BurstCompileAttribute;
public class MyUnitComponent : MonoBehaviour
{
    public static List<MyUnitComponent> Instances = new List<MyUnitComponent>();
    public static TransformAccessArray Transforms;
    public static NativeList<float> Speed;
    public static NativeList<float3> Destionation;
    public static JobHandle Dependency;
    public int Index { get; set; } = -1;
    [SerializeField] float _speed = 10f;
    [SerializeField] float3 _testDestination = new float3( 100 , 100 , 100 );// delete line later on
    public const int k_max_instances = 10000;
    void OnEnable ()
    {
        // the first enabled instance allocates the shared containers
        if( !Transforms.isCreated )
        {
            Transforms = new TransformAccessArray( k_max_instances );
            Speed = new NativeList<float>( Allocator.Persistent );
            Destionation = new NativeList<float3>( Allocator.Persistent );
        }
        Dependency.Complete();// immediate data access
        Index = Instances.Count;
        Instances.Add( this );
        Speed.Add( _speed );
        Destionation.Add( _testDestination );// delete line later on
        // Destionation.Add( transform.position );// uncomment later on
        Transforms.Add( transform );
    }
    void OnDisable ()
    {
        Dependency.Complete();// immediate data access
        Instances.RemoveAtSwapBack( Index );
        Speed.RemoveAtSwapBack( Index );
        Destionation.RemoveAtSwapBack( Index );
        Transforms.RemoveAtSwapBack( Index );
        if( Instances.Count!=0 )
        {
            // fix Index after RemoveAtSwapBack:
            if( Index>=0 && Index<Instances.Count )
                Instances[ Index ].Index = Index;
        }
        else
        {
            // the last disabled instance disposes the shared containers
            if( Speed.IsCreated ) Speed.Dispose();
            if( Destionation.IsCreated ) Destionation.Dispose();
            if( Transforms.isCreated ) Transforms.Dispose();
        }
        Index = -1;// this instance no longer owns a slot
    }
    #if UNITY_EDITOR
    void OnValidate ()
    {
        if( Index!=-1 )
        {
            Dependency.Complete();// immediate data access
            Speed[Index] = _speed;
            Destionation[Index] = _testDestination;
        }
    }
    void OnDrawGizmosSelected ()
    {
        if( Index!=-1 )
        {
            Gizmos.color = Color.yellow;
            var pos = transform.position;
            Gizmos.DrawLine( pos , Destionation[Index] );
            Gizmos.DrawSphere( pos , 0.1f );
        }
    }
    #endif
    public void SetSpeed ( float value ) => Speed[Index] = value;
    public void SetDestination ( float3 value ) => Destionation[Index] = value;
}
public class MyUnitMoveSystem : SystemBase
{
    protected override void OnUpdate ()
    {
        // nothing to move until at least one MyUnitComponent has allocated the containers
        if( !MyUnitComponent.Transforms.isCreated ) return;
        var unitMoveJob = new UnitMoveJob
        {
            Destination = MyUnitComponent.Destionation.AsArray() ,
            Speed = MyUnitComponent.Speed.AsArray() ,
            DeltaTime = Time.DeltaTime ,
        };
        Dependency = unitMoveJob.Schedule( MyUnitComponent.Transforms , Dependency );
        MyUnitComponent.Dependency = Dependency;
    }
    [BurstCompile] public struct UnitMoveJob : IJobParallelForTransform
    {
        [ReadOnly] public NativeSlice<float3> Destination;
        [ReadOnly] public NativeSlice<float> Speed;
        public float DeltaTime;
        void IJobParallelForTransform.Execute ( int index , TransformAccess transform )
        {
            transform.position = MoveTowards( transform.position , Destination[index] , Speed[index] * DeltaTime );
        }
        // Rewritten for Burst, src: https://github.com/Unity-Technologies/UnityCsReference/blob/master/Runtime/Export/Math/Vector3.cs#L59-L77
        float3 MoveTowards ( float3 src , float3 dst , float maxDistanceDelta )
        {
            float3 dir = dst - src;
            float distSq = math.lengthsq(dir);
            if( distSq==0 || ( maxDistanceDelta>=0 && distSq<=maxDistanceDelta*maxDistanceDelta ) ) return dst;
            return src + dir / math.sqrt(distSq) * maxDistanceDelta;
        }
    }
}
MyUnitSpawner.cs
using UnityEngine;
public class MyUnitSpawner : MonoBehaviour
{
    [SerializeField] GameObject _prefab = null;
    [SerializeField][Range(0,MyUnitComponent.k_max_instances)] int _amount = 100;
    void OnEnable ()
    {
        float y = _prefab.transform.position.y;
        for( int i=0 ; i<_amount ; i++ )
        {
            Vector3 pos = new Vector3( Random.Range(-100f,100f) , y , Random.Range(-100f,100f) );
            Instantiate( _prefab , pos , Quaternion.identity );
        }
    }
}
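To connect this back to the random-walk behaviour from the question, here is a rough sketch of how a unit could pick a new destination every few seconds through SetDestination. MyUnitWander, _range and _delay are hypothetical names I made up for illustration; the sketch assumes it sits on the same prefab as MyUnitComponent and that the component stays enabled.

```csharp
using System.Collections;
using Unity.Mathematics;
using UnityEngine;

// Hypothetical helper, not part of the code above: periodically picks a random destination.
[RequireComponent(typeof(MyUnitComponent))]
public class MyUnitWander : MonoBehaviour
{
    [SerializeField] float _range = 20f; // half-size of the square to wander in
    [SerializeField] float _delay = 2f;  // seconds between destination changes

    // Unity runs Start as a coroutine when it returns IEnumerator.
    IEnumerator Start()
    {
        var unit = GetComponent<MyUnitComponent>();
        var wait = new WaitForSeconds(_delay);
        // Stop once the unit is disabled, because its Index is no longer valid then.
        while (unit.isActiveAndEnabled)
        {
            // Finish the in-flight move job before writing to the shared NativeList.
            MyUnitComponent.Dependency.Complete();
            Vector3 pos = transform.position;
            unit.SetDestination(new float3(
                pos.x + UnityEngine.Random.Range(-_range, _range),
                pos.y,
                pos.z + UnityEngine.Random.Range(-_range, _range)));
            yield return wait;
        }
    }
}
```

Because destinations change only every few seconds per unit, the per-frame main-thread loop that copied every destination into a NativeArray is no longer needed. Calling Complete() here can stall the main thread if the job is still running, so a real project may prefer to batch these writes at one point in the frame.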