Instance of Vector3.zero faster than Vector3.zero?
I'm currently running an optimization audit on the codebase for my game and noticed Vector3.zero coming up in the profiler. It had never really dawned on me that accessing that value would carry overhead, but the Deep Profile information shows a constructor call and what looks to be a getter.
Here's my test. I wanted to share it with you guys and get your feedback. It's not much at first glance, but on my computer it was about 1 second faster to access an instance variable in the class holding Vector3.zero (vZero in the code below) than to access Vector3.zero directly, when running 5000 iterations of each test with the profiler off. It was actually 5 seconds faster with the profiler on, but that's just miscellaneous information. Even at 1000 iterations, the result was half a second better for the instance.
Just imagine how many places you might end up substituting instances for these convenient, but seemingly not-as-performant vector "getters" out in your code!
using UnityEngine;
using System.Collections;
public class VectorBenchmarks : MonoBehaviour {
    Vector3 vZero = Vector3.zero;
    Vector3 targetVector;
    public int testCount = 5000;
    public int currentTest = 0;
    // Use this for initialization
    void Start () {
        StartCoroutine("VectorTest");
    }
    IEnumerator VectorTest()
    {
        Debug.Log("Beginning test one, Vector3.zero");
        float startTime;
        float endTime;
        startTime = Time.time;
        while(currentTest < testCount ) {
            targetVector = Vector3.zero;
            ++currentTest;
            yield return null;
        }
        endTime = Time.time;
        Debug.Log("End Test One: startTime:" + startTime + "endTime:" + endTime + "duration of test:" + (endTime-startTime));
        currentTest = 0;
        Debug.Log("Beginning test two, instance variable definition of Vector3.zero");
        startTime = Time.time;
        while(currentTest < testCount ) {
            targetVector = vZero;
            ++currentTest;
            yield return null;
        }
        endTime = Time.time;
        Debug.Log("End Test Two: startTime:" + startTime + "endTime:" + endTime + "duration of test:" + (endTime-startTime));
    }
}
The "yield return null"s in there are quite bad for benchmarking, because then you're measuring the time it takes to draw the frame. Remove all coroutines and yields and everything of that nature when doing benchmarking, so you can concentrate on only measuring the thing you're actually interested in. Doing only one operation per frame will get totally lost in the noise; you should do some millions of iterations at once (at the least) to get any kind of measurable result.
Answer by Steel_Arm · Jun 25, 2013 at 08:16 PM
Here is my attempt at this:
using UnityEngine;
using System.Collections;
public class VectorProfile : MonoBehaviour {
    float t1, t2;
    // Use this for initialization
    void Start () {
        Vector3 v;
        Vector3 vz = Vector3.zero;
        t1 = Time.realtimeSinceStartup;
        for (int i = 0; i < 100000000; i++) {
            v = Vector3.zero;
        }
        Debug.Log("T1: " + (Time.realtimeSinceStartup - t1).ToString());
        t2 = Time.realtimeSinceStartup;
        for (int i = 0; i < 100000000; i++) {
            v = vz;
        }
        Debug.Log("T2: " + (Time.realtimeSinceStartup - t2).ToString());
    }
    // Update is called once per frame
    void Update () {
    }
}
Here are my results of doing 100,000,000 iterations of each.
T1: 2.046641 UnityEngine.Debug:Log(Object) VectorProfile:Start() (at Assets/Core Assets/Scripts/VectorProfile.cs:14)
T2: 0.5151472 UnityEngine.Debug:Log(Object) VectorProfile:Start() (at Assets/Core Assets/Scripts/VectorProfile.cs:19)
T1, the first test, uses Vector3.zero. T2 uses an instance of Vector3.zero called vz.
I just ran the same thing.
T1: 1.36627 T2: 0.3786879
That's really interesting.
@Loius: That would require some insane stupidity on the Vector3 author's part; the more natural way would be
public static readonly Vector3 zero = new Vector3(0,0,0);
which should be just as fast as or faster than a local variable. Most likely, this test just shows (another) awful optimization-fail on the part of the Mono JIT.
@Immanuel Oh, I didn't realize Vector3 is a struct (I am new to Unity, though not to C#). In that case, I'd expect all pieces of code to run at the same speed, and in fact that is exactly what I see.
Haha, I had my comment up for like ten seconds you speed demon. :)
I said essentially "I don't doubt that .zero is implemented as something like { get { return new Vector3(0,0,0); } }"
It would certainly make sense to do static readonly. I've implemented an integer-only Vector3 and it is so many headaches due to Unity refusing to serialize structs, so I have to make it a class and I've completely lost all touch with how the real V3 works by now. e_e
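To make the two guesses in this comment thread concrete, here is a minimal sketch of both variants side by side. Vec3, ZeroProperty and ZeroField are hypothetical names made up for illustration; this is not Unity's actual Vector3 source, which nobody in this thread has confirmed.

```csharp
using System;

// Hypothetical stand-in for a 3D vector struct; NOT UnityEngine.Vector3.
public struct Vec3
{
    public float x, y, z;

    public Vec3(float x, float y, float z)
    {
        this.x = x; this.y = y; this.z = z;
    }

    // Variant A: a property whose getter builds a new value on every access.
    public static Vec3 ZeroProperty
    {
        get { return new Vec3(0f, 0f, 0f); }
    }

    // Variant B: a value created once and stored in a static readonly field.
    public static readonly Vec3 ZeroField = new Vec3(0f, 0f, 0f);
}

public static class ZeroVariantsDemo
{
    public static void Main()
    {
        Vec3 a = Vec3.ZeroProperty; // goes through the getter and the constructor
        Vec3 b = Vec3.ZeroField;    // copies the stored struct value

        // Both variants yield the same value; only the access path differs.
        Console.WriteLine(a.x == b.x && a.y == b.y && a.z == b.z); // prints "True"
    }
}
```

Whether variant A actually costs more than variant B depends on how well the JIT inlines the getter, which is exactly what the benchmarks in this thread are trying to find out.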
Answer by Immanuel-Scholz · Jun 25, 2013 at 09:03 PM
Your performance test is seriously flawed. Here are a couple of things that spring to mind:
Use the right tool for the job. Time.time is not meant for precision timing. Use System.Diagnostics.Stopwatch.
Always measure in deployed programs, not in the editor or in development builds.
Try to minimize the "code under measurement". Especially stay away from heavy noise like rendering frames or starting coroutines!
If possible, test the "empty case" as a control group. This measures your test case overhead. If you do the previous point right, this will come out at (close to) "0". Also, you will be surprised how often you get a "WTF??" moment, when the empty case turns out to be slower than the normal case. :-D (If that happens, read up on "JIT compiler in C#")
Sure, usually you skip some or most of these points. If you are sure what you are doing and the performance differences are rather drastic, there is no need for going high-precision.
But in your case, you want to measure 1 second in 5000 iterations, which is 0.2 milliseconds (200 µs) per iteration. That is way too short to measure with such a crude axe technique.
In the end, Vector3.zero may still be 100 times slower than accessing a non-volatile local member copy (my money is on either "cache faults" or "bad mono 2.6 JIT compiler"), but I really doubt it takes your measured 200µs for every access.
Here is what I did:
Vector3 vZero = Vector3.zero;
void Start() {
    var watch = new System.Diagnostics.Stopwatch();
    Vector3 v;
    watch.Reset(); watch.Start();
    for (int i = 0; i < 10000000; ++i)
    { v = Vector3.zero; }
    watch.Stop();
    result += "property: " + watch.ElapsedMilliseconds + " ms\n";
    watch.Reset(); watch.Start();
    for (int i = 0; i < 10000000; ++i)
    { v = vZero; }
    watch.Stop();
    result += "local: " + watch.ElapsedMilliseconds + " ms\n";
    watch.Reset(); watch.Start();
    for (int i = 0; i < 10000000; ++i)
    { }
    watch.Stop();
    result += "empty: " + watch.ElapsedMilliseconds + " ms\n";
}
string result = "";
void OnGUI()
{
    GUI.Label(new Rect(20, 20, 500, 100), result);
}
For my machine, it spits out
property: 120 ms
local: 21 ms
empty: 10 ms
So for my measurement, the difference is roughly 100 ms for 10 million iterations, or in other words: about 10 nanoseconds per access.
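As a cross-check on that arithmetic, the per-access cost can also be derived by subtracting the empty-loop time from each measurement before dividing by the iteration count. The helper below is only a sketch; PerAccessCost and NanosPerIteration are hypothetical names, and the inputs are the three timings quoted above.

```csharp
using System;

public static class PerAccessCost
{
    // Converts a loop timing into nanoseconds per iteration after subtracting
    // the time of an empty control loop with the same iteration count.
    public static double NanosPerIteration(double loopMs, double emptyMs, long iterations)
    {
        double netMs = loopMs - emptyMs;
        return netMs * 1000000.0 / iterations; // 1 ms = 1,000,000 ns
    }

    public static void Main()
    {
        const long iterations = 10000000; // 10 million, as in the test above

        // Timings quoted above: property 120 ms, local 21 ms, empty 10 ms.
        Console.WriteLine("property: " + NanosPerIteration(120, 10, iterations) + " ns");
        Console.WriteLine("local: " + NanosPerIteration(21, 10, iterations) + " ns");
    }
}
```

With these inputs the property access works out to about 11 ns and the member copy to about 1.1 ns per iteration, so the gap between the two is on the order of 10 ns.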
This benchmark is still flawed, on two points.
First of all, there is a good chance the optimizer can optimize away the local variable assignment in the latter two cases, but not in the first (since it is calling a function). Assigning to something like myArray[i] in the loop would be better.
Secondly, you can't test them all in one test, since the JIT could kick in mid-test - you need to test them all in separate executions.
When I do this I find that the local and property tests run at the same speed, on the .Net CLR at least (I don't have Mono installed ATM. I do however see the same results as you in the .Net CLR for the flawed test).
You definitely can't go by .NET results, since it has differences and some optimizations compared to Mono, especially the version currently used in Unity.
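For anyone who wants to try the two fixes suggested a few comments up inside Unity itself, here is a rough sketch: each loop writes into an array so the assignment has an observable effect, and only one case runs per execution. VectorSinkBenchmark, Mode and the small wrap-around buffer are my own assumptions for illustration (the comment only suggested "something like myArray[i]"), and I have not verified what the Mono JIT does with it.

```csharp
using System.Diagnostics;
using UnityEngine;

// Hypothetical component; all names here are illustrative only.
public class VectorSinkBenchmark : MonoBehaviour
{
    public enum Mode { Property, Field, Empty }

    // Pick one mode per run so each case is measured in a separate execution.
    public Mode mode = Mode.Property;

    const int Iterations = 10000000;
    const int SinkSize = 1024; // power of two, so (i & (SinkSize - 1)) wraps around

    Vector3 vZero = Vector3.zero;

    void Start()
    {
        // Writing into an array makes it harder for the JIT to drop the assignment.
        // Assumption: a small wrap-around buffer is used to avoid a huge allocation.
        Vector3[] sink = new Vector3[SinkSize];
        Stopwatch watch = Stopwatch.StartNew();

        switch (mode)
        {
            case Mode.Property:
                for (int i = 0; i < Iterations; ++i) { sink[i & (SinkSize - 1)] = Vector3.zero; }
                break;
            case Mode.Field:
                for (int i = 0; i < Iterations; ++i) { sink[i & (SinkSize - 1)] = vZero; }
                break;
            case Mode.Empty:
                for (int i = 0; i < Iterations; ++i) { }
                break;
        }

        watch.Stop();
        // Read one element afterwards so the array contents are actually used.
        // UnityEngine.Debug is fully qualified because System.Diagnostics also has a Debug class.
        UnityEngine.Debug.Log(mode + ": " + watch.ElapsedMilliseconds + " ms, sample " + sink[0]);
    }
}
```

Note that the Empty mode only measures the bare loop, not the array indexing, so the Property and Field numbers are best compared against each other. As said above, measure in a deployed build rather than in the editor.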