C# Is either method faster? (Less expensive) Variables
Hi, I was just wondering, while working on a C# script, whether one of two methods for variable retrieval is faster than the other. When trying to get the average color from a texture, I have two slightly different ways of going about it: either storing the element in a variable or retrieving it each time.
This is what I am doing. Example 1:
Color[] Colors = texture.GetPixels();
int pixelCount = Colors.Length;
int r =0; int g = 0; int b = 0; int a = 0;
for (int i = 0; i < pixelCount; i++)
{
Color c = Colors[i];
r += c.r;
g += c.g;
b += c.b;
if (includeAlpha)
a += c.a;
}
if (a != 0)
a /= pixelCount;
return new Color (r/pixelCount, g/pixelCount, b/pixelCount, a);
Example 2:
Color[] Colors = texture.GetPixels();
int pixelCount = Colors.Length;
int r =0; int g = 0; int b = 0; int a = 0;
for (int i = 0; i < pixelCount; i++)
{
r += Colors[i].r;
g += Colors[i].g;
b += Colors[i].b;
if (includeAlpha)
a += Colors[i].a;
}
if (a != 0)
a /= pixelCount;
return new Color (r/pixelCount, g/pixelCount, b/pixelCount, a);
So there you can see the only change is inside the 'for' loop. By storing the color in a variable, or accessing the array element for every index, is one a better option than the other? If the reasoning could also be explained, I would appreciate it.
Answer by ByteSheep · Sep 08, 2016 at 02:55 PM
This is probably a micro optimization and most likely won't affect your game's performance; however, the only way to find out is by profiling! You can imagine example 1 looking like this, where there is only one allocation for the color variable:
Color[] Colors = texture.GetPixels();
int pixelCount = Colors.Length;
int r =0, g = 0, b = 0, a = 0;
Color c = new Color(); // now initialized outside the loop
for (int i = 0; i < pixelCount; i++)
{
c = Colors[i];
r += c.r;
g += c.g;
b += c.b;
if (includeAlpha)
a += c.a;
}
if (a != 0)
a /= pixelCount;
return new Color (r/pixelCount, g/pixelCount, b/pixelCount, a);
Edit: Note that it doesn't make much of a difference whether you declare a variable inside or outside a for loop (see http://stackoverflow.com/questions/7383016/reference-type-variable-recycling-is-a-new-reference-variable-created-every-lo)
So in practice it shouldn't matter, but if you're not sure which to go with, I personally prefer creating a variable for any array element that will be used multiple times (example 1). Really this shouldn't concern you until the profiler tells you that this particular piece of code needs to be optimized (if there is even any difference between the two examples).
Hey, thanks for that. I didn't think there would be much of a difference, but you never know, right? Anyway, I like your idea of creating a storage variable outside of the loop, so I think I will adopt that in my own scripts.
For readability, I would prefer your version of example 1 since it is simpler to understand.
When it comes to this sort of optimization the main thing that matters is code readability and here's a good explanation of why it makes little difference whether or not you declare a variable inside of the loop: http://stackoverflow.com/questions/7383016/reference-type-variable-recycling-is-a-new-reference-variable-created-every-lo
Happy coding :)
You really shouldn't declare local variables in a wider scope than necessary. All local variables (no matter where they are declared in the code) are allocated when you enter the method; local variables live on the stack.
So declaring "c" outside of the for loop actually prevents compiler optimisations. If you declare a local variable at the beginning of a method, it must keep its value until the end. If you declare it inside a sub scope (like inside a for loop), the compiler can reuse the space of that local variable for other local variables declared in another sub scope later. That's why for loop variables should also be declared locally inside the for loop statement.
Apart from all those pseudo optimisations, which most likely won't change anything (the bottleneck of this whole code is most likely "GetPixels"), the whole algorithm won't work, because r, g, b and a are declared as int variables. "Color" represents an RGBA color with float values in the range 0.0f to 1.0f, so this won't even compile, because float can't be implicitly converted to int.
This will perform equally well and actually works:
Color[] colors = texture.GetPixels();
int pixelCount = colors.Length;
Color sum = new Color(0,0,0,0);
for (int i = 0; i < pixelCount; i++)
sum += colors[i];
if (!includeAlpha)
sum.a = 0f;
return sum / pixelCount;
A boolean check on each iteration can be far worse than an additional floating point addition. Floating point arithmetic is ridiculously fast these days, while conditionals inside a long loop can have a negative impact on the instruction pipeline.
Just to clarify; the code example above wasn't an optimisation suggestion, rather it was meant to point out that the variable declared in the for loop of example 1 wouldn't allocate memory each time it looped (perhaps in a confusing manner though). I definitely agree that the variable should be declared within the loop for reasons outlined in the stackoverflow discussion I linked to, or better yet, just entirely removed like in your example :)
While we're suggesting code snippets, why not simply:
Color[] colors = texture.GetPixels();
// requires: using System.Linq;
return colors.Aggregate((c1, c2) => c1 + c2) / colors.Length;
heya @Bunny83 - fwiw i tested this approach of summing Color structs directly, ie, "sum += colors[i]", and it turned out to be significantly slower than accumulating the individual color components yourself. see my comment a bit further down in the thread, "Hold the phone".
http://c2.com/cgi/wiki?PrematureOptimization
Interesting to see your results on optimizing this piece of code, but I'd argue that there's no need to spend time doing so until it is actually identified as something that's worth optimizing. The answer to the OP's question should probably be: "There's no real difference between the two. Use the profiler to see what parts of your application could be improved. Don't worry about micro optimizations until you have identified slow code, and carry on developing your game."
Answer by doublemax · Sep 08, 2016 at 03:22 PM
I'm almost certain that after compiler optimization you end up with the same code. If the compiler is stupid, use the first version. But if you're concerned about performance, you definitely need two different loops, one with and one without alpha. Conditional branches are performance killers and that "if (includeAlpha)" for each pixel iteration makes my toenails curl.
It may not be the case for Unity's mono runtime, but I know that a CPU has a branch predictor that will try to guess condition results to pipeline instructions correctly. So, in this case, it might guess the result of the if based on previous results. Since includeAlpha never changes in the loop, it would guess the result correctly every time (except if it changes in another thread, but I doubt it).
true, but this is such a simple optimization you may as well make it and not make assumptions about the cleverness of the CPU. some branch predictors are just "it will probably be false".
Hmm, considering the chance that a conditional statement may in fact weigh it down unnecessarily, even if Unity is smart enough to make up for it, I think I will include alternate functions with and without alpha. Thanks for the suggestion.
Answer by Garazbolg · Sep 08, 2016 at 05:08 PM
Here is what i'd do :
float pixelCount = (float)(texture.height*texture.width); //for byte2float in the return statement
int r = 0, g = 0 ,b = 0, a = 0;
if(includeAlpha) //Instead of comparing on every iteration, compare just once
{
foreach(Color32 c in texture.GetPixels32()) //To prevent doing pointer arithmetic for every []
{
r += c.r;
g += c.g;
b += c.b;
a += c.a;
}
}
else
{
foreach(Color32 c in texture.GetPixels32()) //To prevent doing pointer arithmetic for every []
{
r += c.r;
g += c.g;
b += c.b;
}
}
return new Color(r/pixelCount/255f, g/pixelCount/255f, b/pixelCount/255f, includeAlpha ? a/pixelCount/255f : 0); // Color32 components are 0-255, Color expects 0-1
Plus, this way you don't keep a reference to the array around, so it can be garbage collected sooner.
Also be careful, because Color uses float values from 0 to 1 whereas Color32 uses byte values from 0 to 255. And considering you used integers to store the sums, you wouldn't have an accurate result.
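For anyone confused by the two ranges: Unity defines implicit conversions between Color and Color32 that handle the 0-255 to 0.0-1.0 scaling for you. A quick sketch (values shown are what the conversion should produce, assuming the usual UnityEngine types):

```csharp
using UnityEngine;

// Color32 stores bytes (0-255); Color stores floats (0.0-1.0).
Color32 c32 = new Color32(255, 128, 0, 255);
Color c = c32;      // implicit conversion divides each component by 255
                    // c is approximately (1.0, 0.502, 0.0, 1.0)
Color32 back = c;   // the reverse implicit conversion scales back to 0-255
```

So if you accumulate Color32 components into ints, remember to divide the final result by 255 before stuffing it into a Color.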
Disclaimer: I'm more of a C++ dev, so C# compiler behaviour is a bit foreign to me, but like doublemax said, you'd most likely end up with the same code doing all sorts of optimizations.
Foreach is potentially dangerous. Depending on the IEnumerator implementation it could create temporary objects that the garbage collector has to deal with later. Unless i know i'm dealing with a class where sequential access is faster than random access, e.g. a linked list, i would avoid it.
on doublemax's comment: i would definitely agree about foreach here.
Thanks for the feedback, and yeah I got the error of converting a float to int before I realized that's exactly what I did. Deciding if alpha is used before the looping statement is a better idea than deciding on every loop so thanks for that clarification.
Answer by elenzil · Sep 08, 2016 at 10:01 PM
hold the phone.
it turns out to be pretty important to consider whether you're using GetPixels() or GetPixels32(). The former returns Color structs, which store each component as a float. The latter returns Color32 structs, where each component is an 8-bit byte.
In my tests, the act of calling GetPixels/32() itself is non-negligible. For a texture with 11561940 texels (3645 x 3172):
GetPixels() (floats) takes between 0.1 and 0.17 seconds.
GetPixels32() takes between 0.04 and 0.05 seconds.
So GetPixels32() is significantly faster than GetPixels().
then for doing the arithmetic, i tried three methods:
a. the inner loop used Color's += operator. This was the slowest.
b. the inner loop adds floating-point components to floating-point accumulators. Second fastest.
c. the inner loop adds 8-bit byte components to 32-bit int accumulators. Fastest.
a. 0.038 micro-seconds per pixel for the accumulation portion.
b. 0.016 micro-seconds per pixel for the accumulation portion.
c. 0.013 micro-seconds per pixel for the accumulation portion.
i also looked a bit into using Mono.simd to do the adds in parallel, but got tired. That's probably the best approach tho. The challenge will be casting the array of Color structs into an array of Vector4f structs.
so, here's the winning code from my tests. i've left in the timing diagnostics.
void doItAsBytes() {
string s = "as bytes : ";
Texture2D t = theImage.sprite.texture;
float t0 = Time.realtimeSinceStartup;
Color32[] cs = t.GetPixels32();
int r = 0;
int g = 0;
int b = 0;
int a = 0;
float t1 = Time.realtimeSinceStartup;
for (int n = cs.Length - 1; n >= 0; --n) {
Color32 c = cs[n];
r += c.r;
g += c.g;
b += c.b;
a += c.a;
}
float count = cs.Length;
Color32 avg = new Color32((byte)(r / count), (byte)(g / count), (byte)(b / count), (byte)(a / count));
float t2 = Time.realtimeSinceStartup;
float dtFetch = t1 - t0;
float dtSum = t2 - t1;
float microSPP = dtSum * oneM / count;
theButton.targetGraphic.color = avg;
Debug.Log(s + "fetched " + cs.Length + " texels in " + dtFetch + " seconds. ");
Debug.Log(s + "averaged " + cs.Length + " texels in " + dtSum + " seconds. " + microSPP.ToString("0.000") + "µS per pixel");
Debug.Log(s + "total µS per pixel " + ((dtFetch + dtSum) / count * oneM).ToString("0.000"));
if (verbose) {
Debug.Log(s + "avg = " + avg.ToString());
}
}
oh, i also tried method C while ignoring alpha. this shaved off 0.002 microseconds per pixel, which in my 3645 x 3172 image resulted in an overall savings of about 0.03 seconds.
This analysis is pretty useless because:
- Time.realtimeSinceStartup is way too imprecise to measure time critical things.
- If your thread isn't pinned to a core, the results are pretty random, as your thread could be moved between cores.
- Your thread can experience a context switch during execution regardless of pinning. We're in a multithreaded environment.
Furthermore, the code using GetPixels32 could fail on 4k or larger textures, since an int can only hold numbers up to 2,147,483,647. The worst case (all pixels white) is hit at about 8,388,607 pixels, which is about 2896 x 2896 pixels. At that point you can get an integer overflow.
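To put a rough number on that overflow point: in the all-white worst case every pixel adds 255 to each accumulator, and int.MaxValue / 255 = 8,421,504, so any square texture much beyond ~2900 x 2900 is at risk. Switching the accumulators to long sidesteps it; a minimal sketch, assuming `texture` is a readable Texture2D as in the snippets above:

```csharp
using UnityEngine;

// long holds up to ~9.2e18, so 255 per pixel is safe for any realistic texture size,
// unlike int, which can overflow at roughly 8.4 million all-white pixels.
long r = 0, g = 0, b = 0, a = 0;
Color32[] pixels = texture.GetPixels32();
for (int i = 0; i < pixels.Length; i++) {
    r += pixels[i].r; g += pixels[i].g; b += pixels[i].b; a += pixels[i].a;
}
```

As noted further down in the thread, long arithmetic is somewhat slower than int, so it's a trade-off between range and speed.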
To get a reliable average value of a whole image you might consider creating mipmaps of your image. So each pixel of a mipmap level corresponds to the average of 4 pixels in the level above.
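If the texture is imported with mipmaps enabled (and marked readable), the smallest mip level already approximates the average of the whole image, so the per-pixel loop can be skipped entirely. A sketch of that idea; note that mip generation uses a simple box filter in the texture's stored color space, so the result can differ slightly from a manual average:

```csharp
using UnityEngine;

// Read the top mip level: for a texture with a full mip chain this is 1x1,
// and its single pixel is (approximately) the average of the whole image.
// Assumes texture.mipmapCount > 1 and the texture is readable.
Color GetAverageViaMips(Texture2D texture)
{
    int topMip = texture.mipmapCount - 1;        // smallest mip in the chain
    Color[] pixels = texture.GetPixels(topMip);  // GetPixels accepts a mip level
    return pixels[0];
}
```

This trades the summation cost for the (usually one-time, import-side) cost of generating the mip chain.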
As i said above, the bottleneck will be GetPixels, no matter if you use GetPixels or GetPixels32. You create garbage on each call (which you also don't take into account in your analysis). Those things are far worse than anything you can ever improve with such micro optimisations.
Yes, GetPixels32 was added to provide a less memory demanding way to access pixel data; Color32 requires only 1/4 of the memory a Color needs. However, it has other limitations. You could change your code to use "long" instead of "int" to boost the supported range, although long (Int64) doesn't perform as well as int.
Using GetPixels(32) for a live video stream wouldn't work; it's way too slow. For things like that you need native plugins which can access native buffers.
hi Bunny.
thanks for your details and thoughtful comments.
i'll field them one at a time.
1. i agree Time.realtimeSinceStartup is not precise enough to measure individual timespans less than ten milliseconds or so. but as you can see from the code, i'm measuring the time to operate on 11,561,940 items and averaging the result; the total timespan was about 140ms. in that usage i think it's a fine timer. is there a more precise timer you would suggest?
2. my results seemed to reproduce fairly reliably. i ran it several times. feel free to run it yourself, the code is there.
3. see 2. you seem to be suggesting that nothing can be measured. is that the case?
4. good point that the accumulators will overflow for sufficiently large images. since the OP used ints, i figured ints were probably sufficient. in my specific case, i'm indeed just over the limit - log2(255 * 11561940) = 31.5 bits - so in the worst case for an image of the size i used there would indeed be overflow into the sign bit. this could be fixed by using uint or ulong. even then there will be limits. a general solution could probably be devised without significant overhead, something along the lines of summing up as many pixels as are sure not to overflow your accumulators, then averaging those into a float, and repeating until everything is done.
5. i'm not sure how generating mip-maps would be faster. if you already have the mip-maps generated, then sure. if not, that's work.
6. my measurements do not agree with your assessment that GetPixels is always the bottleneck. on 11,561,940 pixels, i measure 31ms (milliseconds) for GetPixels32() and 138ms to sum them up via the code above. so it's significant, certainly, but it's far from overwhelming the summation loop. if you have measurements to the contrary i'd be interested in seeing them.
7. yes, i'm not taking garbage collection into account. i'm not sure how to assess or measure that. my interest here is to compare ways of computing the average. you're saying the GC overhead here is more significant than the summation calculation? is there a way to measure that?
8. ulongs vs. ints: same as 4.
9. you may be right.
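On the timer question in point 1: System.Diagnostics.Stopwatch is the usual higher-resolution alternative to Time.realtimeSinceStartup for this kind of micro-benchmark. A sketch of how the timing code above could be adapted (the loop body stands in for whichever accumulation variant is being measured):

```csharp
using System.Diagnostics;

// Stopwatch uses the platform's high-resolution performance counter when available.
Stopwatch sw = Stopwatch.StartNew();
// ... code under test, e.g. the accumulation loop over GetPixels32() ...
sw.Stop();
UnityEngine.Debug.Log("elapsed: " + sw.Elapsed.TotalMilliseconds + " ms"
    + " (resolution: " + Stopwatch.Frequency + " ticks/s)");
```

It still doesn't address core pinning or context switches, so running several iterations (with a warm-up pass for the JIT) and comparing the distributions is safer than trusting any single number.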