[C#] How to use a Multi Threaded Job Queue for Math function
Hi, I'm currently making a 2D top-down game in C#. I want an endless terrain, so I use a chunk system that generates the terrain with a SimplexNoise function. My problem is that whenever the game generates new terrain, it lags for a short time. To fix that I want to use multithreading. Because I have to calculate terrain very often in a very short time, I want to use a job queue in a separate thread.
Does anyone have an idea how this can be achieved? Is there a lib or scripts I can use for this?
Answer by duck · Jul 23, 2014 at 04:57 PM
While splitting work across multiple frames using a coroutine is a great optimisation technique, it is actually possible to make full use of Mono's multithreading API within your Unity scripts. To take advantage of this however, you need to make sure that you don't use the non-thread-safe parts of Unity's API in your own worker threads.
The general pattern I'd recommend is to divide the work you need doing into chunks (your terrain is an ideal example of this), and separate out as much of the code that doesn't need to touch Unity's API as you can. You then execute these chunks of work using the ThreadPool API. The key is to have your threads write their results into a shared container, then, once all the hard work has been done in parallel, read those results back on your main thread, where you can safely use them with Unity's API.
So in your case, you'd probably want to generate all the vertex, triangle and normal arrays in your threads. As long as you're just doing math and filling in array items, and no two threads write to the same element, this is thread safe.
Once all the vert/tri/normal computation is done, you'd fall back to your main thread and assign these arrays to your mesh (because the Mesh class and its functions & properties are part of the Unity API and therefore most likely not thread safe).
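Since the question asks specifically for a job queue on a separate thread, here is a minimal sketch of that variant of the same pattern. It is an illustration rather than anything from Unity or Mono: the JobQueue class and its method names are made up for this example, and it uses a plain Queue with lock so it does not depend on newer collection types. The work delegate must only do math and must not call Unity's API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper (not part of Unity or Mono): one background thread pulls
// jobs from a queue; finished results wait until the main thread collects them.
public class JobQueue<TJob, TResult>
{
    readonly Queue<TJob> pending = new Queue<TJob>();
    readonly Queue<TResult> finished = new Queue<TResult>();
    readonly Func<TJob, TResult> work;   // pure computation, no Unity API calls
    readonly Thread worker;
    bool running = true;                 // only read or written while holding the 'pending' lock

    public JobQueue(Func<TJob, TResult> work)
    {
        this.work = work;
        worker = new Thread(Run);
        worker.IsBackground = true;      // do not keep the process alive on exit
        worker.Start();
    }

    // Call from the main thread to queue a job.
    public void Enqueue(TJob job)
    {
        lock (pending)
        {
            pending.Enqueue(job);
            Monitor.Pulse(pending);      // wake the worker if it is waiting
        }
    }

    // Call from the main thread to collect one finished result, if there is one.
    public bool TryDequeueResult(out TResult result)
    {
        lock (finished)
        {
            if (finished.Count > 0)
            {
                result = finished.Dequeue();
                return true;
            }
        }
        result = default(TResult);
        return false;
    }

    // Stops the worker; jobs still waiting in the queue are dropped.
    public void Stop()
    {
        lock (pending)
        {
            running = false;
            Monitor.Pulse(pending);
        }
        worker.Join();
    }

    void Run()
    {
        while (true)
        {
            TJob job;
            lock (pending)
            {
                while (running && pending.Count == 0) Monitor.Wait(pending);
                if (!running) return;
                job = pending.Dequeue();
            }
            TResult r = work(job);       // the heavy math runs outside any lock
            lock (finished) finished.Enqueue(r);
        }
    }
}
```

In a MonoBehaviour you might call Enqueue when a chunk comes into range, poll TryDequeueResult from Update, and only then build the mesh or tiles from the returned data, since that last step has to happen on the main thread.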
I've tried to create as generic an example as I can, so you and others can use it as a template for any multithreaded work that can be broken into chunks. In this example, I simply populate a large 2d array of integers using a function (which is contrived to be computationally expensive just to show the performance difference).
In the start function, I first do all the work on a single thread to show how long that takes, followed by doing the work on multiple threads. Both versions are timed using the Stopwatch class and the results printed in the game view using OnGUI.
I've tried to make the code here clear enough that you should be able to follow the logic and see how it works, but if you have any questions, ask them in the comments.
To try the script, paste the entire code into a single Unity C# script named "ThreadingExample" (including the two struct definitions at the bottom).
Place it on a GameObject in your scene, and hit play. It will lock the editor up for a few seconds as it performs the single-threaded example, followed by the multithreaded example. You should end up with results on screen something like the screenshot below. I have the CPU graph included on-screen to show the threading working: the "long hump" on CPU 0 is the single-threaded work, followed by the short hump on all CPUs, which is the same work done on multiple threads.

 using UnityEngine;
 using System.Collections;
 using System.Collections.Generic;
 using System.Diagnostics;
 using System.Threading;
 
 public class ThreadingExample : MonoBehaviour {
 
     int[,] results;            // this is the 2d array which will contain the result
     int gridSize = 512;
     
     // thread related variables
     WorkItem[] workItems;
     ManualResetEvent[] doneEvents;    // a series of flags, each indicating if a given chunk has finished
     WorkChunk[] workChunks;            // the container for a chunk of work items
     int numChunks = 16;
 
     string output;
     
     IEnumerator Start()
     {
         Output("Setting up work chunks");
         SetUpWorkChunks();
         yield return null;
 
         Stopwatch stopWatch = new Stopwatch();
 
         Output("Starting Main thread method");
         results = new int[gridSize,gridSize];
         yield return null;
         stopWatch.Start();
         DoNormalWork();
         stopWatch.Stop();
         Output ("Main thread method: "+(stopWatch.ElapsedMilliseconds)+"ms");
         yield return null;
 
         Output("Starting Multi threaded method");
         results = new int[gridSize,gridSize];
         yield return null;
         stopWatch.Reset();
         stopWatch.Start();
         DoThreadedWork();
         stopWatch.Stop();
         Output ("Multi-threaded method: "+(stopWatch.ElapsedMilliseconds)+"ms");
         yield return null;
         
     }
 
     void Output(string s)
     {
         output += s+"\n";
     }
 
     Vector2 scrollPosition;
     void OnGUI()
     {
         scrollPosition = GUILayout.BeginScrollView(scrollPosition);
         GUILayout.Label(output);
         GUILayout.EndScrollView();
     }
 
 
     void SetUpWorkChunks ()
     {
         // make list of all work items needing to be calculated
         workItems = new WorkItem[gridSize*gridSize];
         int i=0;
         for (int x=0; x<gridSize; ++x)
         {
             for (int y=0; y<gridSize; ++y)
             {
                 workItems[i] = new WorkItem(x,y);
                 i++;
             }
         }
         
         // share out work items between chunks (equal to number of threads allowed)
         workChunks = new WorkChunk[numChunks];
         doneEvents = new ManualResetEvent[numChunks];
 
         int numItemsPerChunk = workItems.Length / numChunks;
         for (int n = 0; n<workChunks.Length; ++n) {
 
             // work out which items this chunk should calculate
             int start = n * numItemsPerChunk;
             int end = n * numItemsPerChunk + (numItemsPerChunk - 1);
             if (n == workChunks.Length - 1) {
                 end = workItems.Length - 1;    
             }
 
             // copy portion of work items for this chunk
             WorkItem[] chunkWorkItems = new WorkItem[(end - start) + 1];
             System.Array.Copy (workItems, start, chunkWorkItems, 0, (end - start) + 1);
 
             // instantiate work chunk, passing the items
             workChunks[n] = new WorkChunk( chunkWorkItems, n, this );
 
             // we need a reference to each chunk's "doneEvent"
             doneEvents[n] = workChunks[n].doneEvent;
         }
         Output ("finished setting up work chunks");
     }
 
     void DoNormalWork ()
     {
         // this would be the non-threaded method of doing all work items:
         for (int n=0; n<workItems.Length; ++n)
         {
             DoWork( workItems[n] );
         }
     }
         
     void DoThreadedWork ()
     {
         // this loop tells all work chunks to do their work items simultaneously:
         for (int w = 0; w < workChunks.Length; w++) {
             doneEvents[w].Reset();
             ThreadPool.QueueUserWorkItem (workChunks[w].ThreadPoolCallback);
         }
         // Wait for all work chunks to complete their work...
         WaitHandle.WaitAll (doneEvents);
     }
 
     public void DoWork( WorkItem item )
     {
         // this is the work function which will occur in parallel on multiple threads
 
         // in this example, an arbitrary function, made deliberately slow with a loop
         float result = 0;
         for (int n=0; n<10000; ++n)
         {
             result += Mathf.Sqrt( item.x + item.y + n * 0.1f );
         }
 
 
         // put result into result array
         results[item.x, item.y] = (int)result;
 
     }
 
 }
 
 public struct WorkItem
 {
     // This is the definition of a single item to be calculated.
     // In this example, it's basically just a 2D integer grid reference.
     public int x;
     public int y;
     public WorkItem (int x, int y)
     {
         this.x = x;
         this.y = y;
     }
 }
 
 struct WorkChunk
 {
     // A work chunk contains an array of work items
     public WorkItem[] workItems;
     
     public ManualResetEvent doneEvent; // a flag to signal when the work is complete
     public int num;
     ThreadingExample workOwner; // a reference to the owner (since the actual DoWork function is there)
 
     public WorkChunk (WorkItem[] workItems, int num, ThreadingExample workOwner)
     {
         this.num = num;
         this.workItems = workItems;
         this.workOwner = workOwner;
         doneEvent = new ManualResetEvent(false);
     }
 
     public void ThreadPoolCallback (System.Object o)
     {
         doneEvent.Reset();
         
         // do each work item in this chunk's work item list:
         for (int i=0; i<workItems.Length; ++i) {
             workOwner.DoWork( workItems[i] );
         }
         
         doneEvent.Set ();
     }
 }
 
 
Note: although my CPU meter in the screenshot above shows 8 CPUs, the machine actually has only 4 cores. The 8 comes from Intel's "hyperthreading" tech, which I guess Unity is not making use of. I think this is the reason that the timings show the threaded version is only 4x faster rather than 8x faster.
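One caveat with the example above: WaitHandle.WaitAll blocks the calling thread, so DoThreadedWork still freezes the main thread until every chunk has finished. For terrain streaming you would probably rather keep rendering and check each frame whether the work is done. A minimal sketch of that idea follows, assuming the chunks can be expressed as plain Action delegates; PendingCounter and NonBlockingBatch are hypothetical names made up for this example, not existing API.

```csharp
using System;
using System.Threading;

// Hypothetical helper: counts down as chunks finish, without blocking anyone.
public class PendingCounter
{
    int remaining;

    public PendingCounter(int count) { remaining = count; }

    // Called by a worker thread when its chunk is finished.
    public void Signal() { Interlocked.Decrement(ref remaining); }

    // Safe to poll from the main thread; CompareExchange(0, 0) is an atomic read.
    public bool IsDone
    {
        get { return Interlocked.CompareExchange(ref remaining, 0, 0) <= 0; }
    }
}

public static class NonBlockingBatch
{
    // Queues every chunk on the ThreadPool and returns immediately.
    public static PendingCounter Run(Action[] chunks)
    {
        PendingCounter counter = new PendingCounter(chunks.Length);
        foreach (Action chunk in chunks)
        {
            Action c = chunk;   // local copy so each callback captures its own chunk
            ThreadPool.QueueUserWorkItem((object state) =>
            {
                try { c(); }
                finally { counter.Signal(); }   // count down even if the chunk throws
            });
        }
        return counter;
    }
}
```

The main thread would keep the returned counter and test counter.IsDone once per frame (for example in Update), applying the results to Unity objects only after it returns true.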
You're welcome, please mark the answer as accepted if it solves your question!
Answer by Wise · Jul 23, 2014 at 02:35 PM
What you can do is use IEnumerator (coroutines) and split the workload across several frames.
Like:
 IEnumerator StreamOutWorld () {
     for (int i = 0; i < TerrainChunksToLoad.Count; i++) {
         Instantiate (TerrainChunksToLoad[i]);
         yield return null; //here it will skip to the next frame for the next "for" iteration
     }
     yield return null;
 }
This will load only one mini-chunk per frame, and you can split up any number of functions this way (but make sure that important stuff does not skip a frame, like the colliders, so you won't fall through the ground if it hasn't loaded).
You'll see examples of this in a lot of games where the world "grows out" like Minecraft.
Thank you! But I think it would be more efficient if I did it in a separate thread.
Then you will have to wait for Unity Technologies to support it.
     private IEnumerator GenerateChunk()
     {
         for (int x = 0; x < chunkSize; x++)
         {
             for (int y = 0; y < chunkSize; y++)
             {
                 Vector3 pos = new Vector3(x, y);
                 float noise = worldGenerator.GetNoise(this.transform.position + pos);
                 GenerateTile(pos, noise);
                 yield return null;
             }
         }
         yield return null;
     }
Hmm, that works, but it is too slow.
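One likely reason the coroutine above feels slow: it yields after every single tile, so a chunk of chunkSize x chunkSize tiles needs that many frames to finish. A sketch that yields only after a batch of tiles is shown below. It is written as plain C# so it can run outside Unity; the generateTile callback is a stand-in for the GetNoise and GenerateTile calls above, and tilesPerStep is a made-up tuning parameter.

```csharp
using System;
using System.Collections;

public static class BatchedGeneration
{
    // Generates chunkSize * chunkSize tiles, pausing after every 'tilesPerStep' tiles.
    // 'generateTile' is a hypothetical callback standing in for the per-tile work.
    public static IEnumerator GenerateChunk(int chunkSize, int tilesPerStep,
                                            Action<int, int> generateTile)
    {
        int sinceYield = 0;
        for (int x = 0; x < chunkSize; x++)
        {
            for (int y = 0; y < chunkSize; y++)
            {
                generateTile(x, y);
                if (++sinceYield >= tilesPerStep)
                {
                    sinceYield = 0;
                    yield return null;   // when run as a Unity coroutine: continue next frame
                }
            }
        }
    }
}
```

In Unity you would pass the returned enumerator to StartCoroutine from a MonoBehaviour. A larger tilesPerStep finishes the chunk in fewer frames but does more work per frame, so the value is a trade-off to tune by measuring.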

I'm correcting myself:
Unity does support multithreading, but limits which parts of its API you can use: for example, pure math can be multithreaded, but when you want to apply the results to an object you have to do that on the main thread.
See duck's answer for a fantastic reply.
Perhaps you can combine a multithreaded workload with frame splitting to create a fast, streamed game world.
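A rough sketch of how that combination could look on the main-thread side: worker threads push finished data into a shared queue, and the main thread applies at most a fixed number of results per frame so that no single frame does too much work. The FrameBudget class and the maxPerFrame parameter are hypothetical names for this example, and the sketch assumes the workers take the same lock on the queue when they add results.

```csharp
using System;
using System.Collections.Generic;

public static class FrameBudget
{
    // Applies up to 'maxPerFrame' finished results and returns how many were applied.
    // Worker threads are assumed to lock 'finished' while enqueueing.
    public static int Drain<T>(Queue<T> finished, Action<T> apply, int maxPerFrame)
    {
        int applied = 0;
        while (applied < maxPerFrame)
        {
            T item;
            lock (finished)
            {
                if (finished.Count == 0) break;   // nothing ready yet this frame
                item = finished.Dequeue();
            }
            apply(item);   // main-thread work (e.g. building a mesh) happens outside the lock
            applied++;
        }
        return applied;
    }
}
```

Called once per frame from Update, this spreads the main-thread part of the work over several frames, while the expensive math has already been done on other threads.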