ComputeShader version of Physics.Raycast
Hi all,
I am new to compute shaders. In my application, I wrote a script that calls Physics.Raycast many (>10,000) times. However, it is well known that Physics.Raycast runs on the CPU, and is thus slow.
I searched Google and found that Unity's ComputeShader is something similar to CUDA, and I would like to write a ComputeShader version of Physics.Raycast.
I would be glad if anyone here could give me advice or references on this. Thanks a lot, and have a nice day!
William
Answer by mysteryDate · Mar 11, 2015 at 06:03 PM
This is super possible. It's actually something I have done. It is NOT easy, however. I would strongly recommend this blog:
http://scrawkblog.com/2014/06/24/directcompute-tutorial-for-unity-introduction/
And this post:
http://kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty.html
Unfortunately, compute shaders are a relatively new technique and resources can be hard to come by.
As far as compute shaders speeding up your raycast, the answer, of course, is "it depends." From what you describe, you have many rays and hopefully few objects in your scene, so you'll likely want to parallelize by having a single thread per ray. GPUs are capable of many, many threads, but as @Benproductions1 said, each GPU thread is not actually faster (and is in fact quite a bit slower) than a CPU thread. So if each thread has to look for an intersection with 1000s of objects, it's still going to be slow. One way that Unity (along with other engines) gets around this problem on the CPU is through octree culling:
http://en.wikipedia.org/wiki/Octree
...but I'm getting ahead of myself. Assuming that you don't have too many objects in your scene, here's a simple raycast compute shader:
#pragma kernel RayCast

struct Ray
{
    float3 position;
    float3 direction;
};

StructuredBuffer<Ray> rays;

// Three vertices define a triangle in space
float3 vertA;
float3 vertB;
float3 vertC;

// A 32 by 1 thread group. Generally you want to fill up thread groups,
// which are either 32 or 64 threads wide, GPU dependent
[numthreads(32,1,1)]
void RayCast (uint3 id : SV_DispatchThreadID)
{
    // Our ray
    float3 pos = rays[id.x].position;
    float3 dir = rays[id.x].direction;

    // The normal vector of the plane defined by the triangle
    float3 norm = normalize(cross(vertB - vertA, vertC - vertA));

    // The distance of the ray to an intersection with the plane
    // This is in units relative to the length of ray.direction
    // Note: if the ray is parallel to the plane, dot(dir, norm) is 0
    // and k is not finite
    float k = dot(vertA - pos, norm) / dot(dir, norm);

    // The point in space where the ray intersects the (infinite) plane
    float3 I = pos + k * dir;

    // Convert to barycentric coordinates
    // This will find if the intersection is actually within the triangle
    // (these are twice the geometric areas, but only the ratios matter)
    float triangleArea = dot(norm, cross(vertB - vertA, vertC - vertA));
    float areaIBC = dot(norm, cross(vertB - I, vertC - I));
    float baryA = areaIBC / triangleArea;
    float areaICA = dot(norm, cross(vertC - I, vertA - I));
    float baryB = areaICA / triangleArea;
    float baryC = 1 - baryA - baryB;

    if (baryA > 0 && baryB > 0 && baryC > 0 && k >= 0) {
        // The ray intersects this triangle
    }
}
This would have to run for EVERY triangle that the ray potentially intersects, either with a two-dimensional compute shader (num rays x num triangles), or a for loop within the compute shader. Either way, you're looking at some serious computations.
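A minimal sketch of the "for loop within the compute shader" variant, assuming the triangles are uploaded as a buffer. The names `Tri`, `tris`, `hitDistances`, `rayCount`, `triCount` and `RayCastAll` are made up for this illustration and are not part of the shader above. Each thread owns exactly one ray and writes only to its own output slot, so this variant does not need any synchronization:

```hlsl
#pragma kernel RayCastAll

struct Ray
{
    float3 position;
    float3 direction;
};

// Hypothetical triangle layout: three vertices in world space
struct Tri
{
    float3 a;
    float3 b;
    float3 c;
};

StructuredBuffer<Ray> rays;
StructuredBuffer<Tri> tris;               // assumed: one entry per triangle
RWStructuredBuffer<float> hitDistances;   // assumed: one entry per ray
uint rayCount;
uint triCount;

// Returns the distance k along the ray (in units of |dir|), or -1 on a miss.
float IntersectTri(float3 pos, float3 dir, Tri t)
{
    // Unnormalized normal; its squared length replaces "triangleArea"
    float3 n = cross(t.b - t.a, t.c - t.a);
    float denom = dot(dir, n);
    // Arbitrary tolerance: treat (near-)parallel rays and degenerate
    // triangles as a miss instead of dividing by ~0
    if (abs(denom) < 1e-8) return -1.0;

    float k = dot(t.a - pos, n) / denom;
    if (k < 0.0) return -1.0;             // plane is behind the ray origin

    float3 I = pos + k * dir;
    float area = dot(n, n);
    float baryA = dot(n, cross(t.b - I, t.c - I)) / area;
    float baryB = dot(n, cross(t.c - I, t.a - I)) / area;
    float baryC = 1.0 - baryA - baryB;
    return (baryA >= 0.0 && baryB >= 0.0 && baryC >= 0.0) ? k : -1.0;
}

[numthreads(64,1,1)]
void RayCastAll (uint3 id : SV_DispatchThreadID)
{
    // The last thread group may be only partially filled
    if (id.x >= rayCount) return;

    Ray r = rays[id.x];
    float nearest = -1.0;                 // -1 means "no hit so far"
    for (uint i = 0; i < triCount; i++)
    {
        float k = IntersectTri(r.position, r.direction, tris[i]);
        if (k >= 0.0 && (nearest < 0.0 || k < nearest)) nearest = k;
    }
    hitDistances[id.x] = nearest;
}
```

On the script side, something like this would presumably be dispatched with ceil(rayCount / 64) thread groups through ComputeShader.Dispatch and read back with ComputeBuffer.GetData. A ray counts as a hit when its entry in hitDistances is >= 0, which gives you the bool that Physics.Raycast would have returned.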
Now, this doesn't get you 100% of the way there. You have to figure out what the hell you're going to return, and for that you're going to run into race conditions. Yup, there's gonna be some race conditions, good luck!
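One way the race could be handled in the two-dimensional (num rays x num triangles) layout is sketched below. It is an assumption-laden illustration, not the approach the answer above necessarily used. Many threads may hit the same ray's output slot at once, so the sketch uses InterlockedMin on the bit pattern of the distance; this relies on the fact that non-negative IEEE floats sort in the same order as their bits read as unsigned integers. The names `Tri`, `tris`, `nearestBits`, `rayCount`, `triCount` and `RayCastPairs` are hypothetical, and the caller is assumed to fill `nearestBits` with 0xFFFFFFFF before each dispatch:

```hlsl
#pragma kernel RayCastPairs

struct Ray
{
    float3 position;
    float3 direction;
};

// Hypothetical triangle layout: three vertices in world space
struct Tri
{
    float3 a;
    float3 b;
    float3 c;
};

StructuredBuffer<Ray> rays;
StructuredBuffer<Tri> tris;
// Assumed: one uint per ray, pre-filled with 0xFFFFFFFF ("no hit")
RWStructuredBuffer<uint> nearestBits;
uint rayCount;
uint triCount;

[numthreads(8,8,1)]
void RayCastPairs (uint3 id : SV_DispatchThreadID)
{
    // id.x selects the ray, id.y selects the triangle
    if (id.x >= rayCount || id.y >= triCount) return;

    Ray r = rays[id.x];
    Tri t = tris[id.y];

    // Same plane + barycentric test as above, with an unnormalized normal
    float3 n = cross(t.b - t.a, t.c - t.a);
    float denom = dot(r.direction, n);
    if (abs(denom) < 1e-8) return;        // arbitrary tolerance

    float k = dot(t.a - r.position, n) / denom;
    if (k < 0.0) return;

    float3 I = r.position + k * r.direction;
    float area = dot(n, n);
    float baryA = dot(n, cross(t.b - I, t.c - I)) / area;
    float baryB = dot(n, cross(t.c - I, t.a - I)) / area;
    float baryC = 1.0 - baryA - baryB;
    // Written as a negated "all inside" test so NaNs are rejected too
    if (!(baryA >= 0.0 && baryB >= 0.0 && baryC >= 0.0)) return;

    // Atomically keep the smallest distance seen for this ray.
    // abs() clears the sign bit in case k is -0.0.
    InterlockedMin(nearestBits[id.x], asuint(abs(k)));
}
```

After the dispatch, an entry that is still 0xFFFFFFFF means the ray hit nothing; any other value can be reinterpreted as a float to get the nearest distance. This only returns the distance. If you also need to know which triangle was hit, a single 32-bit atomic is not enough and you need a different scheme.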
Answer by Benproductions1 · Apr 22, 2014 at 11:20 AM
"Physics.Raycast is CPU, and thus slow"
Just to be clear, a GPU is not fast, it's merely concurrent. Think of a GPU as having about 32 very old (i.e. slow), very small CPUs. Unless you're doing something like 1k+ (arbitrary) raycasts per frame, using the GPU will probably be slower.
Before you ask how you can implement Physics.Raycast on the GPU, you'd be better off re-implementing it in a normal script first, just so you can understand the amount of work behind it.
To gain any sort of speed boost, you will quite literally have to rebuild a large section of Unity's physics engine, PhysX, on the GPU. This includes spatial trees and the like. You don't have access to any physics or other such API functions from any type of shader, so unless you're either ready to rebuild the PhysX collision system or willing to settle for a horridly slow GPU implementation, I suggest you stick with what you currently have.
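To give a feel for what a spatial tree involves, here is a rough sketch of the basic test such a structure typically runs at every node: a ray against an axis-aligned bounding box (the "slab" test). It is only an illustration and makes no claim about how PhysX does it internally. The names `boxMin`, `boxMax`, `boxHits`, `rayCount` and `RayBoxTest` are invented for this example:

```hlsl
#pragma kernel RayBoxTest

struct Ray
{
    float3 position;
    float3 direction;
};

StructuredBuffer<Ray> rays;
RWStructuredBuffer<uint> boxHits;   // assumed: one flag per ray, 1 = hit
uint rayCount;

// Hypothetical bounding box of one tree node, set from the script
float3 boxMin;
float3 boxMax;

// Slab test: intersect the ray with the three pairs of parallel planes
// and check that the three parameter intervals overlap in front of the ray.
bool RayHitsBox(float3 pos, float3 dir, float3 bMin, float3 bMax)
{
    // Components of dir that are 0 produce +/-inf here, which the
    // min/max logic below tolerates in most cases. A ray origin lying
    // exactly on a slab plane with a zero direction component yields
    // 0 * inf = NaN and is not handled in this sketch.
    float3 invDir = 1.0 / dir;
    float3 t0 = (bMin - pos) * invDir;
    float3 t1 = (bMax - pos) * invDir;
    float3 tNear = min(t0, t1);
    float3 tFar  = max(t0, t1);
    float kEnter = max(max(tNear.x, tNear.y), tNear.z);
    float kExit  = min(min(tFar.x, tFar.y), tFar.z);
    // Hit if the ray leaves the box after entering it and not behind the origin
    return kExit >= max(kEnter, 0.0);
}

[numthreads(64,1,1)]
void RayBoxTest (uint3 id : SV_DispatchThreadID)
{
    if (id.x >= rayCount) return;
    Ray r = rays[id.x];
    boxHits[id.x] = RayHitsBox(r.position, r.direction, boxMin, boxMax) ? 1 : 0;
}
```

A real tree would run this test per node and only descend into (and test the triangles of) nodes whose box the ray actually hits. Building, uploading and traversing that tree on the GPU is the part that takes the real work.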
Answer by williamlai3a · Apr 22, 2014 at 01:42 PM
Thank you for your answer. That's horrible news.
As I mentioned, I have to perform more than 10K raycasts (probably more, to obtain a better result in my application), and that is for each light source that I have in the scene.
And of course I understand that a single GPU thread is not necessarily faster than a single CPU thread. What I am after is exactly the parallelism, since the raycasts differ from each other only in their initial shooting direction.
So how can I improve this without the help of a compute shader? Any advice? Thanks.
Answer by Roni92 · Aug 20, 2015 at 10:27 AM
Hello, how can I call this RayCast method with the appropriate arguments, and how do I retrieve a bool indicating whether the raycast hit its target?