GetSpectrumData or equivalent in non-realtime mode?

Question

Hi all!

I'm attempting to write a non-realtime renderer for audio visualizers in Unity.

Grabbing the frames seems okay so far, but I can't get GetSpectrumData to work properly outside of realtime. Whether it's called on AudioListener or on AudioSource, it seems to operate on the playing audio stream itself rather than on the audio file's data.

That means that when the renderer runs in non-realtime mode (which it needs to do in order to capture frames at high resolution), the stream is no longer a reliable source of spectrum data. Playback keeps pausing and unpausing, which creates gaps in the analysis data, as if it were analyzing an audio file with bits of silence between every bit of audio.

The effect on the vis can be seen here - the faulty non-realtime vis is on the left, clearly containing a peak that fails to "fire" compared to the realtime vis on the right.


I get the sense from reading previous posts that GetSpectrumData may not work in non-realtime mode, and that I may need to "roll my own" spectrum preprocessor to get accurate data. Is this the case?

(I'm tagging @jkeogh1413 in this post, clearly one of the most experienced audio coders in Unity based on some previous responses!)

Thanks for any help that the community can provide!

--CaliCoastReplay

Answer 1

Best Answer

Answer by jkeogh1413 · May 20, 2019 at 03:25 PM

Thanks for the tag.

I wrote a blog about how to do preprocessed spectrum analysis in Unity https://medium.com/giant-scam/algorithmic-beat-mapping-in-unity-preprocessed-audio-analysis-d41c339c135a

I'd recommend reading that first, and then taking a look at my open source implementation that I link in the Outro of the blog. You may not care as much about the spectral flux part if you're not trying to pick out distinct beats, but I think you'll find what you're looking for.

Let me know if you have any questions!


CaliCoastReplay · May 21, 2019 at 02:21 AM

Thanks, this is really promising! :)

CaliCoastReplay · Jul 13, 2019 at 11:43 AM

Hi jkeogh!

I haven't been able to get time away from work to analyze this until just now, but I do have two questions:

1. How can I use your code to simply get a float[1024] array of FFT values at any given point in the song, roughly corresponding to what I'd get from AudioSource.GetSpectrumData? I'm confused because you don't pass the song's current time until you actually do the spectral flux analysis.
2. How long does it usually take before the background thread can return this sample data? Is it instantaneous, or are there frames near the start of the program's execution where the FFT isn't available yet?

Thanks so much and sorry for the delay! Your work in this space is far ahead of anyone else's I've met!

CaliCoastReplay · Jul 13, 2019 at 12:32 PM

Oh WOW. I think I actually got it on my own - or part of it.

I actually converted it to a non-threaded version. Since it's not intended for realtime playback, it matters less if it has to do some heavy processing at the start. I moved the conversion of preProcessedSamples to Start() and then added a GetAudioSpectrum() function that performs a single FFT at the current AudioSource.time instead of iterating over the whole clip.

Performance and spectrum fidelity are actually surprisingly good considering how naive I'm sure my method is! However I'm still not sure I'm doing it 100% right so any advice you still want to give would be great! I could post the code I'm messing with and we could collaborate!

Thanks for your pioneering work in this space!

jkeogh1413 (replying to CaliCoastReplay) · Jul 25, 2019 at 11:50 PM

Hey @CaliCoastReplay. Sorry for the delay.

Glad to hear you got something working. I figured I'd still chime in with a few tips you may already know. One thing that caught my eye is your use of AudioSource.time. If you're relying on AudioSource.time, which is a potentially buffered value tied to a playing AudioSource, you're getting dangerously close to what Unity's GetSpectrumData does under the hood. I know you're not interested in real-time analysis via Unity's helpers, but make sure you read the previous section of my blog to see everything you can achieve within the Unity library.

My approach to analyzing data for a time range in a song is to understand very precisely which time range each individual sample represents, and to apply that in the offset parameter of GetData.
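A minimal sketch of that time-to-offset mapping, written in Python as plain arithmetic (in Unity this would be C#). It assumes, as the worked example in this answer does, that the GetData offset is counted per channel (time × AudioClip.frequency). `capture_frame_to_offset` is a hypothetical helper: it assumes the non-realtime renderer captures at a fixed integer frame rate starting at song time 0. It derives the offset from the frame counter instead of from AudioSource.time.

```python
import math


def time_to_offset(time_s: float, frequency: int) -> int:
    """Per-channel sample offset where time_s begins, rounded down."""
    return math.floor(time_s * frequency)


def offset_to_time(offset: int, frequency: int) -> float:
    """Start time in seconds of a per-channel sample offset."""
    return offset / frequency


def capture_frame_to_offset(frame_index: int, fps: int, frequency: int) -> int:
    """Offset for the n-th captured video frame (hypothetical helper).

    Assumes a fixed integer capture rate and that frame 0 is song time 0.
    Pure integer math, so the result never drifts and never depends on
    playback state or AudioSource.time.
    """
    return frame_index * frequency // fps
```

For example, `capture_frame_to_offset(3600, 60, 44100)` is the 60-second mark at 60 fps. It equals `time_to_offset(60, 44100)`, which matches the 2646000 figure worked out later in this answer.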

Use the math in the Navigating the Spectral Flux Output section, but don't think of the chunking as only a part of the Spectral Flux setup. We chunk by 1024 in that example because our Spectral Flux Analysis is combining 1024 samples at a time to represent a small time range in the song. We're essentially trying to do the same thing here.
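To make the chunk/time relationship concrete, here is a small Python sketch that maps a song time to the index of the fixed-size mono chunk containing it, and back to the time range that chunk covers. It assumes chunks are counted from sample 0 in combined mono samples (1024 per chunk, as in the blog example). The function names are illustrative only.

```python
def chunk_for_time(time_s: float, chunk_size: int, frequency: int) -> int:
    """Index of the mono chunk that contains time_s (chunks start at sample 0)."""
    return int(time_s * frequency) // chunk_size


def chunk_time_range(chunk_index: int, chunk_size: int, frequency: int):
    """(start, end) in seconds covered by one chunk of mono samples."""
    start = chunk_index * chunk_size / frequency
    end = (chunk_index + 1) * chunk_size / frequency
    return start, end


idx = chunk_for_time(60.0, 1024, 44100)
start, end = chunk_time_range(idx, 1024, 44100)
```

With 1024-sample chunks at 44100 Hz, the 60-second mark falls in a chunk that starts slightly before 60 s and ends slightly after it.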

Example:
- 120 second song (AudioClip.length)
- 44100 Hz sample rate (AudioClip.frequency)
- 5292000 (120 * 44100) mono samples (AudioClip.samples)

Note: In GetData you'll still want to request the number of samples your time range requires multiplied by AudioClip.channels, to make sure you get the full stereo data range, and then do your combination to mono.
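The note above about requesting `samples × channels` values and then combining to mono can be sketched in Python as follows. It assumes the buffer is interleaved (L, R, L, R, ...). It also assumes that averaging the channels is an acceptable way to "combine to mono"; averaging is one common choice, not necessarily the exact method used in the blog's code.

```python
def buffer_length(mono_samples_needed: int, channels: int) -> int:
    """Size of the float buffer to request so every channel is covered."""
    return mono_samples_needed * channels


def downmix_to_mono(interleaved, channels: int):
    """Average each interleaved frame (L, R, L, R, ...) into one mono sample."""
    if channels < 1 or len(interleaved) % channels != 0:
        raise ValueError("length must be a multiple of a positive channel count")
    return [sum(interleaved[i:i + channels]) / channels
            for i in range(0, len(interleaved), channels)]
```

For example, 512 mono samples from a stereo clip means a 1024-float buffer, and `downmix_to_mono([1.0, 0.0, 0.5, 0.5], 2)` gives `[0.5, 0.5]`.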

1 / 44100 = 0.000022675736961
120 / 5292000 = 0.000022675736961

So we can be pretty confident that each combined mono sample is going to represent 0.000022675736961 seconds in your audio clip.

Let's say you want to analyze data at song time 60 seconds (halfway through the song). We just need to find the data offset at which the stereo samples begin, then grab a meaningful chunk of samples to send to our FFT.

60 / 0.000022675736961 = ~2646000 (round down)
Our total sample count halved (5292000 / 2) = 2646000

So all I'm doing here is building confidence that we got our math right.
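The same sanity check in a few lines of Python, using the example's 120 s, 44100 Hz clip. One hedged observation: dividing by the rounded seconds-per-sample value goes through an inexact float, so it can land a hair off a whole number. Multiplying the time by the sample rate keeps the offset in exact integer math.

```python
length_s = 120                   # AudioClip.length in the example
frequency = 44100                # AudioClip.frequency
samples = length_s * frequency   # AudioClip.samples (per channel): 5292000

# Seconds covered by one per-channel sample, computed both ways from the answer.
seconds_per_sample = 1 / frequency
assert seconds_per_sample == length_s / samples

# Offset for 60 s: exact when computed by multiplication; the division form
# is only safe with round(), because of floating-point error.
time_s = 60
offset = time_s * frequency
assert offset == round(time_s / seconds_per_sample)
assert offset == samples // 2    # halfway through the song
```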

So if you call GetData with a 1024 buffer at offset 2646000, you should get the next 1024 interleaved stereo values (512 per channel) starting 60 seconds into the song. Combine the stereo samples to mono to get 512 samples, giving you the next ~0.0116 (512 * 0.000022675736961) seconds of data. So even at a 1024 (combined to 512) granularity, you're still analyzing roughly every hundredth of a second, which is pretty solid and should be very performant.
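The buffer-size arithmetic above as a small Python helper. It assumes an interleaved buffer, so a 1024-float stereo buffer holds 512 frames per channel.

```python
def window_info(buffer_len: int, channels: int, frequency: int):
    """Return (per-channel frames, seconds covered) for an interleaved buffer."""
    if buffer_len % channels != 0:
        raise ValueError("buffer length must be a multiple of the channel count")
    frames = buffer_len // channels
    return frames, frames / frequency


frames, seconds = window_info(1024, channels=2, frequency=44100)
windows_per_second = 1 / seconds   # how many such windows fit in one second
```

Here `frames` is 512 and `seconds` is about 0.0116, so back-to-back windows give roughly 86 analyses per second of audio.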

Replace 60 with any time in the song, and you should have all the tools you need to perform an FFT on any small time range in the song without the overhead of doing all of the analysis in a thread upfront.
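Putting the steps together, here is a minimal end-to-end sketch in Python: time → per-channel offset → slice of interleaved data → mono → window → FFT magnitudes. It rests on several assumptions:

- `interleaved` stands in for the float data you would read with AudioClip.GetData.
- The channels are averaged to mono.
- A Hann window is applied, which is one common choice.
- A textbook radix-2 FFT is used.
- A window that runs past the end of the clip is zero-padded.

`fft_size` counts mono samples. Using 2048 yields 1024 bins from 0 to Nyquist, which is how a 1024-length GetSpectrumData array is usually interpreted. The magnitude scaling here is arbitrary, though, and won't match Unity's numbers.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrum_at_time(interleaved, channels, frequency, time_s, fft_size=2048):
    """Magnitudes of fft_size // 2 bins for the window starting at time_s."""
    offset = math.floor(time_s * frequency)        # per-channel start sample
    chunk = interleaved[offset * channels:(offset + fft_size) * channels]
    # Zero-pad if the window runs past the end of the clip.
    chunk = chunk + [0.0] * (fft_size * channels - len(chunk))
    # Average the channels of each interleaved frame into one mono sample.
    mono = [sum(chunk[i:i + channels]) / channels
            for i in range(0, len(chunk), channels)]
    # Hann window to reduce spectral leakage.
    windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * i / (fft_size - 1)))
                for i, s in enumerate(mono)]
    bins = fft(windowed)
    # Keep the bins from 0 up to (but not including) Nyquist.
    return [abs(b) / fft_size for b in bins[:fft_size // 2]]
```

Because the window is chosen purely from the requested time, consecutive renderer frames read consecutive slices of the clip, no matter how often playback is paused.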

Hope this is helpful! Let me know how it goes.

Answer 2

Answer by Bunny83 · May 20, 2019 at 12:39 PM

Have a look at this question, which asks essentially the same thing.


CaliCoastReplay · May 21, 2019 at 02:21 AM

Much appreciated!
