Compare two Audio Clips
The title explains itself, but here is the context: I am creating a voice recognition program. One key element is comparing two audio clips. How would I go about doing this?
Thanks in advance!
Voice recognition is incredibly complicated.
You can generally figure out if two sound files are exactly the same or pretty similar, but checking if a spoken line is similar to the same line spoken by somebody else, or to a string of text? Microsoft has probably poured millions of dollars into the problem, and the Kinect's voice recognition is still not ideal.
I'm not trying to put you down, but a voice recognition program is no small feat.
As Baste stated, you are pretty much asking how to do the most fundamental part of voice recognition. I wish you the best of luck with this, but I don't think you are going to get exactly what you want from Unity that simply.
Your best bet would be to spend a lot of time studying the subject if you truly feel it is needed in your game. From what I know, Unity does not support anything like that, and you would have to do a lot of work to achieve it.
One idea would be to see if you can use Microsoft's voice recognition the same way you can tell it to write things in Notepad. Look at it like that: if the voice input is "Hold", then it runs a script.
First, you need to define what "compare two audio clips" means. Do you mean the file itself? The content? Compare them based on what? (There are many parameters to measure in an audio clip.)
If it's just the file, compare them as regular files. If it's the content, it is a very complicated thing (as others have said before). I suggest you use some platform-specific tool. Windows itself comes with voice recognition (I haven't tried other platforms). Google it.
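For the "compare them as regular files" case, a minimal sketch in Python (used here only as a neutral language, since the thread has no code of its own) could look like the following. The function name `same_file_contents` is made up for illustration. Note that this only tells you whether the bytes are identical; two recordings of the same sentence will essentially never pass this test.

```python
import filecmp


def same_file_contents(path_a, path_b):
    """Return True only if both files contain exactly the same bytes."""
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting file size and modification time alone.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Re-encoding a clip or changing a single metadata tag is enough to make this return False, which is why it is only useful for the "is it literally the same file" question.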
Answer by VesuvianPrime · Sep 17, 2014 at 01:19 PM
If we're talking about the waveform data there are different metrics to consider:
Trivial:
The length of the audio
The number of channels
The sample width
The frame rate (sample rate)
Medium:
The amplitudes
Hard:
The tempo
The frequencies
By the looks of it, Unity actually does a lot of the work for you here.
AudioClip seems to cover the trivial items, while AudioSource provides FFT functionality (for frequency analysis).
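To make the trivial items concrete, here is a rough sketch that reads them from two WAV files with Python's standard `wave` module. Python is only an illustration language here; the helper names `wav_header_metrics` and `compare_headers` are invented for this example. In Unity the corresponding values would come from `AudioClip` properties such as `samples`, `channels` and `frequency`.

```python
import wave


def wav_header_metrics(path):
    """Read the cheap-to-compare properties of an uncompressed WAV file."""
    with wave.open(path, "rb") as clip:
        return {
            "frames": clip.getnframes(),                # length in sample frames
            "channels": clip.getnchannels(),            # 1 = mono, 2 = stereo
            "sample_width_bytes": clip.getsampwidth(),  # 2 means 16-bit samples
            "frame_rate_hz": clip.getframerate(),       # e.g. 44100
        }


def compare_headers(path_a, path_b):
    """Return, per metric, whether the two files agree on it."""
    a = wav_header_metrics(path_a)
    b = wav_header_metrics(path_b)
    return {key: a[key] == b[key] for key in a}
```

A mismatch here does not mean the clips sound different, only that they need resampling or channel mixing before any sample-by-sample comparison makes sense.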
For amplitudes you can simply sum the absolute deltas between the two curves at each sample, though you might want to normalize the volume first if you only care about the shape.
Tempo is an odd one; you can probably ignore it, but just in case: Beat Detection
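A minimal sketch of that amplitude comparison, written in Python purely for illustration. It assumes both clips are already mono lists of floats at the same sample rate (in Unity you could fill a `float[]` with `AudioClip.GetData`). The names `peak_normalize` and `amplitude_difference` are hypothetical, and dividing by the sample count to get a mean is my own choice so that clip length does not dominate the score.

```python
def peak_normalize(samples):
    """Scale samples so the loudest one has magnitude 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    return [s / peak for s in samples]


def amplitude_difference(samples_a, samples_b, normalize=True):
    """Mean absolute delta between two sample sequences (0.0 = identical)."""
    if normalize:
        samples_a = peak_normalize(samples_a)
        samples_b = peak_normalize(samples_b)
    count = min(len(samples_a), len(samples_b))
    if count == 0:
        return 0.0
    # zip() stops at the shorter clip, so extra trailing samples are ignored.
    total = sum(abs(a - b) for a, b in zip(samples_a, samples_b))
    return total / count
```

Usage: a clip and a louder copy of itself score roughly 0 with normalization on, and a clearly non-zero value with it off. Keep in mind this is very sensitive to timing: the same clip shifted by a few samples can score as badly as a completely different sound.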
The more of these metrics you can calculate, the more accurate your "similarity" metric is going to be.
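For the frequency item, here is a deliberately naive sketch (plain Python, a slow O(n²) DFT rather than a real FFT) of comparing magnitude spectra. It is only meant to show the idea; `magnitude_spectrum` and `spectral_distance` are made-up names, and the inputs are assumed to be equal-length mono sample lists. In Unity, `AudioSource.GetSpectrumData` gives you spectrum data for a playing source instead.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the DFT bins from 0 up to the Nyquist bin."""
    n = len(samples)
    if n == 0:
        return []
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        spectrum.append(abs(total) / n)
    return spectrum


def spectral_distance(samples_a, samples_b):
    """Euclidean distance between the two magnitude spectra."""
    spec_a = magnitude_spectrum(samples_a)
    spec_b = magnitude_spectrum(samples_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))
```

Because only magnitudes are compared, a tone and a phase-shifted copy of the same tone come out as nearly identical, which a sample-by-sample amplitude comparison would not give you. For real clips you would want to do this over short windows rather than the whole clip at once.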
Thanks, I didn't realize I could compare audio with these variables.
Dear @VesuvianPrime
Your answer is convincing, but it looks complicated for me to achieve. My case could be simpler to solve, though: I am trying to recognize pauses in speech and filler words such as "urm", "uh" and "err".
Would an FFT comparison be good enough to do the trick?
Sorry for being late to the discussion. I only recently embarked on this project.