How to get consecutive voice clips to sound natural

Question

In the game I'm working on, I have to play several short voice clips back to back to form a complete sentence. For example (each pair of [] brackets is a separate voice clip):

 [Bob here,] [we're at] [some town] [and are on our way to] [some city].

Stitching together separate voice clips like this makes the sentence sound stilted and disconnected. There are unnatural pauses at each switch between clips, and the speaker's pitch and tone change from one clip to the next.

So far I have tried two methods for removing the unnatural pauses: (1) starting the next clip early if silence is detected at the end of the preceding clip, and (2) skipping the first few milliseconds of the new clip, up to the first detected 'sound'.
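The two trimming methods above can be sketched as a small scheduling routine. This is only a minimal sketch, written in Python rather than Unity C# and assuming the leading and trailing silence of each clip has already been measured somehow. `Clip`, `schedule`, and the `gap` parameter are hypothetical names used for illustration, not part of any Unity API.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    duration: float      # total clip length in seconds
    lead_silence: float  # measured silence at the start, in seconds (assumed known)
    tail_silence: float  # measured silence at the end, in seconds (assumed known)


def schedule(clips, gap=0.0):
    """Return a list of (start_time, skip_offset) pairs, one per clip.

    start_time:  when to start the clip, in seconds from the start of the sentence
    skip_offset: how far into the clip playback should begin (method 2)
    gap:         hypothetical deliberate pause to leave between voiced parts
    """
    plan = []
    next_start = 0.0
    for clip in clips:
        plan.append((next_start, clip.lead_silence))
        # Length of the audible part once both silences are removed.
        voiced = clip.duration - clip.lead_silence - clip.tail_silence
        # Method 1: the next clip starts when the voiced part ends,
        # not when the trailing silence has finished playing.
        next_start += voiced + gap
    return plan


sentence = [Clip(1.0, 0.1, 0.2), Clip(0.5, 0.05, 0.0)]
for start_time, skip_offset in schedule(sentence, gap=0.05):
    print(f"start at {start_time:.2f}s, skip first {skip_offset:.2f}s")
```

In Unity, one option might be to add each start time to `AudioSettings.dspTime` and pass the result to `AudioSource.PlayScheduled`, with the skip offset applied through `AudioSource.time`. Whether that combination is accurate enough for speech would need to be tested in the actual project.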

These work OK at removing the unnatural pauses, but deciding what counts as 'silence' is difficult, especially when dealing with multiple voice actors and microphones.
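One possible way to make silence detection less dependent on the voice actor or microphone is to measure each clip against its own loudest part instead of using a fixed level. The sketch below is an assumption about how that could look, not a tested solution for this game. It is written in Python for brevity, and `find_voiced_range`, `window_ms`, and `rel_threshold_db` are hypothetical names with guessed default values.

```python
import math


def find_voiced_range(samples, sample_rate, window_ms=10.0, rel_threshold_db=-35.0):
    """Return (start, end) sample indices of the voiced part of a mono clip.

    A window counts as 'sound' when its RMS level is within rel_threshold_db
    of the loudest window in the same clip, so the threshold follows the
    recording level of each clip. Both defaults are guesses to be tuned by ear.
    """
    window = max(1, int(sample_rate * window_ms / 1000.0))
    rms = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    if not rms or max(rms) == 0.0:
        return (0, 0)  # empty or completely silent clip
    # Convert the relative level in decibels to a linear amplitude threshold.
    threshold = max(rms) * 10.0 ** (rel_threshold_db / 20.0)
    voiced = [i for i, level in enumerate(rms) if level >= threshold]
    if not voiced:
        return (0, 0)
    start = voiced[0] * window
    end = min(len(samples), (voiced[-1] + 1) * window)
    return (start, end)


# Synthetic clip: 50 samples of faint noise, 100 of 'voice', 30 of faint noise.
clip = [0.001] * 50 + [0.5] * 100 + [0.001] * 30
start, end = find_voiced_range(clip, sample_rate=1000)
```

In Unity the sample array could be read with `AudioClip.GetData`. Since the analysis only depends on the clip itself, it could run once when a clip is loaded and the resulting indices cached, which would keep the per-frame cost low.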

How could I make stitching together voice-clips sound more natural? Any advice would be appreciated. This has to be done in real-time inside the game (I'm using Unity), and can't be pre-processed or done ahead of time.

Answer 1

Answer by Mouton · Sep 26, 2019 at 09:35 PM

You should look into Markov chain models (Wikipedia: https://en.wikipedia.org/wiki/Markov_chain) and hidden Markov models. I don't know the subject well enough to summarize it properly, so I can only point you to this article: https://www.freecodecamp.org/news/an-introduction-to-part-of-speech-tagging-and-the-hidden-markov-model-953d45338f24/
