Google's DeepMind Invents New Way of Synthesising Speech

By Gary Cutlack

The people behind Google's DeepMind machine learning experiment say they've come up with a completely new way of synthesising artificial speech, binning the traditional piecing-together (concatenative) techniques in favour of a method that models raw audio waveforms directly to create sounds.

The result is another thing-that-sounds-cool: WaveNet, which Google describes as a "deep generative model of raw audio waveforms" that can mimic any human voice and produces a more natural sound. As ever with these things, we'll hold judgement until we've heard it do a Geordie accent or successfully interact with a Glaswegian.

And it can sing. DeepMind says: "WaveNet changes this paradigm [reliance on existing sample databases] by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music."
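
If you're wondering what "one sample at a time" actually means, here's a minimal, hypothetical Python sketch of that kind of autoregressive loop. The predict_next_sample stand-in is ours, not DeepMind's: the real WaveNet is a deep neural network that outputs a probability distribution over quantised sample values, conditioned on thousands of previous samples. The loop shape, though, is the point.

```python
import numpy as np

# Hypothetical stand-in for WaveNet's neural network. Here we just
# compute a damped echo of the two most recent samples, which is enough
# to show the shape of the generation loop, not the model itself.
def predict_next_sample(history: np.ndarray) -> float:
    return 0.99 * history[-1] - 0.5 * history[-2]

def generate(seed: np.ndarray, n_samples: int) -> np.ndarray:
    """Autoregressive generation: each new sample is predicted from
    the samples that came before it, one at a time."""
    audio = list(seed)
    for _ in range(n_samples):
        audio.append(predict_next_sample(np.asarray(audio)))
    return np.asarray(audio)

# Start from a tiny two-sample "click" and let the loop extend it.
waveform = generate(seed=np.array([0.0, 1.0]), n_samples=16000)  # ~1s at 16kHz
print(waveform[:5])
```

Because nothing in that loop cares whether the waveform is speech, the same approach can generate any audio at all, which is how WaveNet gets from talking to singing and music.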

If you scroll down far enough through the DeepMind announcement there are some nice embedded examples that compare existing synthesis techniques with the new WaveNet approach, and yes, the latter's much bouncier and more believably realistic. [DeepMind]
