Posted: September 21, 2016
Many people with Cerebral Palsy have associated speech challenges, sometimes requiring computer assistance in order to communicate by voice. For those with severe enough difficulties, text-to-speech technology can be life-changing, a revolutionary enabler of communication. However, the number of voices to choose from is quite small, and the result is a kind of “one voice fits all” scenario in which little girls and grown men alike share an audibly indistinguishable voice. Many who use this technology must tolerate this unfortunate mismatch between person and voice.
For example, Stephen Hawking's well-known 'computer' voice, while practical and suitable for effective communication, simply does not fit the persona of, say, a young boy or girl. Yet the very sound of a voice, with its qualities, inflections, and characteristics, is a unique expression of identity. So it may be asked: why aren’t efforts put toward giving a more personalized voice to those who rely on computer-assisted speech?
That’s exactly what Rupal Patel is doing. Patel is a professor at Northeastern University, a speech scientist, and a pioneer committed to changing the lives of those with speech challenges. Patel developed the groundbreaking technology behind VocaliD, short for Vocal Identity, a movement with a simple yet profound goal: to give one-of-a-kind, customized voices to the voiceless. You can watch Patel’s 12-minute TED Talk here, but we’ll summarize the ethos and workings of VocaliD in what follows.
The general mechanics of engineering a voice that reflects the personality of the individual using it are simpler than might be expected. As reported by Patel, 2.5 million people in the US alone are unable to speak, and many use computer devices that offer only generic voice options. To create a customized voice, the subject only needs to be able to utter basic vowel sounds, something the majority of speech-challenged people can do, even those born with such challenges. Basic vowel sounds are produced by the vocal cords in the voice box (or larynx), and the sounds produced there constitute the very source of speech, including the idiosyncrasies that make it unique. Because of this, recording even the most basic sounds (like vowel sounds) from the speech-impaired subject is sufficient to create a working voice that’s true to their identity.
These basic vowel sounds, which carry the subject’s own tenor and personality, are ultimately blended with another voice: the donor voice. The donor voice provides the ‘filter’ that those with speech challenges may lack, offering finer-grained articulation. The donor can be anyone without speech impairments, but ideally it is someone whose voice is reasonably similar to the subject’s. Over the course of a few hours, the donor utters hundreds to thousands of phrases like “I love chocolate,” “I love to sleep,” and “things happen in pairs.” Computers record all the combinations of sounds that occur in language, creating a voice bank.
From this voice bank, any new utterance can be assembled, much as any word can be spelled from the 26 letters of the alphabet. The final product is a blended voice that is truly distinct: just as articulate as the surrogate talker (the donor), yet based on the subject’s voice so as to express his or her own identity.
While text-to-speech technology is a revolutionary conversational tool, its voice options are surprisingly few, leading to mismatches between person and voice. But with help from Patel and VocaliD, we can expect to see big changes, and you could be a part of them. If you’re curious about donating the sound of your voice to VocaliD’s voice bank, click here. Patel’s larger dream is to have a worldwide voice drive, so that people with speech challenges can easily sift through potential matches to create their vocal identity.
We’ll leave you with this. The reason Patel is so passionate about her work can be summed up by a quote she cited from the poet Longfellow:
“The human voice is the organ of the soul.”