Enrique Mendez, 9, and his older brother Cristian, 11, sorted through a plastic bin of toys in their New Jersey home. “I want to play with the wrestling guys,” said Enrique in a voice not quite his own, but pretty close.
Enrique has Down syndrome and speech apraxia, which means that he cannot speak, aside from a few grunts and “Ma” in the word “Mama.” He was able to speak to his brother, though, with an iPad loaded with the latest version of a widely used text-to-speech application, Proloquo2Go.
“The voice now matches the boy,” said John Mendez, Enrique’s father.
Until recently, devices that help children like Enrique speak used modified adult voices. The effect can be startling to those listening because it doesn’t sound like a child’s voice. Most existing children’s voices sound “like adults on helium,” said David Niemeijer, chief executive and lead developer at AssistiveWare, which developed the software Enrique tested.
AssistiveWare and its partner, Acapela Group, developed the next version, Proloquo2Go 2.1, which features two children’s voices — known as Josh and Ella — actually recorded by children. The $190 application can be bought on iTunes Wednesday, but people who already own the app can add the latest voices at no charge.
Few, if any, other companies offer true children’s voices, largely because of the challenges of recording children. The average 10-year-old cannot spend hundreds of hours in a sound booth recording the library of phrases needed to create a synthetic child’s voice.
Sound engineering can manipulate adult voices, adding filters that adjust for the higher pitch of a child’s voice, for example. But without a baseline recording, the voices to date have lacked the natural sound of a child’s voice. With little competitive pressure to replicate children’s voices, most companies decided children could get by with the altered adult voices.
The release of Proloquo2Go’s boy and girl voices — the company also has two other children’s voices with a British accent for that market — is an indicator of new progress in the decades-old text-to-speech industry.
The progress is, in part, a side effect of the adoption of automated voices in everything from credit card company service lines to the grocery store checkout kiosk. But faster computer processors with more memory have empowered sound engineers to make artificial voices sound more human. Many of the larger voice companies like Nuance, in Massachusetts, and Ivona, in Poland, now offer voices in multiple languages and accents.
Proloquo2Go, which runs on Apple’s mobile devices, is used by tens of thousands of children with disabilities like autism and cerebral palsy. The company estimates that 80 percent of its users are under 18 years old and 60 percent are under 11. The new voices are for children ages 6 to 14. In December, Acapela Group will begin licensing the voices to companies for use on other devices.
Proloquo2Go “can be a good fit for some people, but not for everyone,” said Janice C. Light, a professor of communication sciences and disorders at Pennsylvania State University.
Said Mr. Niemeijer, the AssistiveWare chief: “You definitely need to look at the child and think about what would be a good situation. A degree of assessment is definitely necessary because the parents often just go out and buy the device and it doesn’t work out. Parents often have too high hopes.”
When Shanay Finney, 30, learned that her 10-year-old autistic son Dahmier might be able to have an age-appropriate voice, she reacted with mixed emotions. On one hand, when he goes to the store and interacts with people in public, the sound of his voice would be more normal, but on the other hand, all the other little boys using Proloquo2Go would have the same voice.
“I’m going to keep it real,” she said. “It’s not my son’s voice. But I know it might help.”
The software cost AssistiveWare about $100,000 to develop. During the recording sessions for Proloquo2Go 2.1, audio engineers collected several thousand phrases and hundreds of words, including profanity, which Mr. Niemeijer believes is a step toward empowering children who cannot speak, giving them the same vocabulary as their peers.
From this bank of words, the application can synthesize any word in the English language. For example, the word “impressive” is stitched together from pieces of the words “impossible,” “president” and “detective.”
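The stitching idea can be illustrated with a toy sketch. It is not AssistiveWare's or Acapela's method, which the article does not describe; real systems join recorded sound units, not letters. Here, as an assumption made purely for illustration, letters stand in for sounds, and a hypothetical helper named `stitch` greedily covers a target word with the longest fragments it can find in a list of donor words.

```python
def stitch(target, donors):
    """Greedily cover `target` with the longest substrings found in `donors`.

    Toy model only: letters stand in for recorded sound units.
    Returns a list of (fragment, donor_word) pairs.
    """
    pieces = []
    pos = 0
    while pos < len(target):
        best = None
        # Try the longest candidate first, shrinking until some donor contains it.
        for end in range(len(target), pos, -1):
            fragment = target[pos:end]
            donor = next((d for d in donors if fragment in d), None)
            if donor is not None:
                best = (fragment, donor)
                break
        if best is None:
            raise ValueError(f"no donor contains {target[pos]!r}")
        pieces.append(best)
        pos += len(best[0])
    return pieces


pieces = stitch("impressive", ["impossible", "president", "detective"])
for fragment, donor in pieces:
    print(f"{fragment!r} from {donor!r}")
```

With these three donor words the sketch splits the target by spelling, so the boundaries it picks will not match the sound boundaries a speech engine would use.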
Most text-to-speech devices do give users the ability to say almost anything, and many allow users to choose whether they want to sound happy, angry or sad. The challenge facing the industry, whose biennial conference starts Saturday in Pittsburgh, is how to develop text-to-speech technologies that can predict the emotion, or tone, a person might want to use in a given situation.
Many in the industry and other experts in the field agree that a synthetic voice, even one that expresses basic emotions, is barely adequate to allow someone with a speech disability to speak normally, let alone have a sense of individuality.
“When we’re in conversation, we use tone of voice for all kinds of things, to express respect, gratitude, to influence the way a conversation goes backwards and forwards,” said Graham Pullin, the designer behind a pilot project called Speech Hedge, which is aimed at molding the tone of a person’s voice.
“You often can’t really chip in sharp/sarcastic comments,” wrote Martin Pistorius, a 36-year-old Web developer and author, in an e-mail. He lost his voice after contracting meningitis when he was 12 and has been using text-to-speech technology for 10 years. “By the time you’ve composed it, the moment has gone so it wouldn’t really be funny or appropriate any more.”
“I’m pretty quick at getting my message out, but even so I still can’t keep up with the pace of normal conversation,” he wrote.
By EMILY B. HAGER / The New York Times