But building voice technology on a 26-year-old corpus inevitably lays a foundation for misunderstanding. English is professional currency in the linguistic marketplace, but numerous speakers learn it as a second, third, or fourth language. Gavaldà likens the process to drug trials. “It may have been tried in a hundred patients, [but] for a narrow demographic,” he tells me. “You try to extrapolate that to the general population, the dosage may be incorrect.”
Seems like a huge problem of bias if voice systems are trained on such an old and narrow set of data.