
A.I. and the Mastery of Spoken Language

The question isn’t just whether we are capable of making simulations of human speech but, rather, if bots can replicate the singular mind that gives form to all speech.

In Steven Spielberg’s dystopian film A.I. Artificial Intelligence, a software designer played by William Hurt explains to a group of younger colleagues that it may be possible to make a robot that can love. He imagines a machine that can learn and use the language of “feelings.” The full design would create a “mecha,” a mechanized robot nearly indistinguishable from a person. His short-term goal is to build a test case: a young boy who could serve as a replacement for a couple grieving over their own child’s extended coma.

The film throws out a lot to consider. There are the stunning Spielberg effects of New York City drowning in ice and water several decades into the future. But the core focus of the film is the experiment of creating a lifelike robot that could be something more than a “supertoy.” As the story unfolds, it touches on the familiar subject of the Turing Test: the long-standing challenge to make language-based artificial intelligence that is good enough to be indistinguishable from the real thing.

Should we become attached to a machine packaged as one of us? Even without any intent to deceive, can spoken language be refined with algorithms to leap over the usual trip wires of learning a complex grammar, syntax and vocabulary?  It takes humans years to master their own language.

The long first act of the film lets us see an 11-year-old Haley Joel Osment as “David,” effectively ingratiating himself with the Swinton family. In my classes pondering the effects of A.I., this first segment was enough to stop the film and ask members what seemed plausible and what looked like wild science fiction. I always hoped to encourage the view that no “bot” could converse in ordinary language with the ease and fluency of a normal kid. That was my bias, but time has proven me wrong. If anything, David’s reactions were a bit too stiff to reflect the loquacious chatterbots around today. Using Siri, Alexa or IBM’s Watson as simple reference points, it is clear that we now have computer-generated language that has mostly mastered the challenges of formulating everyday speech. There’s no question that current examples of synthetic speech are remarkable.

Here’s an example you can try. I routinely have these short essays “read” back to me by Microsoft Word’s “Read Aloud” bot, which comes in the form of a younger male or female voice that can be activated from the “Review” tab of the top ribbon. Not having an editor, I find it helps to hear what I’ve written; the read-back often exposes garbled prose that my eyes have missed. I recall that the first version of this addition to Word was pretty choppy: words piled on words without much attention to their intonation, or to how they might fit within the arc of a complete sentence. Now the application reads with pauses and inflections that mostly sound right, especially within the narrower realm of formal rather than idiomatic English. Here is the second paragraph of this piece as read back to me via this Word function:
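For anyone who wants to tinker with the same idea outside of Word, a few lines of Python can produce a comparable read-aloud. The sketch below is only an illustration under my own assumptions: it uses the third-party pyttsx3 library rather than whatever engine Word relies on, and it simply hands a passage to the voices already installed on the operating system.

```python
# A rough sketch, not Word's own engine: the third-party pyttsx3 library
# drives whatever text-to-speech voices the operating system already has
# (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux).
# Install it first with:  pip install pyttsx3

import pyttsx3

passage = (
    "Of course, language means only when it is received and interpreted "
    "by a person. An individual has what artificial intelligence does not: "
    "a personality, likes and dislikes, and a biography tied to a life cycle."
)

engine = pyttsx3.init()           # pick up the system's default speech driver
engine.setProperty("rate", 170)   # speaking rate in words per minute
engine.say(passage)               # queue the passage
engine.runAndWait()               # block until it has been spoken aloud
```

Changing the voice is a matter of reading the list returned by engine.getProperty("voices") and passing one of its ids back through setProperty("voice", ...); the pacing and inflection, though, still come from the software's rules rather than from any listener on the other end.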

Of course, language “means” only when it is received and interpreted by a person. An individual has what artificial intelligence does not: a personality, likes and dislikes, and a biography tied to a life cycle. Personality develops over time and shapes our intentions. It creates chapters of detail revealing our social and chronological histories as biological creatures. So the key question isn’t whether we are capable of making simulations of human speech; the bigger question is whether bots can replicate the unique mind within each of us that gives form to that speech.

Even when tied to advanced machine-learning software, chatterbots trade on resemblance, using their similarity to human speech to falsely suggest authenticity. And there’s the rub. Generating speech that implies preferences, complex feelings or emotions makes sense only when there is an implied “I.” For lack of a better word, with Siri or Watson there is no kindred soul at home. The language of a bot is a simulacrum: a copy of a natural artifact, but not a natural artifact itself.

Even so, we should celebrate what we have: machines that can verbalize fluently and, with complex algorithms, might speak to our own unique interests.