I Can Just See It Part C

Referring now to ASR—when people “imagine” what a conversation with technology would be like, they do what Sandra and Dale did. They collapse the image into a visual spatial construction and then imbue it with feeling. This is why speech interfaces seem—on the surface at least—as though they should be natural and delightful. When we develop a “vision” of speech we remove time from the introspection, and instead respond to the single state that is the end result or product of moving through the speech dialogue.

Like Dale, in other words, we cut to the chase. Rather than endure step by step the many moments of tension, confusion, wrong turns, and repetitions that characterize real human speech in real live dialogues, we jump to the end state—the affective state in which the conversation is now successfully completed and we are enjoying the afterglow of the sequence. This is the vision that preoccupies true believers in the speech community.

We should be careful when a design idea for IVR, or for that matter any proposed speech dialogue, appears all at once as a feeling, because what we are imagining is the end result or product of the dialogue, not the sequence of speech acts that leads to it. A speech dialogue, like music or theater, unfolds over time, and it is time that complicates the static afterglow of user delight sought so vigorously by speech designers.

In fact, end users exist within a different psychological set and a different psychological setting than does the designer, the call center manager, the speech visionary, or the focus group participant. What’s more, there are many paths through the dialogue, paths neither experienced nor even imagined by the designer. And each unique set and setting (what we can think of as the starting state of a given user as she enters the temporal sequence of the conversation) both shapes and is shaped by the unique path through the dialogue taken by the individual caller.

What emerges at the other end is not one but a huge array of possible internal states that can reside in the user’s mind and body at the end of the sequence. And not all of those states create the afterglow that we envision when we picture success. Indeed, most of them are troubling and discordant. All users who failed at their task, for example, and many successful users who experienced extended periods of tension or uncertainty, may be left with an end state that is memorable but negative.

And—if branding is successful—callers associate that state with the enterprise. Careful what you wish for!