Holding a List in Mind

Extracting a “that’s it” response from a user listening to a list of options is an easy task, but not a very common one in IVR applications. More often, the user is expected to make a decision, not just select from a list, and decision-making uses very different psychological processes. The cognitive load for such a task is illustrated in the following figure:

Imagine that you are asked to decide which of several items is the one you prefer at a given moment. For example, you want an ice cream cone, and you are presented with the question, “chocolate, vanilla, or strawberry?” It’s a three-way list, right? But it is not as though you have a preconceived template that will light up with a “that’s it” when you hear one of the flavors. Rather, you must take in the options and hold all of them in mind while “comparing” their properties against your internal desire.

Look at the illustration to see how much more complex that task is.

Now there are “four of you,” and attention must be distributed among all four. Part of you, upper right, is anticipating what might come next. That’s the “you” that the discourse marker “or” in the question, “chocolate, vanilla, or strawberry?” is designed to serve. Another part of you, in the here and now, is attending to sensory memory: perceiving and interpreting the current option. A third part of you, upper left, is rehearsing the list in order to hold it all in short-term memory at once. And finally, lower left, a part of you is retrieving experience memories to make a decision about your desire, in effect a “matching target.”

This fourth “you” is especially interesting. As you hear the choices, you take a moment to “imagine” (remember) what the first flavor, chocolate, tastes and feels like. Then you create the same fantasy about vanilla, and finally strawberry. You are really constructing a “taste test” for each flavor—from memory of course—and then comparing each test to make a final decision about which flavor you want right now.

Antonio Damasio tells us:

“… the images over which we reason (images of specific objects, actions, and relational schemas; of words which help translate the latter into language form) not only must be “in focus”—something achieved by attention—but also must be “held active in mind”—something achieved by high-order working memory.”

This is the quintessential difference between recognizing and selecting a target (“that’s it”) on the one hand, versus making a decision (“this is what I want”) as discussed here. If you must decide based on several options, then you must accomplish the following:

  • Listen to the whole list and retain it;
  • Rehearse each element as the list progresses to ensure nothing is forgotten;
  • Compare each element in the list with your need (perhaps multiple times);
  • Make a decision that discriminates one list element from the others; and,
  • Speak the chosen item (or press its corresponding key).

All of these actions must occur during and immediately after the list presentation. This is a case in which barge-in contributes less than it does in “that’s it” tasks: you need not interrupt the list, because you have to hold the entire list in mind before you can compare the items and make a choice.

Users confronted with this challenge take longer to decide, make more mistakes, and more often must hear the list again. The problem grows with the length of the list. The design principle that emerges from this observation is as follows:

The maximum length of a list is shorter for lists that require “deciding” as a user action than it is for lists that only require “recognizing and selecting.” Rule of thumb: three or (at most) four. What’s more, timeout values need to be longer at the end of decision-list presentation than for the corresponding “that’s it” tasks.
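As a rough illustration, the principle above could be enforced in a prompt-building layer along the following lines. This is only a sketch under stated assumptions: the names `ListTask`, `ListPrompt`, `MAX_ITEMS`, and `END_OF_LIST_TIMEOUT_S` are hypothetical, and the numeric values are placeholders. Only the limit of four items for decision lists comes from the rule of thumb; the limit for “that’s it” lists and both timeout values are invented for the example and would need to be tuned through usability testing.

```python
from dataclasses import dataclass
from enum import Enum


class ListTask(Enum):
    RECOGNIZE = "recognize"  # user waits for a known target ("that's it")
    DECIDE = "decide"        # user must hold and compare all options


# Placeholder values, NOT from the source text, except the decision-list
# limit of four, which follows the "three or (at most) four" rule of thumb.
MAX_ITEMS = {ListTask.RECOGNIZE: 6, ListTask.DECIDE: 4}
END_OF_LIST_TIMEOUT_S = {ListTask.RECOGNIZE: 3.0, ListTask.DECIDE: 5.0}


@dataclass
class ListPrompt:
    options: list[str]
    task: ListTask

    def validate(self) -> None:
        """Reject lists that are empty or too long for the task type."""
        if not self.options:
            raise ValueError("a list prompt needs at least one option")
        limit = MAX_ITEMS[self.task]
        if len(self.options) > limit:
            raise ValueError(
                f"{len(self.options)} options exceeds the limit of {limit} "
                f"for a {self.task.value} list"
            )

    def timeout_seconds(self) -> float:
        """Silence allowed after the list ends; longer for decision lists."""
        return END_OF_LIST_TIMEOUT_S[self.task]

    def render(self) -> str:
        """Place 'or' before the last option to signal the end of the list."""
        if len(self.options) == 1:
            return f"{self.options[0]}?"
        *head, last = self.options
        separator = ", " if len(head) > 1 else " "
        return f"{', '.join(head)}{separator}or {last}?"


flavors = ListPrompt(["chocolate", "vanilla", "strawberry"], ListTask.DECIDE)
flavors.validate()
print(flavors.render())           # chocolate, vanilla, or strawberry?
print(flavors.timeout_seconds())  # 5.0
```

Tagging each list with its task type keeps the length limit and the end-of-list timeout tied together, so a designer who reclassifies a list from “recognizing” to “deciding” gets both adjustments at once.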