Understanding how people group information over time: A new technique

The following is a UX-friendly summary of Paine, L., & Gilden, D. (2013). A class of temporal boundaries derived by quantifying the sense of separation. Journal of Experimental Psychology: Human Perception & Performance, 39(6), 1581-1597. (copy of record)

The image below demonstrates one of the principles of visual grouping – similarity of color.  It is one of several principles designers wield to help users make sense of visual scenes.  In using these principles, the designer is essentially telling users how to determine “what goes with what” in the image.  The processes by which we determine which elements are part of the same “thing” are so fundamental that we’re hardly aware of them; and yet, they are powerful forces that structure everything we experience.  To convince yourself of this, try to see the image below as three rows instead of two (or four) columns.  It isn’t easy.

Red and black dots

It is precisely because these perceptual processes are so powerful that it is essential to understand them, and it is partly for this reason that perceptual grouping is one of the oldest problems in psychology. To date, there has been a marked bias in research toward studying static visual scenes (understandably, because it’s a lot easier to study something that stays still). But we don’t only group things in static visual space. We also group sounds and other stimuli that appear together over time (“temporal grouping”). Our work has brought a new rigor to the study of temporal grouping by identifying unique psychological signatures that people only display when they are experiencing boundaries between temporal groups.

In this summary, I’ll provide background on what temporal grouping is, why it matters for users, and what we’ve done to measure it more effectively.

What is temporal grouping?
On first reflection, it might not be clear that people actually do group information in time.  However, a simple thought experiment demonstrates that temporal grouping does, in fact, play an important role in perception.

Imagine you are listening to a song (say, “Rainbow Connection”). When you hear the first line of the song (“Why are there so many songs about rainbows?”), a clear melody is formed. This melody is a perceptual group, much like the column of red dots in the image above.

Why are there so many songs about rainbows?

But imagine what happens when this melody is stretched out, with the notes spaced ludicrously far apart.  Instead of hearing the melody all at once, you hear “Why” at 1:00, then ten minutes later, “are”, followed by “there” at 1:20, all the way through “-bows” an hour and forty minutes after the first note.  While you may be familiar enough with the song to complete the melody in your mind from the cue of a single note, there is no way to actually experience the eleven separated notes as one coherent melody.

Why (1:00)

are (1:10)

there (1:20)

rain- (2:30)

-bows… (2:40)

While this may seem obvious, it actually demonstrates something very interesting about the way we perceive temporally spaced information.  When information is presented close together in time (the notes of a song, the words of a sentence, or even visual sequences), we generally have no trouble weaving these elements together into something coherent.  Yet there are some cues in the world that signal to our minds that we no longer need to worry about grouping temporally spaced information together.  The passage of a large amount of time (as in our “Rainbow Connection” thought experiment) is one of these, but there are also others.  These cues are what we aimed to study.

Why perceptual grouping matters
Grouping is how we make sense of an extremely complex world of perceptual information.  When perceptual grouping fails, we cannot correctly interact with the world.  For instance, if we incorrectly perceive the caterpillar in the photo below as part of the leaf it’s sitting on, we’re in for an unpleasant surprise when we try to touch the leaf.

camouflaged caterpillar
Photo by Wohin Auswandern

Appropriate grouping of information in time is also critical.  Real-time content delivery is particularly vulnerable to temporal grouping errors.  It’s like bad lip-syncing in a foreign-language film: if the starts and stops (e.g., lip motion and audio onset) are poorly aligned, the result is comical or, worse, downright confusing.

I once worked on an app that automatically refreshed a content feed (think Twitter) without any visual cue about where the previous content had gone.  In fact, the content had simply moved down the page, but there was no animation to indicate this.  Instead, users perceived a flash, and what they had been looking at seemed to have disappeared.  The flash made users think that something entirely new was happening, and they failed to perceive the subsequent screen as related to the one they had been viewing before.

Twitter handles this problem more effectively, using a subtle sliding motion and a notification of new tweets that users can unfold at will (see below).  This allows users to maintain a sense of continuity between what they were looking at and the new content.  No spurious segmentation occurs.

Twitter feed

When it is not clear to users what information is old and what is new, serious usability problems can result.  Making use of effective boundary cues helps users sort information into the correct categories (i.e., old or new), which makes for a more coherent experience.
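To make the contrast concrete, here is a minimal TypeScript/DOM sketch of the continuity-preserving pattern described above.  It is an illustration, not Twitter’s actual implementation: the element IDs (#feed, #new-posts-pill), the onNewItems entry point, and the animation values are all hypothetical.

```typescript
// A sketch of the "new content" pattern: hold incoming items behind a
// notification, then slide them in so old content visibly moves down
// instead of appearing to vanish. IDs and timings are hypothetical.

const feed = document.querySelector<HTMLElement>("#feed")!;
const pill = document.querySelector<HTMLElement>("#new-posts-pill")!;
let pending: HTMLElement[] = [];

// Don't swap the feed in place (users perceive a flash and a spurious
// boundary); instead, announce that new content is available.
function onNewItems(items: HTMLElement[]): void {
  pending.push(...items);
  pill.textContent = `Show ${pending.length} new posts`;
  pill.hidden = false;
}

// When the user opts in, animate each item from zero height so the
// existing content slides down rather than disappearing.
pill.addEventListener("click", () => {
  for (const item of pending) {
    item.style.overflow = "hidden";
    item.style.maxHeight = "0";
    item.style.transition = "max-height 300ms ease";
    feed.prepend(item);
    void item.offsetHeight; // force a layout so the transition starts at 0
    item.style.maxHeight = `${item.scrollHeight}px`;
  }
  pending = [];
  pill.hidden = true;
});
```

The key design choice is that the content a user is reading never changes without a visible motion cue tying the old state to the new one.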

Identifying temporal boundaries
While the experience of temporal grouping is natural and effortless, obtaining reliable, objective measures of exactly when people experience segmentation in time turns out to be challenging.  Previous researchers have largely sidestepped the issue of objectivity by simply asking people to report when they experience one event ending and another beginning.  Yet this approach raises questions of its own, as self-reported boundaries may change depending upon how the stimuli are framed for participants.

We wanted to develop a technique that would allow perceived boundaries (i.e., starts and stops) to emerge naturally, without people having to tell us anything.  To do this, we invented a special task that uses slight variations in people’s behavior to reveal what they are experiencing at different points in time.  The participants’ task is simple: all they have to do is classify shapes and sounds (the stimuli) into one of two categories as quickly as possible (e.g., red vs. blue).  These stimuli are presented in a sequence that incorporates whatever boundary cues we are curious about.  We can then look at participants’ patterns of responses and identify different signals depending upon whether they were experiencing something continuous (a unified group) or segmented (a boundary).  Details of the task may be found in our journal article.
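For readers who think in code, the sketch below illustrates the general shape of a speeded two-choice classification loop, written in browser TypeScript.  Everything in it (the stimuli, the response keys, and the showStimulus helper) is an illustrative assumption; it is not the procedure or the parameters from our study.

```typescript
// A rough sketch of a speeded two-choice classification loop. The
// stimuli, keys, and helpers are illustrative assumptions only.

type Stimulus = "red" | "blue";
interface Trial { stimulus: Stimulus; followsBoundaryCue: boolean }

const RESPONSE_KEYS: Record<string, Stimulus> = { f: "red", j: "blue" };

// Hypothetical rendering helper: paint the current stimulus on screen.
function showStimulus(color: Stimulus): void {
  document.querySelector<HTMLElement>("#stimulus")!.style.backgroundColor = color;
}

// Resolve with the first valid keypress and its response time.
function awaitResponse(): Promise<{ key: string; rt: number }> {
  const start = performance.now();
  return new Promise((resolve) => {
    const handler = (e: KeyboardEvent) => {
      if (e.key in RESPONSE_KEYS) {
        window.removeEventListener("keydown", handler);
        resolve({ key: e.key, rt: performance.now() - start });
      }
    };
    window.addEventListener("keydown", handler);
  });
}

// Run the sequence; the analysis then compares response patterns on
// trials that follow a boundary cue against those that do not.
async function runTrials(trials: Trial[]) {
  const results: Array<Trial & { correct: boolean; rt: number }> = [];
  for (const trial of trials) {
    showStimulus(trial.stimulus);
    const { key, rt } = await awaitResponse();
    results.push({ ...trial, correct: RESPONSE_KEYS[key] === trial.stimulus, rt });
  }
  return results;
}
```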

We tested many types of boundary cues.  But one of our most interesting findings resulted when we tested depth cue boundaries (that is, cues for grouping information into foreground and background planes).  This is something computer interface designers should care about, as designers frequently use these types of cues in an effort to differentiate information (for instance, with overlapping windows, as below).  In our study, we wanted to discover what kinds of depth cues are most effective at helping people segment information.

several windows
Image by Jonas Söderström

One way we determine what is closer to us is by using the cue of overlap.  Our first experiment (Experiment 6a in the illustration below) tested people’s response to overlap by having one object slide up and partially cover another.  We found that this was not an effective way to promote temporal segmentation: on its own, the overlap cue is too weak.

Illustrations of occlusion and shaded stimuli

We then modified the experiment to include a second depth cue, a drop shadow (Experiment 6b).  The experiment was identical to the previous one in every other way.  With the drop shadow included, however, participants’ performance changed dramatically.  They showed strong, consistent segmentation signatures every time the new object appeared onscreen, revealing that they experienced a clear distinction between “old” and “new.”

What this means is that drop shadows appear to be stronger separation cues than overlap alone.  Drop shadows seem to naturally segment the flow of information within our perceptual systems, while overlap is a far less potent cue.  From a design perspective, this makes drop shadows a powerful tool for psychologically separating concepts or processes.  By contrast, if a designer wishes to present new information without significantly disrupting users’ perceptual flow, visual overlap without shading can be used to good effect.
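As a design takeaway, the difference can come down to a single style property.  The sketch below shows a hypothetical overlay panel that either reads as a separate plane (with a shadow) or as continuous, flatly overlapping content; the function name and CSS values are arbitrary illustrations.

```typescript
// Illustrative only: toggle a drop shadow on an overlapping panel to
// modulate how strongly it reads as a separate plane of content.

function showOverlay(panel: HTMLElement, asSeparatePlane: boolean): void {
  panel.style.position = "absolute";
  panel.style.zIndex = "10"; // overlap alone: a weak separation cue
  panel.style.boxShadow = asSeparatePlane
    ? "0 4px 16px rgba(0, 0, 0, 0.35)" // shadow: strong "new plane" boundary
    : "none"; // flat overlap: new content without a hard perceptual break
  panel.hidden = false;
}
```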

Future directions
At present, we have tested this technique using a range of potential grouping cues drawn from both visual and auditory domains.  We have identified several effective grouping cues, including rhythmic separation (i.e., musical rests), shifts in physical space, changes in musical pitch, flipping, and even conceptual shifts in object identity (e.g., moving from a triangle to a square).  This work represents a significant step forward in our ability to understand how people group information over time.  A next step for HCI researchers is to apply the technique in more practical scenarios, such as those involving interface elements.