Biological motion

Categorizing identity from biological motion of eyes

Aya Shirama, Ai Koizumi and Norimichi Kitagawa

Human body movements can provide rich cues about the person performing them, such as their identity, gender, and emotions. Although eye movements are also a form of biological motion, and despite the common belief that the eyes are the windows to the soul, it has not been clarified whether eye movements can convey an individual's identity. We recorded the binocular eye movements of individuals while they gave a short speech. A discriminant analysis of the physical parameters of the eye movements showed that the motion information of the eyes provides enough cues to discriminate the individuals mathematically. To examine whether we can recognise a speaker's identity from eye movements per se, we created computer-animated eyeballs from the recorded eye movements. Observers viewed the animations of four speakers and were asked to sort them into four groups on the basis of identity; their performance was significantly better than chance. It is unlikely that participants relied on eye blinks for the identity judgments, because accuracy was significantly reduced when the animations reproduced only the frequency and duration of the blinks. The results suggest that people can distinguish between individuals from motion information of the eyes alone.
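
A minimal sketch, not the authors' pipeline, of the kind of analysis described above: speaker identity is classified from a handful of summary eye-movement features with linear discriminant analysis. The feature set, the synthetic data, and the cross-validation scheme are all illustrative assumptions.

```python
# Sketch: discriminating speaker identity from summary eye-movement features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_speakers, n_clips, n_features = 4, 20, 6
# hypothetical features per clip: saccade rate, mean amplitude, peak velocity,
# fixation duration, blink rate, vergence variability
X = rng.normal(size=(n_speakers * n_clips, n_features))
# add a speaker-specific offset so clips from the same speaker resemble each other
X += np.repeat(rng.normal(scale=0.8, size=(n_speakers, n_features)), n_clips, axis=0)
y = np.repeat(np.arange(n_speakers), n_clips)

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=5)  # chance level = 1 / n_speakers = 0.25
print(f"cross-validated identification accuracy: {scores.mean():.2f}")
```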

Anticipation of Action Intentions in Autism Spectrum Disorder

Matthew Hudson and Tjeerd Jellema

We investigated whether individuals with high-functioning autism spectrum disorder (ASD) are influenced by an agent's gaze direction when judging the agent's actions. Participants observed the agent's head rotate towards them, while the gaze direction was either leading, or lagging behind, the head rotation. They also observed identical rotations of a cylinder containing the geometrical equivalent of the gaze manipulation. The typically developed control group was influenced by the gaze manipulations in the animate stimulus: they overestimated the head rotation when gaze was ahead of rotation and underestimated it when gaze was lagging behind. For the inanimate stimulus there was no such effect. In contrast, the ASD group did not discriminate between the animate and inanimate stimuli; they showed a similarly weak influence for both. This suggests that the ASD responses in the animate condition were determined by the low-level directional features of the eyes rather than by the conveyed intention either to continue the approach (gaze leading) or to discontinue or slow down the approach (gaze lagging). We speculate that the reliance on low-level visual cues in the ASD group compensates for an inability to involuntarily extract the other's behavioural intentions as conveyed by gaze cues.

Evaluating observers' sensitivity to errors in virtual throwing animations

Michele Vicovaro, Ludovic Hoyet, Luigi Burigana and Carol O'Sullivan

In everyday life, videogames, and movies, the motions of human characters and inanimate objects are often in close spatiotemporal relation (e.g., the throwing action of a baseball player). These events have not received much attention in human motion research. Using motion capture, we recorded an actor throwing a tennis ball with an over-arm or under-arm gesture, and transformed his actions into realistic virtual animations. The length of the parabolic trajectory was the same for both over-arm and under-arm throws, but the trajectory was slightly higher for under-arm throws. In this experiment, we manipulated the trajectory of the ball by increasing or decreasing its initial horizontal or vertical velocity at the time of release, and asked participants to judge whether the animations were correct or incorrect. Our results show that observers are more sensitive to manipulations of the horizontal velocity for over-arm throws, but more sensitive to changes in vertical velocity for under-arm throws. In the latter case, observers are more sensitive to decreases than to increases in velocity. This suggests that the type of gesture (over-arm vs under-arm) modulates observers' sensitivity to incongruence between the virtual character's performance and the trajectory of the ball.
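
A rough illustration of the velocity manipulation described above (not the authors' animation pipeline): the ball follows simple projectile motion from its release point, and errors are introduced by scaling the initial horizontal or vertical release velocity. The release parameters and scaling factors are illustrative assumptions.

```python
# Sketch: generating correct and perturbed ball trajectories after release.
import numpy as np

def ball_trajectory(v0x, v0y, release_height=1.8, g=9.81, dt=1/60):
    """Sample the parabolic flight of the ball until it reaches the ground."""
    t_flight = (v0y + np.sqrt(v0y**2 + 2 * g * release_height)) / g
    t = np.arange(0.0, t_flight, dt)
    x = v0x * t
    y = release_height + v0y * t - 0.5 * g * t**2
    return np.column_stack([x, y])

correct  = ball_trajectory(v0x=6.0, v0y=3.0)          # captured release velocity
h_error  = ball_trajectory(v0x=6.0 * 1.2, v0y=3.0)    # +20% horizontal velocity
v_error  = ball_trajectory(v0x=6.0, v0y=3.0 * 1.2)    # +20% vertical velocity
print(correct[-1, 0], h_error[-1, 0], v_error[-1, 0])  # landing distances differ
```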

Exploring Sensitivity to Time-Warped Biological Motion

Kenneth Ryall, Ludovic Hoyet, Jessica Hodgins and Carol O'Sullivan

In biological motion perception, it is often necessary to normalize certain factors, such as body shape or walking speed, in order to vary them methodically. For example, in order to create an average motion, or to compute and compare quantitative metrics, it is necessary that sets of motions be aligned in time. In computer animation, a character's timing is often manipulated to achieve a desired result, in a process called time-warping. For our experiment, we captured movements from four actors (2M, 2F) walking at slow, medium and fast speeds. Thirteen participants viewed two virtual characters walking side-by-side at medium speed, one time-warped (faster or slower), the other unmodified. The characters were displayed either as point-light walkers or as full geometric models, each normalized for body shape. Accuracy and response times were recorded. Participants were more sensitive to time-warping when slow motions were made faster than when fast motions were made slower (87% vs 63%, F(1, 11)=16.7, p<.005). They were also more accurate with the geometric model than with the point-light walker (80% vs 71%, F(1, 11)=6.4, p<.05).
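
The time-warping manipulation can be illustrated with a minimal sketch, assuming a uniform warp implemented by resampling the captured frames at a scaled time base (the joint count and warp factors below are illustrative, not those of the experiment).

```python
# Sketch: uniform time-warping of a captured motion by resampling its frames.
import numpy as np

def time_warp(motion, factor):
    """Resample a (frames x channels) motion so it plays `factor` times faster."""
    n_frames = motion.shape[0]
    old_t = np.arange(n_frames)
    new_t = np.linspace(0, n_frames - 1, max(2, int(round(n_frames / factor))))
    return np.stack([np.interp(new_t, old_t, motion[:, c])
                     for c in range(motion.shape[1])], axis=1)

walk = np.random.default_rng(1).normal(size=(240, 3 * 20))  # 240 frames, 20 joints
sped_up = time_warp(walk, 1.2)   # a slow capture warped 20% faster
slowed  = time_warp(walk, 0.8)   # a fast capture warped 20% slower
print(walk.shape, sped_up.shape, slowed.shape)
```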

Spinal motor resonance by passive action viewing

Marc H.E. de Lussanet, Frank Behrendt and Heiko Wagner

How do we avoid performing the actions that we see? This question arose with the discovery of mirror neurons. It is thought that reflex gains change so as to prevent such movements [Rizzolatti and Craighero, 2004, Annu Rev Neurosci, 27, 169-197]. However, the modulation of reflex gain while viewing an action has never been measured for a condition in which it is known how the gain changes during active execution of the action. We evoked medium-latency cutaneous reflexes in the tibialis anterior (TA) leg muscle, whose activity and reflexes during walking are well known. We found that the gain changes were the same as during active walking: the reflexes were increased at the end of the stance phase and decreased at the end of the swing phase. This confirms that the gain of postural reflexes is under direct and dynamic control by the brain. More importantly, it provides the first evidence that reflex gains while seeing an action agree with those during active execution, rather than with suppression of movement. It thus remains unclear how we avoid automatically performing the actions that we see. Moreover, the meaningful modulation of reflex gains might serve important social functions and help us to better understand the actions that we see.

Recognition from facial motion in dynamic average avatars

Fintan Nagle, Harry Griffin and Alan Johnston

Can we recognise a person by the way they change their expressions? Dynamic facial expression mimicry (in which one person copies the changing facial expressions of another) brings together the processes of perception, memory and motor control. Here we describe a method allowing near-photorealistic computational mimicry of a portrait video clip by an arbitrary target face. The output shows the target identity mimicking the source video clip. We also present a technique allowing blending of several target identities into a realistic 'average avatar' which can be driven by portrait video clips. The technique exploits the face space paradigm for dimensionality reduction and representation of faces and the multichannel gradient model [Johnston, 1992, Proc. Roy. Soc. Lond. B, 250, 297-306] for image registration. We show that near-photorealistic expression mimicry is possible and present experimental evidence that subjects can distinguish between known classmates and other individuals from the projection of the facial motion of these targets onto a dynamic average avatar.
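
A conceptual sketch of the average-avatar idea, assuming a purely linear face-space representation; this is an illustration only, not the authors' MCGM-based registration and blending pipeline, and all dimensions and names are assumptions.

```python
# Sketch: identities as vectors in a linear face space, blended into an
# average avatar that is then driven by the expression motion of a source clip.
import numpy as np

rng = np.random.default_rng(2)
n_identities, n_dims, n_frames = 8, 500, 120

identities = rng.normal(size=(n_identities, n_dims))   # neutral faces in face space
average_avatar = identities.mean(axis=0)               # blended 'average' identity

source_neutral = rng.normal(size=n_dims)                # neutral frame of the source clip
source_frames = source_neutral + rng.normal(scale=0.1, size=(n_frames, n_dims))

expression_motion = source_frames - source_neutral      # per-frame expression deltas
driven_avatar = average_avatar + expression_motion      # avatar mimicking the source
print(driven_avatar.shape)                              # (n_frames, n_dims)
```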

Perceptual relevance of kinematic components of facial movements extracted by unsupervised learning

Martin A. Giese, Enrico Chiovetto and Cristobal Curio

The idea that complex facial or body movements are composed of simpler components (usually referred to as 'movement primitives' or 'action units') is common in motor control (Chiovetto, 2011, Journal of Neurophysiology, 105(4), 1429-1431) as well as in the study of facial expressions (Ekman & Friesen, 1978). However, such components have rarely been extracted from real facial movement data. METHODS: Combining a novel algorithm for anechoic demixing derived from Omlor and Giese (2011, Journal of Machine Learning Research, 12, 1111-1148) with a motion retargeting system for 3D facial animation (Curio et al., 2010, MIT Press, 47-65), we estimated spatially and temporally localized components that capture the major part of the variance of dynamic facial expressions. The estimated components were used to generate stimuli for a psychophysical experiment assessing classification rates and emotional expressiveness ratings for stimuli containing combinations of the extracted components. RESULTS: We investigated how the information carried by the different extracted dynamic facial movement components is integrated in facial expression perception. In addition, we applied different cue fusion models to try to account quantitatively for the experimental results. Supported by DFG CU 149/1-2, GI 305/1-2, EC FP7-ICT grants TANGO 249858 and AMARSi 248311.
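
The anechoic mixture model that such a demixing algorithm inverts can be illustrated with a minimal sketch: each recorded motion channel is a weighted sum of a few source components, each shifted by a channel-specific time delay. The channel and component counts, the source waveforms, and the delays below are illustrative assumptions; the sketch shows the forward (mixing) model, not the demixing algorithm itself.

```python
# Sketch: synthesizing facial-motion channels x_i(t) = sum_j a_ij * s_j(t - tau_ij).
import numpy as np

rng = np.random.default_rng(3)
n_channels, n_components, n_samples = 30, 3, 200
t = np.linspace(0, 1, n_samples)

sources = np.stack([np.sin(2 * np.pi * (j + 1) * t) for j in range(n_components)])
weights = rng.normal(size=(n_channels, n_components))           # mixing weights a_ij
delays  = rng.integers(0, 20, size=(n_channels, n_components))  # delays tau_ij (samples)

channels = np.zeros((n_channels, n_samples))
for i in range(n_channels):
    for j in range(n_components):
        channels[i] += weights[i, j] * np.roll(sources[j], delays[i, j])
print(channels.shape)  # (30, 200): synthetic dynamic facial-motion data
```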

Influence of crowding on discriminating the direction of biological motion

Hanako Ikeda, Katsumi Watanabe and Patrick Cavanagh

It is difficult to identify a target in the peripheral visual field when it is flanked by distractors. In the present study, we investigated this 'crowding' effect for point-light biological motion perception. Three point-light walkers were presented horizontally in the periphery with various distances between them, and observers reported the walking direction of the central figure. When the inter-walker distance was small, discriminating the direction became difficult. Moreover, the reported direction for the central target was not simply noisier, but reflected a pooling of the three directions. These results indicate that crowding occurs for biological motion perception. However, when the two flanking distractors were scrambled point-light walkers, crowding was not observed. This result suggests that crowding in point-light biological motion perception occurs at a high level of motion perception.

Physiologically plausible neural model for the recognition of dynamic faces

Girija Ravishankar, Gregor Schulz, Uwe J. Ilg and Martin A. Giese

Facial expressions are essentially dynamic. However, most existing research has focused on static pictures of faces. The computational neural functions that underlie the processing of dynamic faces are largely unknown. METHODS: We devised two alternative, physiologically plausible hierarchical neural models for the recognition of dynamic faces, which simulate the properties of neurons in face-selective regions such as the STS: (1) an example-based model whose units encode key frames of the expression sequence and are embedded in a recurrent network that is selective for temporal order, and (2) a norm-referenced model based on neurons that encode deviations from the neutral face ('norm stimulus') in feature space. Both models are based on an extension of a neural model for the recognition of static faces [Giese & Leopold, 2005, Neurocomputing, 65-66, 93-101]. They were tested on movies of rhesus monkeys showing 'threat' and 'coo-call' expressions. RESULTS: Both models work successfully and correctly classify monkey expressions in real videos. They make different predictions about the properties of face-selective single cells, e.g. those in the STS. CONCLUSIONS: Simple, physiologically plausible neural circuits can account for the recognition of dynamic faces. Data from single-cell recordings will be required to decide between the different models.
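
The two encoding schemes can be contrasted with a conceptual sketch, assuming faces are represented as feature vectors; this illustrates only the two unit types, not the authors' full hierarchical models, and all parameters are assumptions.

```python
# Sketch: example-based (keyframe-tuned) vs norm-referenced face-encoding units.
import numpy as np

def example_based_unit(face, keyframe, sigma=1.0):
    """RBF unit: responds maximally when the input matches a stored keyframe."""
    return np.exp(-np.sum((face - keyframe) ** 2) / (2 * sigma ** 2))

def norm_referenced_unit(face, norm_face, preferred_direction):
    """Unit encoding the deviation of the input from the neutral 'norm' face,
    projected onto the unit's preferred direction in feature space."""
    deviation = face - norm_face
    return max(0.0, np.dot(deviation, preferred_direction))

rng = np.random.default_rng(4)
norm_face = rng.normal(size=50)                # neutral face
threat_peak = norm_face + rng.normal(size=50)  # keyframe of a 'threat' expression
test_frame = 0.5 * (norm_face + threat_peak)   # expression halfway to the peak

direction = (threat_peak - norm_face) / np.linalg.norm(threat_peak - norm_face)
print(example_based_unit(test_frame, threat_peak))
print(norm_referenced_unit(test_frame, norm_face, direction))
```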

Auditory signal dominates visual in the perception of emotional social interactions

Lukasz Piwek, Karin Petrini and Frank Pollick

Multimodal perception of emotions has typically been examined using displays of a solitary character (e.g. the face-voice and/or body-sound of one actor). We extended this investigation to more complex, dyadic point-light displays combined with speech. A motion and voice capture system was used to record twenty actors interacting in couples with happy, angry and neutral emotional expressions. The stimuli obtained were validated in a pilot study and used in the present study to investigate multimodal perception of emotional social interactions. Participants were required to categorize happy and angry expressions displayed visually, auditorily, or in emotionally congruent and incongruent bimodal displays. In a series of cross-validation experiments we found that sound dominated the visual signal in the perception of emotional social interaction. Although participants' judgments were faster in the bimodal condition, the accuracy of judgments was similar for the bimodal and auditory-only conditions. When participants watched emotionally mismatched bimodal displays, they predominantly oriented their judgments towards the auditory rather than the visual signal. This auditory dominance persisted even when the reliability of the auditory signal was decreased with noise, although visual information had some effect on judgments of emotion when it was combined with a noisy auditory signal. Our results suggest that when judging emotions from an observed social interaction, we rely primarily on vocal cues from the conversation rather than on visual cues from the actors' body movements.

Ne me quitte pas: An anxiety-induced bias in the perception of a bistable walker?

Sander Van de Cruys, Ben Schouten and Johan Wagemans

Bistable figures can serve as exquisite stimuli to study top-down (non-sensory) influences on perception, because bottom-up information can be kept constant while categorically different percepts are experienced. There is ample evidence for these influences in object formation. For example, Peterson [1994, Current Directions in Psychological Science, 3(4), 105-111] reports that we segment figure from background in face/vase-like stimuli depending on the meaningfulness of the segments: we tend to see meaningful segments as the foreground object. In this experiment we examined whether the viewer's trait social anxiety can bias perception of a bistable point-light walker, since its different percepts carry different intrinsic emotional relevance (walking towards you or away from you). We found that people with high social anxiety have a bias towards seeing the point-light figure walking away from them, compared to people with low social anxiety. In a separate emotional dot-probe experiment we confirmed that there was a difference in anxious behaviour between our groups. We discuss these results in the context of previous studies on emotion-induced differences in perception and explore possible alternative explanations.

Judging time-to-passage of biological motion in the periphery

Sandra Mouta, Jorge Santos and Joan Lopez-Moliner

In time-to-passage judgments, the complexity of the motion pattern plays a more decisive role than the 'biologicity' of the stimulus. Articulated stimuli were judged as passing sooner than rigid stimuli, but the judgments were more uncertain, as revealed by lower precision and longer reaction times [Mouta et al, 2012, Journal of Vision, 12, 1-14]. It is known that biological motion can be perceived in the periphery [Thompson et al, 2007, Journal of Vision, 7(10):12, 1-7], and in everyday life we often need to interact with, or estimate the motion of, agents located in the periphery. In this study, stimuli were presented at a more peripheral location (32°). In a time-to-passage (TTP) task, rigid (RM) and biological (BM) motion conditions were compared. Subjects had to decide whether the point-light walker passed the eye plane before or after a reference time (1 s) signalled by a tone. Subjects could judge the time to passage of point-light walker displays in the periphery, although they needed longer reaction times and were less precise. Even so, the differences in accuracy between BM and RM vanished: the anticipation of passage for BM was no longer found. Reaction time was significantly higher for BM. Ack: Supported by FCT (SFRH/BPD/63393/2009; PTDC/SAU-BEB/68455/2006); PSI2010-15867
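
Before/after judgments in such a TTP task are typically summarised with a psychometric function. A minimal sketch with synthetic data (not the study's), assuming a cumulative-Gaussian fit whose mean gives the point of subjective equality and whose spread indexes precision:

```python
# Sketch: fitting a psychometric function to 'passes after the 1 s tone' responses.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

passage_times = np.array([0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3])   # actual TTP (s)
p_after = np.array([0.05, 0.15, 0.35, 0.55, 0.80, 0.92, 0.98])  # synthetic proportions

def psychometric(t, pse, sigma):
    return norm.cdf(t, loc=pse, scale=sigma)

(pse, sigma), _ = curve_fit(psychometric, passage_times, p_after, p0=[1.0, 0.1])
print(f"PSE = {pse:.3f} s, precision (SD) = {sigma:.3f} s")
```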

An experimental-phenomenological study on perceived relations among mobile objects: the perceived social interactions among animate objects depend on the proximity of multiple objects in motion

Atsushi Sakai, Hidemi Komatsu and Naoe Masuda

Naïve observers can perceive social interactions among mobile toys that have an insect-like shape, namely a small 'head' and a thick 'body' with many 'legs', and that ambulate over a burred surface by means of the oscillation produced by a built-in vibrator. According to expert observers' descriptions, these toys, named 'HEX BUG nano' (TAKARA TOMY ARTS) and naturally having neither brain nor voluntary intention, exhibit either integrative or segregative kinds of social or emotional interaction (Michotte, 1950). To compare the free descriptions of expert observers with those of naïve subjects, the following experiments were conducted. (1) Motion pictures of the toys' behaviour, recorded from directly overhead, were presented to naïve subjects to collect free descriptions. (2) Computer-programmed motion pictures, in which black circles (which removed the insect-like shape of the toys) randomly changed their positions, were presented to other naïve subjects. Under both conditions, the observers shared descriptions of social or interactive relations among animate objects. The observers perceived relations among the motions of multiple objects that did not have any creature-like shape. The proximity of the objects in motion was a necessary condition for perceiving such interactive relations.