Goal-directed action semantics involve comprehension of an object and its corresponding actions with respect to the context. In previous studies of action semantics, participants were mostly asked to determine the compatibility of actions in a given situation [1,2,3,4,5,6,7,8,9,10,11,12]. Conditions violating action semantics have often involved orientation or functional mismatches that prevent the proper execution of a tool's function [1, 2], illogical tool substitution (e.g., cutting bread or playing the cello with a saw instead of a bread knife or a bow [3,4,5,6,7]), or inappropriate body movements in a given context (e.g., a woman looking at her watch and carrying a suitcase while walking on a treadmill [4]). Concurrent event-related brain potentials (ERPs) have been analyzed to reveal the neural bases of the cognitive processes underlying action semantics [1,2,3,4,5,6,7,8,9,10,11,12]. Although context appears to be an indispensable factor processed with action as a meaningful unit, the gesture itself, a fundamental component of a valid action, has largely been ignored in the field of action semantics. Thus, the present study aims to uncover the neural cognitive processes underlying tool–gesture incongruity.
ERPs for action semantics
Since gesture semantics is part of action semantics, a brief review of previous studies on action semantics is provided here. Among studies examining action semantics [1,2,3,4,5,6,7,8,9,10,11,12], the N400 is the most frequently reported neural index of the congruency effect. Some researchers have also reported a similar N300/N400 peaking earlier, around 300 ms after stimulus onset, which has been assumed to reflect picture-related or action-specific semantic processing [5, 7, 8]. The N400 component was first observed over centro-parietal scalp areas in response to word stimuli during sentence reading in linguistic paradigms with semantically anomalous endings [13]. Later, a similar N400 with a more frontal distribution than the linguistic N400 was reported for non-linguistic materials and termed the action N400 [1,2,3,4, 7, 8]. The N400 is therefore regarded as a neural index that can be elicited across stimulus types whenever the stimulus is potentially meaningless or incomprehensible.
Further considerations on the enhanced N400
In our view, several underlying factors can influence the magnitude of the N400. The first is the extent of context violation. In sentence reading tasks, the magnitude of the N400 has been reported to be influenced by the cloze probability of a word [13, 14]. For instance, words that complete sentences in a nonsensical fashion (low cloze probability; e.g., The bill was due at the end of the hour) elicit much larger N400 waves than semantically appropriate words do (high cloze probability; e.g., The bill was due at the end of the month) [14]. Linguistic and non-linguistic semantics are comprehended through broadly similar processes; thus, the neural activity patterns elicited by semantically anomalous information in the linguistic domain may also appear in non-linguistic domains such as action semantics. Second, the structural complexity of the background or peripheral context may also be a determining factor: the richer the structure, the more visual cues and/or artifacts are provided, thereby influencing the congruity effect. Third, whether the stimuli are presented in dynamic, serial, or static form influences the topographical distribution and the magnitude of the N300/N400 component [3, 7, 8, 10, 11]. For instance, Wu and Coulson [10] reported reduced N400 amplitude for serial cartoon segments compared with static-image paradigms. Taken together, the N400 appears to be a highly context-dependent component. In this regard, the present study minimizes peripheral factors such as illogical context violation and redundant background information, thereby isolating the congruency effect of tool–gesture semantics.
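For reference, the cloze probability of a word is conventionally defined as the proportion of respondents who complete a given sentence frame with that word; the notation below is ours rather than that of [13, 14]:

$$P_{\mathrm{cloze}}(w \mid s) = \frac{n_{w,s}}{N_s}$$

where $n_{w,s}$ is the number of respondents completing sentence frame $s$ with word $w$, and $N_s$ is the total number of respondents. In the example above, month would have a high cloze probability and hour a probability near zero.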
Gesture semantics in previous ERP research
Here we further review studies using gestures as the main stimuli. Gestures, which are central to communication, have been found to trigger the N300 and N400 during discrimination of the semantics of hand postures [15]. Shibata et al. [16] used ERPs to evaluate the appropriateness of cooperative actions in pass-and-receive paradigms. Pictorial stimuli were presented in series: first a preshaped passing hand (e.g., an object placed in the hollow of the palm), then a receiving hand (e.g., palm down as the appropriate receiving action, palm up as an inappropriate one), followed by a blank interval. An inappropriate receiving action elicited a more widely distributed cortical response than an appropriate action did, with the maximum N400 located in the parietal region. This parietal N400, which differs from the fronto-central N400 reported for context-violation paradigms, was thought to reflect semantic processing related to the prediction of interpersonal actions between two people.
Bach et al. [1] further investigated the appropriateness of tool-use actions by classifying mismatch conditions into “functional mismatch,” in which instruments are paired with normally inappropriate target objects (e.g., a screwdriver applied to a keyhole), and “orientation mismatch,” in which the spatial properties of the motor action and the target are inconsistent (e.g., an orthogonal orientation between insertion and slot). The varied N400 latencies indicated that action and object semantics derive from different sub-processes, related to the functional and orientation domains, respectively. In line with Bach et al. [1], Balconi and Caldiroli [2] reported a topographical difference in object-related action comprehension: a significant N400 was observed in the fronto-central area for incorrect object use and predominantly in the temporo-parietal area for unusual object use.
More recently, Proverbio et al. [17, 18] proposed a left-hemispheric asymmetry in the activation of premotor and somatosensory areas involved in object perception, associated with tool manipulability. They further used ERPs to examine neural responses to visually presented pictures depicting unimanual (e.g., a hammer) and bimanual (e.g., a handlebar) tools [19]. In the 230–260 ms time window, an N2 was elicited at the left parietal cortex, followed by an N400 (350–450 ms) at the right parietal cortex. In both time windows, the two components were associated with activation of the left premotor cortex. Notably, only unimanual tools were related to activation of the left postcentral gyrus in the second time window. This pattern of results suggests a role of the left hemisphere in the neural representation of grasping in right-handed people, especially for the N2 component.
Although electrophysiological responses to the appropriateness of tool–action pairings have been assessed, the paradigms were quite divergent; hence, few consistent inferences can be drawn. Further, no straightforward evidence has yet been presented for understanding the compatibility between tools and the hand gestures that manipulate them. The present study would be the first to report brain activities involved in tool–gesture congruency, using a tool–gesture paradigm free of the confounding factors of context violation and anticipation effects.
Late waveforms beyond N400
In addition to the N400 component, a late positive complex (LPC) following the N400 has been observed in some recent studies [1, 5, 10, 12], while a late negativity has been found in others [7, 8, 16]. Regardless of its polarity, researchers have interpreted this late effect as a reevaluation of available knowledge about the goal-related requirements of real-world actions [5] or as a decision-making-related process [10]. This continued late effect suggests that the N400 is not the final stage of semantic processing [14, 20]. Using ERPs enables us to investigate tool–gesture semantics with fine temporal resolution, so whether a later effect of tool–gesture compatibility occurs after 400 ms can also be determined.
In sum, the primary goal of this study is to investigate tool–gesture incongruity using an intra-gesture experimental design with the ERP technique. Based on the previous literature, we preliminarily hypothesize that incorrect tool–gesture pairs elicit a more negative N400 amplitude than correct tool–gesture pairs do. Furthermore, a late waveform is expected because the task is relatively difficult and requires a greater degree of visual and cognitive deconstruction than the tasks in previous studies. By means of ERP recordings, the present study should reveal the semantic processing of tool–gesture pairings with respect to tool manipulation.
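To make the hypothesized contrast concrete, a congruency effect of this kind is typically quantified as a mean-amplitude difference within the N400 time window. The following is a minimal sketch using MNE-Python, not the study's actual pipeline; the file name, condition labels, electrode sites, and time window are illustrative assumptions:

import mne

# Minimal sketch: quantify a hypothesized N400 congruency effect as a
# mean-amplitude difference. The epochs file, condition labels
# ("incorrect" / "correct"), and electrode sites are assumptions.
epochs = mne.read_epochs("tool_gesture-epo.fif")  # preprocessed, baseline-corrected

SITES = ["Fz", "Cz", "Pz"]  # assumed midline sites; action N400 is often fronto-central

def mean_amplitude(condition, tmin=0.35, tmax=0.45):
    """Mean voltage (in volts) over SITES within the tmin-tmax window."""
    evoked = epochs[condition].average()          # average trials of one condition
    evoked = evoked.copy().pick(SITES).crop(tmin=tmin, tmax=tmax)
    return evoked.data.mean()                     # average over channels and samples

# A more negative difference would indicate an enhanced N400 for incorrect pairs.
effect = mean_amplitude("incorrect") - mean_amplitude("correct")
print(f"N400 congruency effect: {effect * 1e6:.2f} microvolts")

In practice, such per-participant mean amplitudes would then enter a group-level statistical test across conditions and electrode sites.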