Additional evidence that contour attributes are not essential cues for object recognition
© Greene; licensee BioMed Central Ltd. 2008
Received: 28 June 2007
Accepted: 01 July 2008
Published: 01 July 2008
It is believed that certain contour attributes, specifically orientation, curvature and linear extent, provide essential cues for object (shape) recognition. The present experiment examined this hypothesis by comparing stimulus conditions that differentially provided such cues. A spaced array of dots was used to mark the outside boundary of namable objects, and subsets were chosen that contained either contiguous strings of dots or randomly positioned dots. These subsets were briefly and successively displayed using an MTDC information persistence paradigm. Across the major range of temporal separation of the subsets, it was found that contiguity of boundary dots did not provide more effective shape recognition cues. This is at odds with the concept that encoding and recognition of shapes is predicated on the encoding of contour attributes such as orientation, curvature and linear extent.
"Stationary visual percepts, a tree, a stone, or a book, are as a rule extremely reticent as to the nature of the neural events which underlie their existence. We may hope to learn more about brain correlates if we turn to instances in which percept processes seem to be in a more active state." Kohler 
Cognitive, computational and neural theories of object recognition all share the concept that essential cues are provided by the orientation, curvature and linear extent of the lines and edges that lie at the boundary of an object and its component parts. We can describe these cues as "contour attributes," and describe the mechanisms for registering and encoding these attributes as "contour filters."
A previous study from this laboratory offered evidence and arguments against the proposition that contour attributes provide the essential cues for object recognition. That study displayed dots that were positioned on the outer boundary of namable objects, varied the number of dots that were displayed in progressively larger samples, and manipulated the spatial positioning of dots within those samples. The display dots were chosen to provide subsets that either: a) formed contiguous strings that approximated line segments, b) were at randomly selected positions around the boundary of the shape, or c) were at evenly spaced positions around the boundary. For each condition the number of dots to be displayed (as a percentage of the total number of dots in the perimeter) was increased until the participant identified the object. The greatest percentage of dots was required for recognition when the subsets formed contiguous strings, and the smallest percentage was needed when the dots in the subsets were positioned with even spacing. The contiguous strings would have delivered the most information about contour attributes, yet this treatment condition provided the least effective cues for eliciting recognition of the shapes.
There were two additional reasons that the previous results  were at odds with the concept that contour attributes provide essential cues for recognition. First, it is widely believed that orientation-selective cells in primary visual cortex serve as the basic filters that register the contour attributes [3, 4]. Yet recognition of many shapes was possible with only a small sampling of dots, and with the space between adjacent dots being wider than the receptive fields of orientation-selective cells. Second, even if one proposed new principles for registering alignments among these dots, it is not obvious how one would know which dots to connect.
The present experiments used the minimal transient discrete cue (MTDC) protocol [5–7] to examine whether dot subsets that should activate contour filters (because of spatial contiguity) provide better shape cues than do noncontiguous subsets. This protocol uses brief and successive display of subsets that were chosen from the full inventory of boundary dots. Successful recognition of the shape required integration of the information provided by each subset, and the level of performance reflected the degree to which that information was useful. The prior research [6, 7] found that millisecond-level separation of subsets produced significant declines in recognition. This was the case even when the total display time for a given shape – and thus duration of cue persistence – was controlled. This means that the effectiveness of the shape cues, and their ability to be integrated, are reflected in the rate at which recognition declines when time differentials are inserted between the successive cues.
For the present experiment, the subsets provided either contiguous sequences of four dots, or four dots chosen at random locations in the boundary. Each contiguous dot subset would provide information about the local contour attributes of the shape. Each random dot subset would not provide this information, and to the extent that the subset might activate contour filters, would deliver inappropriate cues regarding alignments of the boundary. Therefore, to the extent that contour attributes are essential cues for shape recognition, performance levels should be higher for contiguous than for random subsets. The experimental results did not support this prediction.
The apparatus for display of shapes (display board) was the same as used in earlier experiments [2, 5, 6]. Briefly outlined here, it consisted of a 64 × 64 array of red LEDs. The participant sat at a distance of 3.5 m from the display board, and at this distance, the diameter of each element and the center-to-center spacings were 4.95 and 7.42 arc', respectively. The horizontal and vertical dimensions of the full array were 7.74 × 7.74 arc°.
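The relation between the visual angles quoted above and physical size on the display board can be sketched as follows. This is only an illustration of the standard visual-angle geometry; the function names are hypothetical, and the physical sizes it yields are derived values, not measurements reported for the apparatus.

```python
import math


def angle_to_size_mm(angle_arcmin: float, distance_m: float) -> float:
    """Physical extent (mm) that subtends `angle_arcmin` at `distance_m`."""
    half_angle_rad = math.radians(angle_arcmin / 60.0) / 2.0
    return 2.0 * distance_m * 1000.0 * math.tan(half_angle_rad)


def size_to_angle_arcmin(size_mm: float, distance_m: float) -> float:
    """Visual angle (arc minutes) subtended by `size_mm` at `distance_m`."""
    half_angle_rad = math.atan(size_mm / (2.0 * distance_m * 1000.0))
    return math.degrees(2.0 * half_angle_rad) * 60.0


VIEWING_DISTANCE_M = 3.5  # viewing distance given in the text

# Element diameter and center-to-center spacing given in the text (arc minutes).
led_diameter_mm = angle_to_size_mm(4.95, VIEWING_DISTANCE_M)  # about 5.0 mm
led_spacing_mm = angle_to_size_mm(7.42, VIEWING_DISTANCE_M)   # about 7.6 mm
```

At this viewing distance the angles are small enough that the simpler approximation (size = distance × angle in radians) would give nearly the same values.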
The circuits of the display board provided for activation of a given LED by specifying an x, y address position within the array, under the control of a microprocessor running with a clock speed of 24 MHz. Rise and fall time for emission was in the range of 100 ns. Room illumination was from standard ceiling-mounted fluorescent fixtures that were fitted with opaque panels to block most of the light. This provided ambient illumination of 13.3 lux. Luminance of an emitting LED was set at 7 cd/m².
As implemented here, the MTDC protocol was as follows. For a given shape, only some dots from the full inventory were shown, these being designated as the display set. The size of the display set was determined on the basis of testing done with other subjects. These tests established the number of evenly spaced dots needed to provide a 75% hit rate, i.e., successful recognition of a given shape by 75% of the participants, if all the dots were simultaneously displayed. This might be regarded as providing a given subject with a 75% probability of identifying the shape, and if only for convenience, it may be described in this manner in what follows.
To specify the display set for each shape (done independently for each participant), the first address to be included in the set was chosen at random. From there, counting in a clockwise direction, every Nth address was chosen, the value of N being that which would yield the 75% hit rate. An example of one possible display set is illustrated in the right panel of Fig. 1.
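The selection rule just described can be sketched in a few lines. This is a minimal illustration, not the custom software used in the experiment: the function name, the representation of the boundary as a clockwise-ordered list of (x, y) addresses, and the choice to stop after one lap around the boundary are all assumptions.

```python
import random


def choose_display_set(boundary, n_step, rng=None):
    """Pick every n_step-th boundary address, starting at a random address.

    `boundary` is assumed to list the (x, y) addresses of the full inventory
    of boundary dots in clockwise order.
    """
    if n_step < 1:
        raise ValueError("n_step must be at least 1")
    if rng is None:
        rng = random.Random()
    start = rng.randrange(len(boundary))
    # Assumption: stop after a single lap, so no address is chosen twice.
    count = len(boundary) // n_step
    return [boundary[(start + k * n_step) % len(boundary)] for k in range(count)]


# Hypothetical boundary of 40 addresses; with N = 4 the display set has 10 dots.
example_boundary = [(i, 0) for i in range(40)]
example_set = choose_display_set(example_boundary, 4, random.Random(1))
```

Because the starting address is random, two participants shown the same shape with the same N would generally receive different display sets.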
Each display set was further broken into randomly chosen subsets containing four dots each (with one residual subset potentially having fewer than four). Spatial positioning of dots within the subsets provided the experimental treatment designated as "proximity," and temporal separation of subsets provided the treatment designated as "temporal separation," also known as T3.
There were two levels of the proximity condition, requiring either that the dots of the subset be contiguous, or that they be randomly selected from among the members of the display set. Note that contiguity is relative, in that the display set consists of every Nth position within the full inventory of boundary dot positions. For the random condition, there was an additional restriction that each dot in the subset must lie at least three steps away from other positions within the display set.
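The two proximity conditions can be sketched as two ways of partitioning the positions of a display set into subsets of four. The code below is a hedged illustration only. It works on position indices within the display set, treats the boundary as circular, and reads the three-step restriction as a minimum separation between the members of a random subset, measured in display-set steps; that reading, the function names, the random starting position, and the retry limits are assumptions rather than details given in the text.

```python
import random

SUBSET_SIZE = 4  # dots per subset, from the text
MIN_STEPS = 3    # minimum separation in the random condition, from the text


def circular_steps(i, j, n_dots):
    """Steps between display-set positions i and j, going the short way round."""
    d = abs(i - j) % n_dots
    return min(d, n_dots - d)


def contiguous_subsets(n_dots, rng):
    """Cut the display set into runs of neighbouring positions."""
    start = rng.randrange(n_dots)  # assumed: the first run starts anywhere
    order = [(start + k) % n_dots for k in range(n_dots)]
    return [order[k:k + SUBSET_SIZE] for k in range(0, n_dots, SUBSET_SIZE)]


def well_separated(subset, n_dots):
    """True if every pair in the subset is at least MIN_STEPS apart."""
    return all(circular_steps(a, b, n_dots) >= MIN_STEPS
               for k, a in enumerate(subset) for b in subset[k + 1:])


def draw_subset(pool, n_dots, rng, tries):
    """Sample one well-separated subset from the pool, or None on failure."""
    size = min(SUBSET_SIZE, len(pool))
    for _ in range(tries):
        candidate = rng.sample(pool, size)
        if well_separated(candidate, n_dots):
            return candidate
    return None


def random_subsets(n_dots, rng, max_restarts=1000, tries=200):
    """Partition the display set into randomly chosen, well-separated subsets."""
    for _ in range(max_restarts):
        pool = list(range(n_dots))
        subsets = []
        while pool:
            candidate = draw_subset(pool, n_dots, rng, tries)
            if candidate is None:
                break  # dead end: discard this partition and start over
            subsets.append(candidate)
            pool = [p for p in pool if p not in candidate]
        if not pool:
            return subsets
    raise RuntimeError("no valid partition found; display set may be too small")
```

For example, `contiguous_subsets(13, random.Random(0))` yields three runs of four positions and one residual subset of one, matching the residual case mentioned above. For very small display sets the random condition may be hard or impossible to satisfy under this reading of the restriction, which is why the sketch gives up after a fixed number of restarts.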
Total display time for a given shape was determined by the number of dots in the display set multiplied by 0.1 ms, plus the T3 interval multiplied by the number of subsets minus one. For T3 = 1 ms, the minimum and maximum display times were 4.3 and 62 ms respectively, and the mean display time was 16.6 ms. For T3 = 3 ms, minimum and maximum display times (rounded up) were 10 and 150 ms, with a mean of 40 ms. For T3 = 9 ms, minimum and maximum display times were 28 and 414 ms, with a mean of 126 ms. For T3 = 27 ms, minimum and maximum display times were 82 and 1206 ms, with a mean of 320 ms.
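The display-time rule stated above can be written out as a short function. This is a sketch under the assumption that the number of subsets is the display-set size divided by four, rounded up to allow for the residual subset. The function name is hypothetical, and the dot counts of 13 and 180 are illustrative inputs, not display-set sizes reported in the text.

```python
import math

DOT_TIME_MS = 0.1  # per-dot display time, from the text
SUBSET_SIZE = 4    # dots per subset, from the text


def total_display_time_ms(n_dots, t3_ms, subset_size=SUBSET_SIZE):
    """Dots x 0.1 ms, plus the T3 interval between each pair of successive subsets."""
    n_subsets = math.ceil(n_dots / subset_size)
    return n_dots * DOT_TIME_MS + t3_ms * (n_subsets - 1)


# Illustrative display-set sizes (assumed, not reported in the text).
for t3 in (1, 3, 9, 27):
    shortest = total_display_time_ms(13, t3)
    longest = total_display_time_ms(180, t3)
    print(f"T3 = {t3:2d} ms: {shortest:6.1f} ms to {longest:7.1f} ms")
```

With these illustrative sizes the formula gives 4.3 ms and 62 ms at T3 = 1 ms, and 82.3 ms and 1206 ms at T3 = 27 ms, which is in line with the minimum and maximum display times quoted above.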
The two levels of dot proximity and four levels of T3 provided eight treatment combinations. For each participant the inventory of 64 shapes was ranked for difficulty level, i.e., the number of dots required for a 75% hit rate, and then shapes were assigned at random from the ranked list to the eight treatment combinations. The net effect of the assignment was to provide each treatment level with a sampling of shapes that were approximately equal in difficulty. Each participant saw a given shape only once, and the order for display of the shapes (and thus the treatment combinations) was random.
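One way to obtain treatment groups of roughly equal difficulty from a ranked list is a blocked random assignment, sketched below. The text does not spell out the exact procedure, so this is an assumed scheme: shapes are sorted by difficulty, taken in blocks of eight, and each block contributes one shape to every treatment combination in random order. The function name and the difficulty values are hypothetical.

```python
import itertools
import random

PROXIMITY_LEVELS = ("contiguous", "random")
T3_LEVELS_MS = (1, 3, 9, 27)
TREATMENTS = list(itertools.product(PROXIMITY_LEVELS, T3_LEVELS_MS))  # 8 combinations


def assign_shapes(difficulty, rng=None):
    """Map each shape to a treatment, balancing difficulty across treatments.

    `difficulty` maps a shape name to the number of dots needed for a
    75% hit rate (assumed representation).
    """
    if rng is None:
        rng = random.Random()
    ranked = sorted(difficulty, key=difficulty.get)
    assignment = {}
    for block_start in range(0, len(ranked), len(TREATMENTS)):
        block = ranked[block_start:block_start + len(TREATMENTS)]
        shuffled = TREATMENTS[:]
        rng.shuffle(shuffled)
        for shape, treatment in zip(block, shuffled):
            assignment[shape] = treatment
    return assignment


# Hypothetical difficulty values for 64 shapes.
example_difficulty = {f"shape{i:02d}": 13 + i for i in range(64)}
example_assignment = assign_shapes(example_difficulty, random.Random(0))
```

With 64 shapes and eight treatment combinations, this scheme gives each combination eight shapes, one from each difficulty band.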
Eight USC undergraduates served as participants, each displaying normal or corrected-to-normal visual acuity. Each was naïve to the goals of the experiment, and was paid for his or her participation.
Generalized Linear Mixed Model Values for Treatment Conditions
This statistical analysis found no significant difference (p = 0.59) in recognition rate for the proximity condition, i.e., recognition of shapes was not different as a function of whether the subset dots were contiguous or were at randomly selected positions.
Although the difference between contiguous and random treatment conditions was not significant, inspection of the means plotted in Fig. 5 suggests the possibility that the treatments were not comparable at T3 = 27 ms. To formally evaluate this, pairwise comparisons of means were calculated, properly adjusting for the number of comparisons. There were no significant differences at the first three T3 intervals, but the difference at T3 = 27 ms was significant at p < .02. This differential could be a simple experimental artifact, in that a treatment will not always yield data that fit the overall trends.
A great many, perhaps a majority, of shape recognition theories propose that contour attributes, i.e., orientation, curvature and linear extent, provide the elemental features that define the shape of an object. Selfridge  may have been the first to characterize the perceptual process in terms of an assemblage of filters, each having the ability to register a distinctive contour attribute, but many others have followed this lead [see [10–14]].
The minimal transient discrete cue (MTDC) protocol [5–7] provides a means to evaluate the validity of this hypothesis. This method briefly displays a spaced array of dots that mark the outer boundary of the shape, the number of dots being just sufficient for recognition of the shape if all of them are shown with minimal delay. By choosing which dots to sample, and introducing delays between successive samples that are chosen, one can assess the effectiveness of the shape cues being provided by the samples.
The present goal was to examine whether contiguous subsets of dots would be more effective at eliciting recognition of shapes than would subsets having an equal number of dots that were randomly chosen from the full inventory of dots. The contiguous subsets should provide a more effective stimulus for the filters that are presumed to register the contour attributes. If shapes are specified on the basis of their contour attributes, then the contiguous subsets should convey the best partial shape cues, and one would expect these subsets to be more effective for eliciting recognition.
The overall result was that contiguous and randomly selected subsets contributed equally to shape recognition, even though the randomly selected subsets did not display cues that relate to the orientation, curvature and linear extent of the boundary. This indicates that under the present test conditions, contour attributes did not provide cues that are essential for shape perception.
For the present task conditions, one might speculate that information persistence allowed successive dots to accumulate, such that dots from the random subsets could eventually form contiguous strings that provided contour attributes. There is persistence of brief visual stimuli, as reported by Sperling , Neisser , Haber and Standing , and Eriksen and Collins [18, 19], among others, and reviewed by Coltheart , Long , and Nisly and Wasserman . Whereas local contour information was not provided by a given random subset, one could argue that the contour-filtering process simply waited for a number of the subsets to be delivered, after which the contour attributes could be extracted from the aggregate pool of dots.
Recent work using the present experimental protocols, however, has found that millisecond and even submillisecond differentials in the display of dot subsets can produce significant differences in shape recognition [6, 7]. The result that is most critical to this discussion was provided by the second experiment in each of the cited studies, wherein the total time (and thus duration of persistence) for a given shape was held constant. Under these conditions, it was found that varying the interval between successive dots impaired recognition, with temporal separation of as little as half a millisecond being significant. Shape-relevant contour attributes are delivered directly by the contiguous dot subsets, but they could be provided by random subsets only through aggregation. The prior studies demonstrate that the cues do not aggregate without a recognition penalty.
When neural substrates for shape perception are discussed, most see the orientation-selective cells characterized by Hubel & Wiesel [3, 4] as providing the first step for registering contour attributes. A given cell can be activated by a contour, and because the firing rate is influenced by the orientation, length, and (possibly) curvature of the contour, the response is thought to convey information about these attributes. It is further suggested that an assemblage of these contour filters delivers the full complement of contour attributes needed for recognition.
However, previous results from this laboratory  raise the question of whether shape analysis depends on activation of orientation-selective cells. That study found that recognition was possible when the full complement of dots being shown was relatively sparse. Recognition was well above chance when dot spans exceeded the length of orientation-selective receptive fields . That outcome suggests that each dot is acting as an independent marker of boundary position, and that shape is defined by an unspecified – not yet known – relationship among the individual markers. Even when the orientation-selective cells are activated by an array of dots, the essential information might be the locations that have been specified rather than the collinearity in the array.
With respect to the present results, one might wonder whether the contiguous subsets were effective stimuli for the orientation-selective cells. Perhaps the cells did not respond to the very brief presentation of just four dots. There are three reasons to suggest that the subsets delivered adequate stimulation.
First, although the stimulus duration was very brief, the flashes were easily visible, i.e., consciously perceived. It is generally accepted that conscious awareness of a visual stimulus requires processing by the primary visual cortex, thus the stimulus strength was adequate for activating its neurons.
Second, the span of each contiguous subset was a suitable fit to the size of receptive fields. Sceniak et al. examined receptive field size of orientation-selective cells in V1 of macaque, and found the average space constant to be 60 arc', and the average length-summation tuning curve to be 49 arc'. The four-dot array of the contiguous subsets spanned 35 arc' for horizontal or vertical alignments, and 47 arc' for diagonal alignments. Therefore each of the contiguous subsets displayed an image size that would provide four dots to the receptive fields.
Third, there is direct electrophysiological evidence that an array of briefly flashed dots will stimulate the cortical cells. Jones & Palmer examined responsiveness of orientation-selective cells with successive stimulation of local points across the receptive fields, the typical duration of each stimulus being 50 ms. They reported that the responses that could be elicited by stimulating one location at a time were too weak to be of practical value in the analysis of receptive field structure. However, simultaneous activation of three sites within the receptive field yielded usable data. As indicated above, the contiguous subsets of the present experiment displayed four dots that would register on a given receptive field, and this would provide a stronger stimulus than was found to be effective by Jones & Palmer.
The more general point is that the random subsets as well as the contiguous subsets were seen by the subject and delivered sufficient stimulation to elicit recognition. If one took the position that the contiguous arrays provided an insufficient stimulus for activating orientation-selective cells, it would mean that recognition was accomplished without any contribution from these cells.
It is possible that the cues used for this experiment may be especially salient for activating a primitive shape encoding system. The pattern provided by the full complement of dots is very similar to a silhouette, and recognition is best when there is maximal simultaneity of the flashed dots. This is not unlike conditions that might face an early vertebrate – perhaps a fish – that detects simultaneous movement through small openings in a wall of seaweed. The pattern that is seen could be a predator, or might be prey, and successful recognition by the creature would have implications for survival. It is likely that these recognition skills evolved, and are present in a great many present-day animals that have no cortex.
Recent evidence from this laboratory , gathered and published after the present research was conducted, has demonstrated that the retina contains a neural system that is sensitive to millisecond-level simultaneity when the subsets consist of dot pairs. This suggests that the present task draws on primitive shape-encoding mechanisms that put a premium on very tight temporal proximity within a stimulus pattern. More advanced image-processing systems, such as primary visual cortex, might have similar requirements for simultaneity, but with a longer time constant. This could explain the differential at T3 = 27 ms as a contribution to the temporal-integration process by orientation-selective cells that could not be accomplished in the retina.
The finding that the contour attributes did not benefit recognition under the present test conditions should not be taken as a blanket rejection of a useful role in the perception of objects. The fact that we can detect edges with a contrast differential as small as 3% speaks to the benefit of these filters for registering the presence of a boundary. Doubtless this is useful for detecting an object that is almost the same color or luminance as the background, or where it must be seen through haze. Contour filters may make it possible to see the object's boundaries under a variety of degraded conditions, and there is ample evidence that alignment of lines and edges provides a basis for object completion. It is possible, however, that this processing allows the position of discrete markers to be specified. Shape perception, per se, may then be based on metric relationships that have little or nothing to do with collinearity of the markers.
The middle image has eliminated internal contours, texture, and color, replacing all these cues with uniform black. Yet this silhouette is readily identified as being in the shape of a rooster. The internal parts, color and texture must be at least somewhat ancillary, i.e., nonessential.
The right image has replaced the boundary edge with an array of dots, and we can still see the stimulus as having the shape of a rooster. Contour attributes of the boundary have been eliminated, but many will insist that they must be inferred in order to identify the shape.
Previous research demonstrated that as few as 19 dots allowed for recognition of the rooster by half of the subjects . It was hypothesized that the individual dots serve as markers of boundary positions, and the information needed for encoding and storage of shapes might be based on metric relationships among these markers. For the image provided by a natural object, a great many more markers are activated by its contours. But here also, some unspecified metric relationship among the markers may provide the basis for recognition, rather than collinearity of the markers.
Contours provide a number of cues that might contribute to identifying a given shape. Investigators and theorists have focused on a specific set of attributes that are provided by contours, in particular suggesting that orientation, curvature and linear extent serve to characterize and specify the shape. This emphasis has been augmented by evidence that neurons in visual cortex respond more vigorously at a particular orientation of the contour, with response strength being a function of length, and in some cases, curvature. The fact that these neurons also specify location of a contour segment is given minimal attention. It is conceivable that the locations that are registered by contour filters provide the information that is most essential for characterizing a given shape.
MTDC: Minimal transient discrete cue
arc': Minutes of visual angle
cd/m²: Candela per meter squared
arc°: Degrees of visual angle
LED: Light emitting diode
lux: Lumen per meter squared
N: Number used to specify number of dots from address list to be displayed
Temporal separation between members of subset pairs
T3: Temporal separation between subset pairs
V1: Primary visual cortex
SEM: Standard error of the mean.
I wish to thank David Gorin for writing the custom applications used in this research, and Dr. Leigh Callinan for statistical analysis of data. Ambient and LED luminance values were measured by Drs. Ronald Henry and Andrew Jones. This research was supported, in part, by the Neuropsychology Foundation.
1. Kohler W: Dynamics in Psychology. 1940, New York: Liveright Publishing, 67-68.
2. Greene E: Recognition of objects that are displayed with incomplete sets of discrete boundary dots. Percept Motor Skills. 2007, 104: 1043-1059.
3. Hubel DH, Wiesel TN: Receptive fields of single neurons in the cat's striate cortex. J Physiol. 1959, 148: 574-591.
4. Hubel DH, Wiesel TN: Receptive fields, binocular interactions and functional architecture in the cat's visual cortex. J Physiol. 1962, 160: 106-154.
5. Greene E: Simultaneity in the millisecond range as a requirement for effective shape recognition. Behav Brain Funct. 2006, 2: 38. 10.1186/1744-9081-2-38.
6. Greene E: Spatial and temporal proximity as factors in shape recognition. Behav Brain Funct. 2007, 3: 27. 10.1186/1744-9081-3-27.
7. Greene E: Information persistence in the integration of partial cues for object recognition. Percept Psychophys. 2007, 69: 772-784.
8. Schall R: Estimation in generalized linear models with random effects. Biometrika. 1991, 40: 917-927.
9. Selfridge OG: Pattern recognition and learning. Information theory. Edited by: Cherry C. 1957, New York: Academic Press, 345-353.
10. Sutherland NS: Outlines of a theory of visual pattern recognition in animals and man. Proc R Soc Lond B Biol Sci. 1968, 171 (24): 95-103.
11. Hinton GE: A parallel computation that assigns canonical object-based frames of reference. Proceedings of the Seventh International Joint Conference on Artificial Intelligence. 1981, Los Altos, CA: International Joint Conferences on Artificial Intelligence, 683-685.
12. Marr D: Vision: A Computational Investigation into the Human Representation and Processing of Information. 1982, New York: WH Freeman, 51-79.
13. Quinlan PT: Differing approaches to two-dimensional shape recognition. Psychol Bull. 1991, 109: 224-241. 10.1037/0033-2909.109.2.224.
14. Palmer SE: Vision science: photons to phenomenology. 1999, Cambridge, MA: MIT Press.
15. Sperling G: The information available in brief visual presentations. Psychol Monogr. 1960, 74: 1-29.
16. Neisser U: Cognitive Psychology. 1967, New York: Appleton-Century-Crofts.
17. Haber RN, Standing L: Direct measures of short-term visual storage. Quart J Exp Psychol. 1969, 21: 43-54. 10.1080/14640746908400193.
18. Eriksen CW, Collins JF: Some temporal characteristics of visual pattern perception. J Exp Psychol. 1967, 74: 476-484. 10.1037/h0024765.
19. Eriksen CW, Collins JF: Sensory traces versus the psychological moment in the temporal organization of form. J Exp Psychol. 1968, 77: 376-380. 10.1037/h0025931.
20. Coltheart M: Iconic memory and visible persistence. Percept Psychophys. 1980, 27: 183-228.
21. Long GM: Iconic memory: A review and critique of the study of short-term visual storage. Psychol Bull. 1980, 88: 785-820. 10.1037/0033-2909.88.3.785.
22. Nisly SJ, Wasserman GS: Intensity dependence of perceived duration: data, theories, and neural integration. Psychol Bull. 1989, 106: 483-496. 10.1037/0033-2909.106.3.483.
23. Sceniak MP, Hawken MJ, Shapley R: Visual spatial characterization of macaque V1 neurons. J Neurophys. 2001, 85 (5): 1873-1887.
24. Jones JP, Palmer LA: The two-dimensional spatial structure of simple receptive fields in cat striate cortex. J Neurophys. 1987, 58: 1187-1211.
25. Greene E: Retinal encoding of ultrabrief shape recognition cues. PLoS ONE. 2007, 2 (9): e871. 10.1371/journal.pone.0000871.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.