How do we recognize what one person is saying when others are speaking at the same time? This review summarizes widespread research in psychoacoustics, auditory scene analysis, and attention, all dealing with early processing and selection of speech, which has been stimulated by this question. Important effects occurring at the peripheral and brainstem levels are mutual masking of sounds and “unmasking” resulting from binaural listening. Psychoacoustic models have been developed that can predict these effects accurately, albeit using computational approaches rather than approximations of neural processing. Grouping, the segregation and streaming of sounds, represents a subsequent processing stage that interacts closely with attention. Sounds can be easily grouped, and subsequently selected, using primitive features such as spatial location and fundamental frequency. More complex processing is required when lexical, syntactic, or semantic information is used. Whereas it is now clear that such processing can take place preattentively, there is also evidence that the processing depth depends on the task-relevancy of the sound. This is consistent with the presence of a feedback loop in attentional control, triggering enhancement of to-be-selected input. Despite recent progress, there are still many unresolved issues: there is a need for integrative models that are neurophysiologically plausible, for research into grouping based on cues other than spatial or voice-related ones, for studies explicitly addressing endogenous and exogenous attention, for an explanation of the remarkable sluggishness of attention focused on dynamically changing sounds, and for research elucidating the distinction between binaural speech perception and sound localization.

Speech communication is so all-pervasive and natural that it is easy to underestimate the formidable difficulties our auditory system has to overcome to extract meaningful information from the complex auditory signals entering our ears. In particular, in environments where we try to understand one talker among multiple persons speaking at the same time, the capacities of the auditory system are stretched to the limit. To most of us blessed with normal hearing, it seems as if this task is achieved without any effort, but the fragility of speech perception is clearly revealed when there is background noise or when a hearing impairment affects the peripheral encoding of the incoming signals. The difficulties associated with understanding speech in multiple-talker situations are often associated with the term “cocktail-party problem” (or “cocktail-party effect”), coined by Colin Cherry in his 1953 paper. Cherry himself was mainly interested in the ability of listeners to select target speech while ignoring other sounds in conditions where signals were either mixed or presented to separate ears. While the widespread use of this term might suggest the existence of a single, coherent field of research, scientific work has actually for many years proceeded along different lines that showed little or no overlap.
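To make the binaural “unmasking” mentioned above concrete, here is a minimal numerical sketch of the equalization-cancellation (EC) principle that underlies several psychoacoustic models of binaural unmasking. This is not code from the review; the tone frequency, masker level, and internal-noise level are illustrative assumptions.

```python
# Minimal sketch of the equalization-cancellation (EC) idea behind
# binaural "unmasking". All levels below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
fs, dur, f0 = 16000, 1.0, 500                # sample rate (Hz), duration (s), target tone (Hz)
t = np.arange(int(fs * dur)) / fs

target = 0.05 * np.sin(2 * np.pi * f0 * t)   # weak target tone (stand-in for speech)
masker = rng.normal(scale=1.0, size=t.size)  # broadband masker, identical at both ears (N0)

def internal_noise():
    # independent "internal" noise per ear; keeps the EC advantage finite,
    # standing in for the jitter terms of real EC models (assumed level)
    return rng.normal(scale=0.05, size=t.size)

# N0Spi configuration: masker in phase at both ears, target phase-inverted at one ear
left = masker + target + internal_noise()
right = masker - target + internal_noise()

def snr_db(sig, noise):
    return 10 * np.log10(np.mean(sig ** 2) / np.mean(noise ** 2))

# EC step: subtracting the ear signals cancels the in-phase masker while the
# out-of-phase target adds coherently; only internal noise remains
ec = (left - right) / 2
residual = ec - target

print(f"monaural SNR: {snr_db(target, masker):6.1f} dB")
print(f"post-EC SNR:  {snr_db(target, residual):6.1f} dB")
```

In this toy N0Sπ condition the ear-difference operation removes the masker almost completely, so the improvement is much larger than the binaural masking level differences of roughly 10–15 dB measured in listeners; real EC-type models limit the predicted gain with amplitude and timing jitter, which the internal-noise term here only crudely approximates.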