A Multimedia Application: Spatial Perceptual Entropy of Multichannel Audio Signals
© Shuixian Chen et al. 2010
Received: 17 November 2009
Accepted: 11 February 2010
Published: 29 March 2010
Multimedia data usually have to be compressed before transmission. A higher compression rate, or equivalently a lower bitrate, relieves the load on communication channels but degrades quality. We investigate the bitrate lower bound for perceptually lossless compression of a major type of multimedia: multichannel audio signals. This bound equals the perceptible information rate of the signals. Traditionally, Perceptual Entropy (PE), based primarily on monaural hearing, measures the perceptual information rate of individual channels. But PE cannot measure the spatial information captured by binaural hearing and is therefore not suitable for estimating the bitrate bound of Spatial Audio Coding (SAC). To measure this spatial information, we build a Binaural Cue Physiological Perception Model (BCPPM) on the basis of binaural hearing, which represents spatial information in the physical and physiological layers. This model enables computing Spatial Perceptual Entropy (SPE), the lower bitrate bound for SAC. For real-world stereo audio signals of various types, our experiments indicate that SPE reliably estimates their spatial information rate. Therefore, "SPE plus PE" gives lower bitrate bounds for communicating multichannel audio signals with transparent quality.
A central goal in multimedia communications is to deliver high-quality content at the lowest possible bitrate. By quality, we mean the perceived fidelity of the received content relative to the original. The lowest possible bitrate depends on two disparate concepts: entropy and perception. Entropy measures the quantity of information, but not all information is perceptible.
To pursue this goal, we want to know how many bits are sufficient to convey high-quality multimedia content. Lossless compression always ensures the highest possible quality. Its only source of compression is the objective redundancy in the content, and it has a limit, the Shannon entropy, which is the lowest possible bitrate that still allows perfect decompression. Nevertheless, this limit is very hard, if not impossible, to compute because of the diversity and complexity of the probability models of multimedia content. Using Huffman coding, run-length coding, arithmetic coding, and other entropy coding techniques, state-of-the-art lossless audio coders typically achieve a compression rate of 1/3–2/3, or 230–460 kbps per channel for CD music.
Lossless compression generally conveys higher quality than necessary in multimedia communications. Multimedia content abounds in subjective irrelevancy, that is, objective information we cannot sense, so perceptually lossless compression suffices. For audio signals, this means compression that is lossless to the extent that the distortion after decompression is imperceptible to normal human ears (usually called transparent coding); the bitrate can then be much lower than in true lossless coding. By removing the irrelevancy, perceptual audio coding greatly reduces communication bandwidth or storage space. Psychoacoustics provides a quantitative theory of this irrelevancy [4–7]: the limits of auditory perception, such as the audible frequency range (20–20000 Hz), the Absolute Threshold of Hearing (ATH), and the masking effect. In state-of-the-art perceptual audio coders, such as MPEG-2/4 Advanced Audio Coding (AAC [9, 10]), 64 kbps is enough for transparent coding. The Shannon entropy cannot measure the perceptible information or give the bitrate bound in this case.
We can see that if in (1) assumes conservative values (smaller), PE will be larger. On the other hand, Adaptive Multirate (AMR) and Adaptive Multirate Wide Band (AMR-WB) use a priori knowledge of human voicing, which also reduces bitrate. Apart from these two points, PE reliably predicts the lowest bitrate required for transparent audio coding. Since its formulation, PE has found widespread use in audio coding and has become a fundamental theory in this field. Mainstream perceptual audio coders, such as MP3 and AAC, all employ PE as an important psychoacoustic parameter, so it has led to various practical methods and not just theory.
Nevertheless, PE is significantly limited as a measure of perceptual information. This limitation comes primarily from the underlying monaural hearing model. Humans have two ears to receive sound waves in 3-dimensional space: they perceive not only time and frequency information, which needs just individual ears, but also spatial or localization information, which needs both ears for spatial sampling. Because PE is unaware of binaural hearing, the PE of multichannel audio signals reduces to the superposition of the PE of individual channels, which is significantly larger than the real quantity of information received because the channels of multichannel audio signals are usually correlated. The purpose of this paper is to measure the perceptual information of binaural hearing.
We first analyze the localization principle of binaural hearing and give a spatial hearing model on the physical and physiological layers. Then we propose a Binaural Cue Physiological Perception Model (BCPPM) based on binaural hearing. Finally, using the frequency-domain perception properties of binaural hearing, we give a formula to compute the quantity of spatial information and present numerical estimates of the spatial information of real-world stereo audio signals.
With the left and right ears, human beings are able to detect spatial information: sound source localization and sound source spaciousness. The former comprises range, azimuth, and elevation, in other words, the 3-dimensional spherical coordinates. The latter can be measured by the angular span of auditory images.
On the physical layer, sound waves propagate from sources along different paths to the ears, then through the ear canals, and finally to the cochlea, being absorbed and reflected by walls, floors, torso, head, and other objects on the way. Those sound waves carry objective localization information. On the physiological layer, sound waves are transformed into neural cell excitation and inhibition by the auditory system. There are different types of auditory neural cells responding to different types of sound stimuli, such as intensity, frequency, and delay. Thus physical quantities become physiological data.
In audio compression, irrelevancy removal operates mainly on the physical and physiological layers. In the following, we discuss the representation of binaural cues on these two layers, which constitutes the BCPPM.
1.1. Spatial Information on the Physical Layer
2. Physiological Perception Modeling of Binaural Hearing
Although a real head is far from being a rigid ball, the above results are basically correct. In 2002, Macpherson and Middlebrooks demonstrated that the duplex theory is suitable for a variety of audio signals: pure tones, wide-band signals, high-pass signals, as well as low-pass signals. The exception is high-frequency signals with envelope delay.
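As a rough illustration of the rigid-ball picture, the sketch below evaluates the Woodworth approximation of ITD for a distant source, ITD = (a/c)(θ + sin θ). The formula, the head radius of 8.75 cm, and the speed of sound of 343 m/s are assumptions chosen for illustration; they are not taken from this paper.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at about 20 degrees C


def woodworth_itd_s(azimuth_deg: float) -> float:
    """ITD in seconds for a far source at 0..90 degrees azimuth (0 = front)."""
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch covers azimuths from 0 to 90 degrees only")
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))


itd_front = woodworth_itd_s(0.0)  # a source straight ahead gives no delay
itd_side = woodworth_itd_s(90.0)  # a source at the side gives roughly 0.66 ms
```

Under these assumptions the largest ITD is well below 1 ms, which gives a sense of the range that the delay axis of a binaural model has to cover.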
Unlike ILD and ITD, the spectral cue needs prior knowledge to provide elevation information. In principle, sounds may have arbitrary spectra. A listener is not able to detect the elevation angle based solely on the spectra: any spectral characteristic may come either from the sound source itself or from the filtering effect of the pinnae, and the listener cannot tell which.
Blauert reported a very interesting auditory phenomenon for narrow-band sound sources on the median plane: the elevation angles given by subjects are independent of the real elevation angles but depend on the signal frequencies. For wide-band signals of familiar types, it is easy for our auditory system to compare the pinnae-filtered spectra (some frequencies amplified and some attenuated) to the spectra in memory, and based on the difference, a reliable elevation angle estimate can be given (Figure 3). But for narrow-band signals, pinnae-filtered spectra do not have a detectable shape difference, just a level difference. Thus the elevation angle detection will be very unreliable. In fact, the elevation angles given by the subjects are the angles at which the narrow-band signals have the maximum gain due to the pinnae filtering. For example, the peak gain frequency when sounds come from the front is 3 kHz for most people. So wherever a sound of 3 kHz came from, most subjects pointed to the front.
From the perspective of signal processing, sound wave propagation is roughly a Linear Time Invariant (LTI) system. To describe this LTI system in binaural hearing, we use the Head-Related Transfer Function (HRTF [27–29]) or, equivalently, the Head-Related Impulse Response (HRIR). In open space, the HRTF/HRIR is a function of source location, that is, range, azimuth, and elevation.
Obviously, ILD and ITD are not only source location dependent, but also frequency dependent.
To obtain an accurate relationship between sound source locations and sound wave propagation, more realistic head models or real heads are needed. In 1994, the MIT Media Lab collected HRTFs at 710 locations in 3-dimensional space using the KEMAR head. In 2001, CIPIC at U.C. Davis examined the HRTFs of 45 subjects and 2 KEMAR heads. The HRTFs obtained in these experiments reveal individual differences. Nevertheless, there are common characteristics that are sufficient to derive subject-independent spatial information.
2.1. Spatial Information on the Physiological Layer
In the human auditory system, the ITD and ILD of external sound sources stimulate or inhibit specific neural cells over the full audible frequency range. This process comprises two steps: Frequency-to-Place Transform (FPT) [32, 33] and Binaural Processing (BP).
In 1960, Békésy reported that sounds of different frequencies generate surface waves on the basilar membrane in the cochlea with peak amplitudes at different places, which are determined by the frequencies. In other words, a specific frequency is mapped to a specific place on the basilar membrane (the FPT), and the frequency associated with a given place is called its Characteristic Frequency (CF). Hair cells at that place then transform the mechanical swing into electric signals in the auditory nerves.
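The frequency-to-place mapping can be illustrated with Greenwood's cochlear frequency-position function, F = A(10^(a·x) − k). This is only a sketch: the text above cites Greenwood [32, 33] without stating a formula, and the human constants A = 165.4, a = 2.1, and k = 0.88, with x the fractional distance from the apex of the basilar membrane, are the commonly quoted values and are assumed here.

```python
import math

# Commonly quoted human constants for Greenwood's function (assumed here).
A_HZ, ALPHA, K = 165.4, 2.1, 0.88


def place_to_cf_hz(x: float) -> float:
    """Characteristic frequency at fractional distance x (0 = apex, 1 = base)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return A_HZ * (10.0 ** (ALPHA * x) - K)


def cf_to_place(freq_hz: float) -> float:
    """Inverse mapping: fractional place whose characteristic frequency is freq_hz."""
    return math.log10(freq_hz / A_HZ + K) / ALPHA


cf_apex = place_to_cf_hz(0.0)    # about 20 Hz
cf_base = place_to_cf_hz(1.0)    # about 20.7 kHz
place_1khz = cf_to_place(1000.0)
```

With these constants the two ends of the membrane map to roughly 20 Hz and 20 kHz, matching the audible range quoted earlier.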
The neural signals from the left and right ears corresponding to the same frequency meet in the brain. Our auditory system then extracts the ITD and ILD information from the signals. Currently, there are two kinds of theories on this process: Excitation-Excitation (EE) and Excitation-Inhibition (EI). The former proposes that there are EE-type auditory nerve cells located between the inferior colliculus and the medial superior olive, and that specific EE-type cells there have maximum excitation for signals with a specific ITD and ILD; the latter proposes that there are EI-type auditory nerve cells located between the inferior colliculus and the lateral superior olive, and that specific EI-type cells there have maximum inhibition for signals with a specific ITD and ILD. The common ground of the two theories is that specific nerve cells are sensitive only to a specific ITD and ILD, which are called the characteristic ITD and characteristic ILD. In some of the literature, the characteristic ITD is also called the Best Delay (BD) or Characteristic Delay (CD). Both the EE-type and the EI-type have support from physiological research, but the latter better explains the various binaural hearing phenomena.
In the Breebaart model, an EI-type element has the largest inhibition only if its internal delay and attenuation exactly compensate the external ITD and ILD. Thus, by knowing the position of the EI-type element with the largest inhibition, the auditory system finds the ITD and ILD of the external audio signals.
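The search for the most inhibited EI-type element can be sketched as a grid search over internal delay and attenuation. This is a simplified discrete-time illustration, not the Breebaart model itself: the function names, the integer-sample delay grid, the 1 dB attenuation grid, and the sign conventions (positive delay means the right signal lags, positive attenuation means the left signal is louder) are all assumptions made for the example.

```python
import math
import random


def ei_activity(left, right, delay, atten_db):
    """Mean squared residual of one EI-type element; smaller means more inhibition."""
    gain_left = 10.0 ** (-atten_db / 40.0)   # half of the attenuation on the left
    gain_right = 10.0 ** (atten_db / 40.0)   # the other half as gain on the right
    total, count = 0.0, 0
    for n in range(len(right)):
        m = n - delay                         # left sample aligned with right[n]
        if 0 <= m < len(left):
            total += (gain_left * left[m] - gain_right * right[n]) ** 2
            count += 1
    return total / count if count else math.inf


def find_cues(left, right, max_delay, atten_grid_db):
    """Return (delay, attenuation) of the element with the smallest activity."""
    candidates = (
        (ei_activity(left, right, d, a), d, a)
        for d in range(-max_delay, max_delay + 1)
        for a in atten_grid_db
    )
    _, best_delay, best_atten = min(candidates)
    return best_delay, best_atten


# Synthetic test signal: the right ear hears the left signal 3 samples later
# and at half the amplitude (a level difference of about 6 dB).
rng = random.Random(0)
left = [rng.gauss(0.0, 1.0) for _ in range(1000)]
right = [0.0] * 3 + [0.5 * x for x in left[:-3]]
cues = find_cues(left, right, max_delay=5, atten_grid_db=range(-10, 11))
```

For this synthetic pair the minimum lies at a delay of 3 samples and an attenuation of 6 dB, which are the external cues built into the signals.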
The Breebaart model also implies the calculation of Interaural Coherence (IC), which manifests as the trough of the excitation surface, in accordance with the EI-type assumption. Nevertheless, there is no direct physiological quantity related to IC in this model.
In 2004, Faller and Merimaa reported that IC relates to the perception of sound image width and stability, as well as sound field ambience [46, 47]. On the other hand, the precedence effect [48, 49] of spatial hearing states that sound source localization depends primarily on the direct sounds to the ears and is essentially irrespective of reflection and reverberation, both of which lower IC. On this basis, Faller proposed that our auditory system uses ITD and ILD to localize sound sources only if IC approaches 1. Since direct sounds to the ears have a cross-correlation near 1, this explains the precedence effect.
2.2. Binaural Cue Physiological Perception Model (BCPPM)
Since the wavelength of sound in the audible range of 20–20000 Hz (about 0.017–17 m) is much longer than that of light and comparable to the size of ordinary objects in our surroundings, which leads to significant interference and diffraction, the spatial information available from hearing is limited from the outset. This limited information is first compromised by noise and interference from other sound sources, as indicated in Figure 7. Then, during the transformation from mechanical swing to electric impulses, part of the information is lost again owing to the limited frequency range and dynamic range, the limited frequency and temporal resolution, and the physiological noise of our auditory system, as also indicated in Figure 7.
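The wavelength range follows from dividing the speed of sound by the frequency. A minimal check, assuming a speed of sound of 343 m/s (air at about 20 degrees C; this value is an assumption, not taken from the paper):

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees C


def wavelength_m(frequency_hz: float, speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Wavelength in metres = propagation speed / frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz


longest = wavelength_m(20.0)       # about 17 m at the low end of hearing
shortest = wavelength_m(20000.0)   # about 0.017 m at the high end
```

Both ends are many orders of magnitude longer than optical wavelengths, which is why diffraction around the head and nearby objects matters so much more for hearing than for vision.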
In Section 1.1, we see that the physical data of sound source localization in binaural hearing are in the form of ITD and ILD. In Section 2.1, we see that ITD and ILD are transformed into the maximum inhibition of specific EI-type auditory nerve cells in the Breebaart model, and that the physiological data are in the form of coordinates in the delay-attenuation network.
When there are multiple sound sources, background noises, reflection, diffraction, and reverberation, IC becomes another type of physical data conveying the overall sound field information.
Since spatial hearing on the physiological layer is too complex and uncertain to be incorporated into a computational model for common listeners, we restrict the calculation of perceptible spatial information to that directly related to ITD, ILD, and IC and to the physiological data corresponding to these three cues. In fact, spatial coding systems use these cues to represent spatial information.
We first review the psychoacoustic foundation of PE, mainly the nonlinear frequency resolution (Critical Band, CB [50, 51]) of our hearing system, spreading functions in the frequency domain for noises and tones and tonality estimation.
The BCPPM consists of 3 modules.
Frequency-to-Place Transform in Cochlea.
This process separates sounds into a bank of subband signals, essentially the subband filtering in MHM. The subband filter can be implemented either by a DFT with spectral lines grouped into subbands according to CB or by the Cochlear Filter Bank (CFB) proposed by Baumgarte in 2002.
Effective Channel Noises.
The effective channel noise for ITD, ILD, and IC ( , , and in Figure 10) is a simplified way to model the limited precision, intrinsic noise, and intersource interference in our hearing system. Part of the noise comes directly from the grains of delay and attenuation ( and in Figure 6). For example, if , . Generally, and are functions of frequency. A related concept in psychoacoustics is the Just Noticeable Difference (JND), which indicates the overall sensitivity of our auditory system. Moreover, ITD, ILD, and IC are not independent: they interact with one another, and the effective channel noise should incorporate these interactions.
3. Computing Spatial Perceptual Entropy (SPE) Based on BCPPM
3.1. SPE Definition
From the information theory viewpoint, we see BCPPM as a double-in-multiple-out system (Figure 10). The double-in is the left ear entrance sound and the right ear entrance sound. The multiple-out consists of 75 effective ITDs, ILDs, and ICs (25 CBs, each with a tuple of ITD, ILD, and IC).
As in computing PE, we view each path that leads to an output as a lossy subchannel, so there are 75 such subchannels. Unlike in PE, what a subchannel conveys is not a subband spectrum but one of the ITD, ILD, and IC of the subband corresponding to that subchannel.
In each subchannel, there is intrinsic channel noise (the resolution of spatial hearing), and among subchannels, there is interchannel interference (the interaction of binaural cues). Together these give an effective noise for each subchannel.
For some probability distributions, say uniform distribution, (5), (6), and (7) can be readily calculated.
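As a minimal sketch of the uniform case, assume each cue is quantized into equally likely levels, so that its entropy is simply log2 of the number of levels. The ranges and step sizes below are hypothetical placeholders for illustration, not the values used in this paper, and the helper names are invented for the example. Because the uniform distribution maximizes entropy for a given number of levels, the result is an upper bound for that quantization.

```python
import math


def levels_for(value_range: float, step: float) -> int:
    """Number of distinguishable levels across value_range at resolution step."""
    return int(round(value_range / step)) + 1


def uniform_entropy_bits(levels: int) -> float:
    """Entropy in bits of a variable uniformly distributed over `levels` values."""
    if levels < 1:
        raise ValueError("at least one level is required")
    return math.log2(levels)


# Hypothetical cue ranges and resolutions (placeholders, not measured JNDs).
itd_bits = uniform_entropy_bits(levels_for(1.5, 0.1))    # ITD in ms
ild_bits = uniform_entropy_bits(levels_for(30.0, 2.0))   # ILD in dB
ic_bits = uniform_entropy_bits(levels_for(1.0, 0.125))   # IC, dimensionless

N_BANDS = 25  # one (ITD, ILD, IC) tuple per critical band, 75 subchannels in all
bits_per_frame = N_BANDS * (itd_bits + ild_bits + ic_bits)
```

Summing over the 75 subchannels in this way mirrors the structure of the model, in which every subchannel contributes its own entropy term.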
3.2. CB Filterbank
[Table: Critical bands for a 2048-point DFT at a sampling frequency of 48 kHz; columns give the frequency range (Hz) of each band.]
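Grouping DFT spectral lines into critical bands can be sketched as follows. The band edges used here are the classical Zwicker critical-band edges, with a final edge at the 24 kHz Nyquist frequency so that 25 bands result; they are an assumption for illustration and may differ from the band boundaries tabulated in the paper.

```python
from bisect import bisect_right

# Classical Zwicker critical-band edges in Hz (assumed), closed at Nyquist.
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500, 24000]


def band_of_bin(k: int, n_fft: int = 2048, fs_hz: float = 48000.0) -> int:
    """Index of the critical band that contains DFT bin k."""
    freq_hz = k * fs_hz / n_fft
    band = bisect_right(BAND_EDGES_HZ, freq_hz) - 1
    return min(band, len(BAND_EDGES_HZ) - 2)  # clamp the top edge into the last band


def band_start_bins(n_fft: int = 2048, fs_hz: float = 48000.0) -> list:
    """First DFT bin of every band, scanning bins 0 .. n_fft/2 - 1 in order."""
    starts = []
    for k in range(n_fft // 2):
        if band_of_bin(k, n_fft, fs_hz) == len(starts):
            starts.append(k)
    return starts


starts = band_start_bins()
```

With a 2048-point DFT at 48 kHz the bin spacing is 23.4375 Hz, so the 100 Hz wide low bands hold only four or five spectral lines each while the top band holds several hundred.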
3.3. Binaural Cues Computation
where is the indexes of CB, and the starting DFT spectral index of and (Table 2), and the th spectral lines from left and right ear entrance signals.
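One common way to obtain the three cues for a band from the left and right DFT spectra, as in parametric stereo coding, is sketched below. These definitions are assumptions for illustration and are not claimed to be term-by-term identical to (9), (10), and (11); the function names, the `eps` guard, and the conversion of phase difference to ITD at a band center frequency are hypothetical choices.

```python
import cmath
import math


def band_cues(left_spec, right_spec, start, stop, eps=1e-12):
    """Return (ILD in dB, IPD in rad, IC) over DFT bins start .. stop-1."""
    p_left = sum(abs(left_spec[k]) ** 2 for k in range(start, stop))
    p_right = sum(abs(right_spec[k]) ** 2 for k in range(start, stop))
    cross = sum(left_spec[k] * right_spec[k].conjugate() for k in range(start, stop))
    ild_db = 10.0 * math.log10((p_left + eps) / (p_right + eps))
    ipd_rad = cmath.phase(cross)  # phase of left relative to right
    ic = abs(cross) / math.sqrt((p_left + eps) * (p_right + eps))
    return ild_db, ipd_rad, ic


def ipd_to_itd_s(ipd_rad: float, center_hz: float) -> float:
    """Convert a phase difference to a delay at an assumed band center frequency."""
    return ipd_rad / (2.0 * math.pi * center_hz)


# Synthetic band: the right spectrum is the left one at half amplitude,
# lagging by 0.3 rad in phase.
left_spec = [complex(1, 2), complex(-3, 0.5), complex(0.2, -1), complex(2, 2)]
right_spec = [0.5 * x * cmath.exp(-0.3j) for x in left_spec]
ild_db, ipd_rad, ic = band_cues(left_spec, right_spec, 0, len(left_spec))
itd_s = ipd_to_itd_s(ipd_rad, 1000.0)
```

For the synthetic band the level difference is about 6 dB, the phase difference is 0.3 rad, and the coherence is essentially 1, since the two spectra differ only by a constant complex factor.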
3.4. Effective Spatial Perception Data
The resolutions or quantization steps of the binaural cues (Figure 12) can be determined by JND experiments. Denote by , , and the resolutions of ITD, ILD, and IC, respectively. Generally, they are signal dependent and frequency dependent. For simplicity, we use constant values [44, 54]: ms, dB, and .
Larger IC usually implies higher ITD perception precision or, equivalently, more spatial information. When IC approaches 1, the activity surface decreases very sharply toward the point with the lowest auditory nerve activity. In this case, the uncertainty of ITD is very small and ITD is determined precisely. When IC decreases toward 0, the surface becomes flatter, leading to larger uncertainty or lower precision of ITD. In the extreme case, when IC = 0, the gradient along the IC axis will be constantly 0, so there is no well-defined trough point and ITD is completely indeterminable.
From (13) we see that when IC(b)=1, assumes the minimum and the auditory system has the highest resolution for ITD; when , , the resolution of ITD is lower but there is still spatial information from ITD; when , , the resolution of ITD is 0 and there is no spatial information in ITD.
where N is the number of spectral lines in one transform, or 1024 in this case; , , and can be found from (9), (10), and (11), respectively; , , and are the JNDs of ILD, ITD, and IC on CB b, respectively, obtained from subjective listening experiments; and is the amplitude compression factor, assuming 0.6 .
We evaluate SPE of 126 stereo sequences from 3GPP and MPEG, which are classified into speech, single instrument, simple mixture, and complex mixture, all sampled at 44.1 kHz. For comparison, we also evaluate PE of these sequences.
In the following experiments, , , and assume constant and conservative values, and their frequency dependency is also ignored. The overall SPE is the sum of entropy of effective IC, ILD, and ITD perception data, shown in (4).
4.1. Perceptual Spatial Information of Stereo Sequences
From Figure 15 we find that speech sequences generally have the lowest spatial information rate, with a mean of 2.75 kbps; this is in accordance with the recording practice that voices usually stay directly in front in the sound field. Single instrument sequences and simple mixture sequences have similar spatial information rates, with means of 3.49 kbps and 3.66 kbps, respectively. Complex mixture sequences generally have the highest spatial information rate, with a mean of 6.90 kbps; this can be explained by the multiple sound sources at diverse sound field locations in this type of sequence.
In Parametric Stereo (PS) coding, a spatial parameter bitrate of 7.7 kbps is reported to be sufficient for transparent spatial audio quality, which agrees well with our SPE computation.
4.2. Temporal Variation of Spatial Information Rate in a Single Sequence
The test data show that for es02 with stable voice from the front, SPE stays at 1-2 kbps; for sc03 with multiple instruments and strong spatial impression, SPE stays at about 7 kbps. But within either sequence, the SPE changes little.
4.3. Overall Perceptual Information in Stereo Sequences
Using PE to evaluate the perceptual information, only intrachannel redundancy and irrelevancy are exploited; the overall PE is simply the sum of PE of the left and right channels. Using SPE based on BCPPM, interchannel redundancy and irrelevancy are also exploited; the overall perceptual information is about one normal audio channel plus some spatial parameters, which has significantly lower bitrate.
We have developed the Binaural Cue Physiological Perception Model (BCPPM) to measure the perceptible spatial information, or Spatial Perceptual Entropy (SPE), in multichannel audio signals and have given a lower bitrate bound in multimedia communications for this type of content. BCPPM models the physical and physiological processing of human spatial hearing as a set of parallel lossy communication subchannels with inter-subchannel interference, and SPE is the overall channel capacity. Each of these subchannels carries ITD, ILD, or IC with additive noise, resulting from the intrinsic noise of binaural cue perception and the interference among the cues within the same CB. Experiments on stereo signals of different types have confirmed that SPE is compatible with the spatial parameter bitrate and spatial impression in SAC.
Nevertheless, SPE gives only the lower bitrate bound for transparent quality. We will extend SPE to give the bound for given subjective quality in the future. Then in mobile, internet, and other communications networks conveying multichannel audio signals, we can use the estimated bound to allocate bandwidth for a particular Quality of Service (QoS), transparent or degraded and thus save bandwidth or improve the overall QoS. On the other hand, current SAC may benefit from SPE—dynamically allocating bitrate to accommodate varying spatial contents—thus improving quality and reducing overall bitrate.
This research is supported by the National Science Foundation of China Grant no. 60832002.
- Shannon CE: A mathematical theory of communication. Bell System Technical Journal 1948, 27:379-423, 623-656.
- Lossless comparison http://wiki.hydrogenaudio.org/index.php?title=Lossless_comparison
- Painter T, Spanias A: Perceptual coding of digital audio. Proceedings of the IEEE 2000, 88(4):451-513. 10.1109/5.842996
- Zwicker E, Fastl H: Psychoacoustics Facts and Models. Berlin, Germany, Springer; 1990.
- Moore BCJ: An Introduction to the Psychology of Hearing. 5th edition. Elsevier Academic Press, London, UK; 2003.
- Zwicker E, Zwicker UT: Audio engineering and psychoacoustics. Matching signals to the final receiver, the human auditory system. Journal of the Audio Engineering Society 1991, 39(3):115-126.
- Hall JL: Auditory psychophysics for coding applications. In The Digital Signal Processing Handbook. Edited by: Madisetti V, Williams D. CRC Press, Boca Raton, Fla, USA; 1998:39.1-39.25.
- Moore BCJ: Masking in the human auditory system. In Collected Papers on Digital Audio Bit-Rate Reduction. Edited by: Gilchrist N, Grewin C. Audio Engineering Society, New York, NY, USA; 1996:9-19.
- ISO/IEC JTC1/SC29/WG11: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information - Part 7: Advanced Audio Coding (AAC). ISO/IEC 13818-7, 2005.
- ISO/IEC JTC1/SC29/WG11: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information - Part 3: Audio, Subpart 4: General Audio Coding. ISO/IEC 14496-3, 2005.
- Bosi M, Goldberg RE: Introduction to Digital Audio Coding and Standards. Kluwer Academic Publishers, Boston, Mass, USA; 2003.
- Johnston JD: Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications 1988, 6(2):314-323. 10.1109/49.608
- Johnston JD: Estimation of perceptual entropy using noise masking criteria. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '88), May 1988, 2524-2527.
- 3GPP: Mandatory speech CODEC speech processing functions; AMR speech Codec; General description. 3GPP TS 26.071, 2008, http://www.3gpp.org/ftp/Specs/html-info/26071.htm
- 3GPP: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; General description. 3GPP TS 26.171, 2008, http://www.3gpp.org/ftp/Specs/html-info/26171.htm
- ISO/IEC JTC1/SC29/WG11 MPEG: Information technology - coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - part 3: audio. ISO/IEC 11172-3, 1992.
- Blauert J: Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, Cambridge, Mass, USA; 1997.
- Hofman PM, Van Riswick JGA, Van Opstal AJ: Relearning sound localization with new ears. Nature Neuroscience 1998, 1(5):417-421. 10.1038/1633
- Strutt JW: On our perception of sound direction. Philosophical Magazine 1907, 13:214-232.
- Macpherson EA, Middlebrooks JC: Listener weighting of cues for lateral angle: the duplex theory of sound localization revisited. Journal of the Acoustical Society of America 2002, 111(5):2219-2236. 10.1121/1.1471898
- Blauert J: Sound localization in the median plane. Acustica 1969-1970, 22(4):205-213.
- Hebrank J, Wright D: Spectral cues used in the localization of sound sources on the median plane. Journal of the Acoustical Society of America 1974, 56(6):1829-1834. 10.1121/1.1903520
- Butler RA, Belendiuk K: Spectral cues utilized in the localization of sound in the median sagittal plane. Journal of the Acoustical Society of America 1977, 61(5):1264-1269. 10.1121/1.381427
- Rakerd B, Hartmann WM, McCaskey TL: Identification and localization of sound sources in the median sagittal plane. Journal of the Acoustical Society of America 1999, 106(5):2812-2820. 10.1121/1.428129
- Musicant AD, Butler RA: The influence of pinnae-based spectral cues on sound localization. Journal of the Acoustical Society of America 1984, 75(4):1195-1200. 10.1121/1.390770
- Asano F, Suzuki Y, Sone T: Role of spectral cues in median plane localization. Journal of the Acoustical Society of America 1990, 88(1):159-168. 10.1121/1.399963
- Møller H, Sørensen MF, Hammershøi D, Jensen CB: Head-related transfer functions of human subjects. Journal of the Audio Engineering Society 1995, 43(5):300-321.
- Møller H: Fundamentals of binaural technology. Applied Acoustics 1992, 36(3-4):171-218. 10.1016/0003-682X(92)90046-U
- Huang Y, Benesty J (Eds): Spatial hearing. In Audio Signal Processing for Next-Generation Multimedia Communication Systems. Kluwer Academic Publishers, Norwell, Mass, USA; 2004:345-370.
- Gardner WG, Martin KD: HRTF measurements of a KEMAR. Journal of the Acoustical Society of America 1995, 97(6):3907-3908. 10.1121/1.412407
- Algazi VR, Duda RO, Thompson DM, Avendano C: The CIPIC HRTF database. Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Electroacoustics, October 2001, New Paltz, NY, USA, 99-102.
- Greenwood DD: A cochlear frequency-position function for several species: 29 years later. Journal of the Acoustical Society of America 1990, 87(6):2592-2605. 10.1121/1.399052
- Greenwood DD: Critical bandwidth and the frequency coordinates of the basilar membrane. Journal of the Acoustical Society of America 1961, 33(10):1344-1356. 10.1121/1.1908437
- von Békésy G: Experiments in Hearing. McGraw Hill, New York, NY, USA; 1960.
- Møller AR: Hearing: Anatomy, Physiology, and Disorders of the Auditory System. 2nd edition. Academic Press, Burlington, Vt, USA; 2006.
- Rose JE, Gross NB, Geisler CD, Hind JE: Some neural mechanisms in the inferior colliculus of the cat which may be relevant to localization of a sound source. Journal of Neurophysiology 1966, 29(2):288-314.
- Park TJ: IID sensitivity differs between two principal centers in the interaural intensity difference pathway: the LSO and the IC. Journal of Neurophysiology 1998, 79(5):2416-2431.
- Joris PX, Van de Sande B, Louage DH, van der Heijden M: Binaural and cochlear disparities. Proceedings of the National Academy of Sciences of the United States of America 2006, 103(34):12917-12922. 10.1073/pnas.0601396103
- Stern RM, Wang DeL, Brown G: Binaural sound localization. In Computational Auditory Scene Analysis. Edited by: Brown G, Wang DeL. Wiley/IEEE Press, New York, NY, USA; 2006.
- Breebaart J, van de Par S, Kohlrausch A: The contribution of static and dynamically varying ITDs and IIDs to binaural detection. Journal of the Acoustical Society of America 1999, 106(2):979-992. 10.1121/1.427110
- Jeffress LA: A place theory of sound localization. Journal of Comparative and Physiological Psychology 1948, 41(1):35-39.
- Joris PX, Smith PH, Yin TCT: Coincidence detection in the auditory system: 50 years after Jeffress. Neuron 1998, 21(6):1235-1238. 10.1016/S0896-6273(00)80643-1
- Breebaart J, van de Par S, Kohlrausch A: Binaural processing model based on contralateral inhibition. I. Model structure. Journal of the Acoustical Society of America 2001, 110(2):1074-1088. 10.1121/1.1383297
- Breebaart J, van de Par SD, Kohlrausch A: Binaural processing model based on contralateral inhibition. II. Dependence on spectral parameters. Journal of the Acoustical Society of America 2001, 110(2):1089-1104. 10.1121/1.1383298
- Breebaart J, van de Par SD, Kohlrausch A: Binaural processing model based on contralateral inhibition. III. Dependence on temporal parameters. Journal of the Acoustical Society of America 2001, 110(2):1105-1117. 10.1121/1.1383299
- Faller C, Merimaa J: Source localization in complex listening situations: selection of binaural cues based on interaural coherence. Journal of the Acoustical Society of America 2004, 116(5):3075-3089. 10.1121/1.1791872
- Goupell MJ, Hartmann WM: Interaural fluctuations and the detection of interaural incoherence: bandwidth effects. Journal of the Acoustical Society of America 2006, 119(6):3971-3986. 10.1121/1.2200147
- Zurek PM: The precedence effect. In Directional Hearing. Edited by: Yost WA, Gourevitch G. Springer, New York, NY, USA; 1987:85-105.
- Litovsky RY, Rakerd B, Yin TCT, Hartmann WM: Psychophysical and physiological evidence for a precedence effect in the median sagittal plane. Journal of Neurophysiology 1997, 77(4):2223-2226.
- Fletcher H: Auditory patterns. Reviews of Modern Physics 1940, 12(1):47-65. 10.1103/RevModPhys.12.47
- Scharf B: Critical bands. In Foundations of Modern Auditory Theory. Academic Press, New York, NY, USA; 1970.
- Faller C, Baumgarte F: Binaural cue coding - part II: schemes and applications. IEEE Transactions on Speech and Audio Processing 2003, 11(6):520-531. 10.1109/TSA.2003.818108
- Baumgarte F: Improved audio coding using a psychoacoustic model based on a cochlear filter bank. IEEE Transactions on Speech and Audio Processing 2002, 10(7):495-503. 10.1109/TSA.2002.804536
- Breebaart J, Herre J, Faller C, et al.: MPEG spatial audio coding/MPEG surround: overview and current status. AES 119th Convention, October 2005, New York, NY, USA.
- Hartmann WM, Constan ZA: Interaural coherence and the lateralization of noise by interaural level differences. Journal of the Acoustical Society of America 2001, 110(5):2680.
- Breebaart J, van de Par S, Kohlrausch A, Schuijers E: Parametric coding of stereo audio. EURASIP Journal on Applied Signal Processing 2005, 2005(9):1305-1322. 10.1155/ASP.2005.1305
- Rödén J, Breebaart J, Hilpert J, et al.: A study of the MPEG surround quality versus bit-rate curve. AES 123rd Convention, October 2007, New York, NY, USA.
- Breebaart J, Hotho G, Koppens J, Schuijers E, Oomen W, van de Par S: Background, concept, and architecture for the recent MPEG surround standard on multichannel audio compression. Journal of the Audio Engineering Society 2007, 55(5):331-351.
- Hilpert J, Disch S: The MPEG surround audio coding standard. IEEE Signal Processing Magazine 2009, 26(1):148-152.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.