Similar literature
1.
Somatosensory basis of speech production   Total citations: 1, self-citations: 0, citations by others: 1
Tremblay S  Shiller DM  Ostry DJ 《Nature》2003,423(6942):866-869
The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced.

2.
Eliades SJ  Wang X 《Nature》2008,453(7198):1102-1106
Vocal communication involves both speaking and hearing, often taking place concurrently. Vocal production, including human speech and animal vocalization, poses a number of unique challenges for the auditory system. It is important for the auditory system to monitor external sounds continuously from the acoustic environment during speaking despite the potential for sensory masking by self-generated sounds. It is also essential for the auditory system to monitor feedback of one's own voice. This self-monitoring may play a part in distinguishing between self-generated or externally generated auditory inputs and in detecting errors in our vocal production. Previous work in humans and other animals has demonstrated that the auditory cortex is largely suppressed during speaking or vocalizing. Despite the importance of self-monitoring, the underlying neural mechanisms in the mammalian brain, in particular the role of vocalization-induced suppression, remain virtually unknown. Here we show that neurons in the auditory cortex of marmoset monkeys (Callithrix jacchus) are sensitive to auditory feedback during vocal production, and that changes in the feedback alter the coding properties of these neurons. Furthermore, we found that the previously described cortical suppression during vocalization actually increased the sensitivity of these neurons to vocal feedback. This heightened sensitivity to vocal feedback suggests that these neurons may have an important role in auditory self-monitoring.

3.
Voice-selective areas in human auditory cortex   Total citations: 55, self-citations: 0, citations by others: 55
Belin P  Zatorre RJ  Lafaille P  Ahad P  Pike B 《Nature》2000,403(6767):309-312
The human voice contains in its acoustic structure a wealth of information on the speaker's identity and emotional state which we perceive with remarkable ease and accuracy. Although the perception of speaker-related features of voice plays a major role in human communication, little is known about its neural basis. Here we show, using functional magnetic resonance imaging in human volunteers, that voice-selective regions can be found bilaterally along the upper bank of the superior temporal sulcus (STS). These regions showed greater neuronal activity when subjects listened passively to vocal sounds, whether speech or non-speech, than to non-vocal environmental sounds. Central STS regions also displayed a high degree of selectivity by responding significantly more to vocal sounds than to matched control stimuli, including scrambled voices and amplitude-modulated noise. Moreover, their response to stimuli degraded by frequency filtering paralleled the subjects' behavioural performance in voice-perception tasks that used these stimuli. The voice-selective areas in the STS may represent the counterpart of the face-selective areas in human visual cortex; their existence sheds new light on the functional architecture of the human auditory cortex.

4.
Bendor D  Wang X 《Nature》2005,436(7054):1161-1165
Pitch perception is critical for identifying and segregating auditory objects, especially in the context of music and speech. The perception of pitch is not unique to humans and has been experimentally demonstrated in several animal species. Pitch is the subjective attribute of a sound's fundamental frequency (f(0)) that is determined by both the temporal regularity and average repetition rate of its acoustic waveform. Spectrally dissimilar sounds can have the same pitch if they share a common f(0). Even when the acoustic energy at f(0) is removed ('missing fundamental') the same pitch is still perceived. Despite its importance for hearing, how pitch is represented in the cerebral cortex is unknown. Here we show the existence of neurons in the auditory cortex of marmoset monkeys that respond to both pure tones and missing fundamental harmonic complex sounds with the same f(0), providing a neural correlate for pitch constancy. These pitch-selective neurons are located in a restricted low-frequency cortical region near the anterolateral border of the primary auditory cortex, and this location is consistent with that of a pitch-selective area identified in recent imaging studies in humans.
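
The 'missing fundamental' idea in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' stimuli or analysis: the sample rate, the 200 Hz fundamental, the choice of harmonics 3 to 5, and the helper names are all assumptions made for illustration. It builds a harmonic complex with no energy at f(0) and shows that a simple periodicity estimate still returns f(0).

```python
import cmath
import math

FS = 8000    # sample rate in Hz (assumed for this sketch)
F0 = 200.0   # fundamental frequency in Hz (assumed)
N = 400      # 50 ms of signal, a whole number of periods of F0


def harmonic_complex(harmonics):
    """Sum of equal-amplitude sinusoids at the given multiples of F0."""
    return [
        sum(math.sin(2 * math.pi * h * F0 * n / FS) for h in harmonics)
        for n in range(N)
    ]


def magnitude_at(signal, freq):
    """Magnitude of the discrete Fourier sum of `signal` at `freq` Hz."""
    return abs(sum(x * cmath.exp(-2j * math.pi * freq * n / FS)
                   for n, x in enumerate(signal)))


def periodicity_pitch(signal, fmin=133.0, fmax=400.0):
    """Estimate pitch as FS / lag, where lag maximizes the mean lagged product."""
    lo, hi = int(FS / fmax), int(FS / fmin)

    def mean_product(lag):
        count = len(signal) - lag
        return sum(signal[n] * signal[n + lag] for n in range(count)) / count

    best_lag = max(range(lo, hi + 1), key=mean_product)
    return FS / best_lag


pure_tone = harmonic_complex([1])         # energy only at F0
missing_f0 = harmonic_complex([3, 4, 5])  # energy at 600, 800, 1000 Hz only

print("spectral magnitude at F0:", magnitude_at(missing_f0, F0))
print("pitch of pure tone:", periodicity_pitch(pure_tone))
print("pitch of missing-fundamental complex:", periodicity_pitch(missing_f0))
```

The waveform of the harmonic complex repeats every 1/f(0) seconds even though no spectral component sits at f(0), which is why a repetition-rate estimate agrees for the two sounds.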

5.
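Liu Qin 《Science Technology and Engineering》 (科学技术与工程) 2013,13(21):6107-6110,6133
Most environmental sounds are inharmonic and less stationary than speech and music. To address the shortcomings of traditional time-frequency analysis, an environmental sound feature extraction method based on auditory perception is proposed. Because only a small number of samples is available, a support vector machine (SVM) is used as the classification algorithm for environmental sounds. Simulation results show that the features and method used are effective.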

6.
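Bottleneck features extracted from deep neural networks capture long-term correlations in speech and provide a compact representation. In a Tibetan continuous speech recognition task, bottleneck features, and composite features that combine them with MFCC, can replace conventional MFCC features for GMM-HMM acoustic modelling. Experiments on continuous speech recognition of Lhasa Tibetan show that the composite bottleneck features achieve better recognition performance than deep neural network posterior features and bottleneck features alone.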

7.
Harper NS  McAlpine D 《Nature》2004,430(7000):682-686
A sound, depending on the position of its source, can take more time to reach one ear than the other. This interaural (between the ears) time difference (ITD) provides a major cue for determining the source location. Many auditory neurons are sensitive to ITDs, but the means by which such neurons represent ITD is a contentious issue. Recent studies question whether the classical general model (the Jeffress model) applies across species. Here we show that ITD coding strategies of different species can be explained by a unifying principle: that the ITDs an animal naturally encounters should be coded with maximal accuracy. Using statistical techniques and a stochastic neural model, we demonstrate that the optimal coding strategy for ITD depends critically on head size and sound frequency. For small head sizes and/or low-frequency sounds, the optimal coding strategy tends towards two distinct sub-populations tuned to ITDs outside the range created by the head. This is consistent with recent observations in small mammals. For large head sizes and/or high frequencies, the optimal strategy is a homogeneous distribution of ITD tunings within the range created by the head. This is consistent with observations in the barn owl. For humans, the optimal strategy to code ITDs from an acoustically measured distribution depends on frequency; above 400 Hz a homogeneous distribution is optimal, and below 400 Hz distinct sub-populations are optimal.
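
To make "the range created by the head" concrete, the sketch below computes the largest ITD for a given head radius using Woodworth's rigid-sphere approximation, and expresses it as a fraction of a tone's period. This is only an illustration of how head size and frequency interact; it is not the statistical analysis or stochastic neural model described in the abstract, and the head radii and function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius


def woodworth_itd(head_radius_m, azimuth_rad):
    """ITD in seconds for a distant source and a rigid spherical head.

    Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)),
    with theta the source azimuth measured from straight ahead.
    """
    return head_radius_m / SPEED_OF_SOUND * (azimuth_rad + math.sin(azimuth_rad))


def max_interaural_phase(head_radius_m, freq_hz):
    """Largest ITD (source at 90 degrees) in cycles of a tone at freq_hz."""
    return woodworth_itd(head_radius_m, math.pi / 2) * freq_hz


# Head radii below are illustrative assumptions, not values from the paper.
for label, radius in [("small head, 1.5 cm", 0.015), ("large head, 8.75 cm", 0.0875)]:
    itd_us = woodworth_itd(radius, math.pi / 2) * 1e6
    cycles = max_interaural_phase(radius, 400.0)
    print(f"{label}: max ITD {itd_us:.0f} us, {cycles:.3f} cycles at 400 Hz")
```

With a small head or a low frequency, the naturally occurring ITDs span only a small fraction of a cycle, which is the regime the abstract associates with two sub-populations tuned outside the head's range.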

8.
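Blind identification of communication link-layer characteristics is a key technology in intelligent communication and communication countermeasures. To improve the accuracy of blind identification of coding parameters for wireless local area network / wireless fidelity (Wi-Fi) signals based on the IEEE 802.11 protocol, a deep-learning-based algorithm for blind identification of low density parity check (LDPC) code parameters is proposed; it can accurately identify the information-bit code length and the code rate of the channel coding scheme without prior knowledge. The algorithm uses demodulated bit streams as the training data set and builds a multilayer deep neural network model. After repeated parameter tuning and transfer training, a network model that accurately predicts the coding parameters is obtained. Experimental results show that the model achieves a blind prediction rate above 91% for the coding parameters at bit error rates as high as 10%, and a blind prediction accuracy of 95.32% without bit errors, providing a useful reference for research on intelligent communication and communication countermeasures.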

9.
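To address visual compensation for blind people, a lossless visual compensation system based on the auditory pathway is proposed. The system captures information about the surrounding environment, first encodes it using embedded block coding with optimized truncation (EBCOT), and then encodes the region of interest (ROI) using the wavelet transform. The encoded environmental information is transmitted and decoded to obtain reconstructed image data. The reconstructed image data are scanned in polar coordinates, and the coordinates and grey values of the image pixels are mapped to the time, frequency and amplitude of a sound signal, which is synthesized using a sinusoidal model. The results show that with EBCOT, image coding is fast, more information is retained, and the transmission rate is improved; using polar coordinate...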

10.
L Xu  S Furukawa  J C Middlebrooks 《Nature》1999,399(6737):688-691
Humans and cats can localize a sound source accurately if its spectrum is fairly broad and flat, as is typical of most natural sounds. However, if sounds are filtered to reduce the width of the spectrum, they result in illusions of sources that are very different from the actual locations, particularly in the up/down and front/back dimensions. Such illusions reveal that the auditory system relies on specific characteristics of sound spectra to obtain cues for localization. In the auditory cortex of cats, temporal firing patterns of neurons can signal the locations of broad-band sounds. Here we show that such spike patterns systematically mislocalize sounds that have been passed through a narrow-band filter. Both correct and incorrect locations signalled by neurons can be predicted quantitatively by a model of spectral processing that also predicts correct and incorrect localization judgements by human listeners. Similar cortical mechanisms, if present in humans, could underlie human auditory spatial perception.

11.
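Based on an analysis and study of the code excited linear prediction (CELP) coder in the audio coding part of the latest MPEG4 standard, and following the parameter modes of its narrowband speech coder, an experimental CELP-based speech coding system was built and implemented. The efficient CELP coding technique was applied to compress the speech database of a text-to-speech (TTS) system, with satisfactory results.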

12.
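Distinctive features and redundant features   Total citations: 1, self-citations: 0, citations by others: 1
The introduction of distinctive features was a pioneering development in modern phonology. The phoneme is no longer regarded as the smallest unit of phonological analysis but as a bundle of distinctive features. Distinctive features are used to describe natural classes of sounds and to generalize the rules that apply to them. Because some features can be predicted from other distinctive features, this information, known as redundant features, can be omitted when describing sounds; the values of redundant features can be assigned by redundancy rules.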

13.
Stable propagation of synchronous spiking in cortical neural networks   Total citations: 25, self-citations: 0, citations by others: 25
Diesmann M  Gewaltig MO  Aertsen A 《Nature》1999,402(6761):529-533
The classical view of neural coding has emphasized the importance of information carried by the rate at which neurons discharge action potentials. More recent proposals that information may be carried by precise spike timing have been challenged by the assumption that these neurons operate in a noisy fashion (presumably reflecting fluctuations in synaptic input) and are thus incapable of transmitting signals with millisecond fidelity. Here we show that precisely synchronized action potentials can propagate within a model of cortical network activity that recapitulates many of the features of biological systems. An attractor, yielding a stable spiking precision in the (sub)millisecond range, governs the dynamics of synchronization. Our results indicate that a combinatorial neural code, based on rapid associations of groups of neurons co-ordinating their activity at the single spike level, is possible within a cortical-like network.

14.
Mesgarani N  Chang EF 《Nature》2012,485(7397):233-236
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.

15.
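To improve the intelligibility and quality of noise-corrupted speech, and to address the problems that deep complex networks for speech enhancement extract key acoustic features from the complex speech spectrum insufficiently and model correlated information poorly, a single-channel speech enhancement method based on a multi-dimensional attention mechanism and complex Conformer (SE-MDACC) is proposed. A complex Conformer is introduced into a complex U-Net architecture to model the correlation between speech magnitude and phase; a multi-dimensional attention mechanism is used to construct richer features that strengthen the representational capacity of the convolutional layers; and an attention gating mechanism is added to the residual connections to reinforce the details of the reconstructed speech. Experimental results show that, compared with the deep complex convolution recurrent network, SE-MDACC improves the objective metrics perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) by 15.299% and 1.462%, respectively. This indicates that SE-MDACC extracts acoustic features of speech thoroughly, models the magnitude-phase correlation appropriately, and effectively improves speech quality and intelligibility.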

16.
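Speech emotion recognition based on rough neural networks   Total citations: 1, self-citations: 1, citations by others: 0
Speech emotion recognition extracts effective acoustic features from the speech signal and then uses intelligent computing or recognition methods to identify the speaker's emotional state. The current state of research in China and abroad on speech emotion databases, feature extraction and recognition methods is reviewed. This review shows that feature extraction has a very large effect on the recognition rate. A total of 1050 utterances were recorded and 30 features were extracted from each, forming a 1050×30 database. Information consistency from rough set theory is used to reduce the 30 features in the database, leaving 12 features. A BP (back-propagation) neural network is then used to recognize the speaker's emotional state, with a highest recognition rate of 84%. The experimental results show that different emotions are better recognized with different methods.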

17.
Chimaeric sounds reveal dichotomies in auditory perception   Total citations: 18, self-citations: 0, citations by others: 18
Smith ZM  Delgutte B  Oxenham AJ 《Nature》2002,416(6876):87-90
By Fourier's theorem, signals can be decomposed into a sum of sinusoids of different frequencies. This is especially relevant for hearing, because the inner ear performs a form of mechanical Fourier transform by mapping frequencies along the length of the cochlear partition. An alternative signal decomposition, originated by Hilbert, is to factor a signal into the product of a slowly varying envelope and a rapidly varying fine time structure. Neurons in the auditory brainstem sensitive to these features have been found in mammalian physiological studies. To investigate the relative perceptual importance of envelope and fine structure, we synthesized stimuli that we call 'auditory chimaeras', which have the envelope of one sound and the fine structure of another. Here we show that the envelope is most important for speech reception, and the fine structure is most important for pitch perception and sound localization. When the two features are in conflict, the sound of speech is heard at a location determined by the fine structure, but the words are identified according to the envelope. This finding reveals a possible acoustic basis for the hypothesized 'what' and 'where' pathways in the auditory cortex.
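
The Hilbert decomposition mentioned in this abstract can be sketched in a few lines. The code below is a single-band illustration under stated assumptions, not the authors' stimulus-generation procedure: the test signals, the signal length, and the function names are hypothetical, and the analytic signal is computed with a direct O(N^2) DFT so that only the standard library is needed. It factors a signal into a Hilbert envelope and a fine structure, then combines the envelope of one signal with the fine structure of another.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a direct O(N^2) DFT (even N)."""
    n = len(x)
    w = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    spectrum = [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]
    # Keep DC and Nyquist, double positive frequencies, zero negative ones.
    gain = [1.0 if k in (0, n // 2) else 2.0 if k < n // 2 else 0.0
            for k in range(n)]
    return [
        sum(gain[k] * spectrum[k] * w[(-k * t) % n] for k in range(n)) / n
        for t in range(n)
    ]


def envelope_and_fine_structure(x):
    """Return (Hilbert envelope, cosine of instantaneous phase)."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    fine = [math.cos(cmath.phase(v)) for v in z]
    return envelope, fine


def chimaera(envelope_source, fine_source):
    """Envelope of one signal multiplied by the fine structure of another."""
    env, _ = envelope_and_fine_structure(envelope_source)
    _, fine = envelope_and_fine_structure(fine_source)
    return [e * f for e, f in zip(env, fine)]


N = 256  # signal length in samples (assumed)
# Slow envelope (4 cycles per window) on a 40-cycle carrier.
am_tone = [(1 + 0.5 * math.cos(2 * math.pi * 4 * t / N))
           * math.cos(2 * math.pi * 40 * t / N) for t in range(N)]
# Unmodulated tone with a different carrier (60 cycles per window).
plain_tone = [math.cos(2 * math.pi * 60 * t / N) for t in range(N)]

mixed = chimaera(am_tone, plain_tone)
print("first samples of the chimaera:", [round(v, 3) for v in mixed[:4]])
```

For these test signals the result is the slow modulation of the first tone riding on the carrier of the second, which is the single-band analogue of a sound with "the envelope of one sound and the fine structure of another".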

18.
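Principles and implementation of enhanced full rate speech coding   Total citations: 1, self-citations: 0, citations by others: 1
In wireless communication systems, speech compression coding plays a very important role because it largely determines the quality of the synthesized speech and the system capacity. To improve speech quality, GSM introduced the enhanced full rate (EFR) speech coding scheme. Building on the LPC vocoder, it adopts techniques such as A-B-S (analysis-by-synthesis) and VQ (vector quantization), so that the coded information contains both several speech feature parameters and some waveform coding information. It can therefore provide high-quality coding while compressing the bit rate to 12.2 kbps, offering a feasible speech coding scheme for the TD-SCDMA mobile communication system. Starting from the basic concepts of speech coding, the authors describe in detail the principles of EFR speech coding and the implementation of the algebraic codebook search.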

19.
Gutnisky DA  Dragoi V 《Nature》2008,452(7184):220-224
Our perception of the environment relies on the capacity of neural networks to adapt rapidly to changes in incoming stimuli. It is increasingly being realized that the neural code is adaptive, that is, sensory neurons change their responses and selectivity in a dynamic manner to match the changes in input stimuli. Understanding how rapid exposure, or adaptation, to a stimulus of fixed structure changes information processing by cortical networks is essential for understanding the relationship between sensory coding and behaviour. Physiological investigations of adaptation have contributed greatly to our understanding of how individual sensory neurons change their responses to influence stimulus coding, yet whether and how adaptation affects information coding in neural populations is unknown. Here we examine how brief adaptation (on the timescale of visual fixation) influences the structure of interneuronal correlations and the accuracy of population coding in the macaque (Macaca mulatta) primary visual cortex (V1). We find that brief adaptation to a stimulus of fixed structure reorganizes the distribution of correlations across the entire network by selectively reducing their mean and variability. The post-adaptation changes in neuronal correlations are associated with specific, stimulus-dependent changes in the efficiency of the population code, and are consistent with changes in perceptual performance after adaptation. Our results have implications beyond the predictions of current theories of sensory coding, suggesting that brief adaptation improves the accuracy of population coding to optimize neuronal performance during natural viewing.

20.
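Neural mechanisms of the brain's integration of binaural auditory information   Total citations: 1, self-citations: 0, citations by others: 1
This paper reviews progress over the past 60 years in research on the neural mechanisms by which the brain integrates binaural auditory information. It first introduces the neuroanatomical basis of binaural processing in the brain, the classification and physiological properties of binaural neurons, and studies of the topological distribution of binaural neurons in the auditory system. It then focuses on the most active areas of research on binaural processing, reviewing how binaural neurons in the superior olivary complex, inferior colliculus and auditory cortex encode interaural time differences and interaural intensity differences, and neurophysiological progress on how the brain analyses sound source location by encoding these parameters. Finally, it outlines future research directions in the field.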
