Similar literature
 20 similar documents found (search time: 31 ms)
1.
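Application of auditory masking thresholds to speaker recognition   Cited by: 2 (self-citations: 0, citations by others: 2)
The representation of speech information in the human auditory system has a degree of redundancy. Exploiting this property, missing-data techniques are applied to improve the performance of speaker recognition systems in noisy environments. The auditory masking effect is used to detect the "missing components" of the speech spectrum, i.e. those severely corrupted by noise. Combining missing-data compensation with speech enhancement improves the accuracy of the speaker recognition system under adverse conditions. Speaker identification experiments on speech corrupted by a wideband noise (white noise) and by a special noise (car noise) show that this method outperforms speech enhancement alone.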

2.
S Treue  J C Martínez Trujillo 《Nature》1999,399(6736):575-579
Changes in neural responses based on spatial attention have been demonstrated in many areas of visual cortex, indicating that the neural correlate of attention is an enhanced response to stimuli at an attended location and reduced responses to stimuli elsewhere. Here we demonstrate non-spatial, feature-based attentional modulation of visual motion processing, and show that attention increases the gain of direction-selective neurons in visual cortical area MT without narrowing the direction-tuning curves. These findings place important constraints on the neural mechanisms of attention and we propose to unify the effects of spatial location, direction of motion and other features of the attended stimuli in a 'feature similarity gain model' of attention.

3.
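This paper points out that the commonly used cepstral mean normalization method, while removing channel effects, also removes some of the speaker's own speech characteristics, and is therefore not robust under channel mismatch. A channel-space mapping method is proposed that exploits the differences between channels to compensate for channel mismatch, and a text-independent speaker recognition system robust to random channels is built. Experiments show that, for speech arriving over random channels, the top-1 and top-30 correct recognition rates improve by 5.4% and 18.6%, respectively, over the laboratory baseline system. Finding and compensating for inter-channel differences is an effective way to improve the robustness of speaker recognition.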
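The trade-off that this abstract attributes to cepstral mean normalization can be illustrated with a minimal sketch. It assumes that a fixed channel appears as a constant offset in the cepstral domain and that features arrive as a list of per-frame coefficient lists; the function name and toy data are hypothetical and are not taken from the paper.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    if not frames:
        return []
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / len(frames) for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy cepstra (hypothetical values): three frames, two coefficients each.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# Assumption: a fixed channel adds the same offset to every frame.
shifted = [[c + 0.5 for c in frame] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_shifted = cepstral_mean_normalize(shifted)
```

After normalization the two utterances coincide, so the constant channel offset is gone. The long-term mean of the speaker's own cepstra is removed as well, which is the loss of speaker information the abstract describes.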

4.
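Cai Tie (蔡铁)  Zhu Jie (朱杰) 《Journal of Shanghai Jiaotong University (上海交通大学学报)》2005,39(12):1997-2001
To address fast speaker adaptation in speech recognition systems, a support-speaker weighting algorithm is proposed. By computing support speakers, the algorithm performs speaker selection and reduces the dimensionality of the adaptation parameters, which lowers the storage needed during adaptation and effectively improves performance when little adaptation data is available. Supervised adaptation experiments show that, with only one adaptation utterance, the system's error rate falls by 5.82% relative to the speaker-independent (SI) system, clearly better than other fast adaptation algorithms.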

5.
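Mel-frequency cepstral coefficients, which are based on auditory characteristics, are used as the feature parameters for speaker recognition. The probabilistic neural network is described and used for text-independent speaker recognition. Experiments show that, for 20 speakers with 7 seconds of training speech and 3 seconds of test speech, the method achieves a correct recognition rate of 96.7%.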
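A probabilistic neural network of the kind mentioned above is, in its textbook form, a Parzen-window classifier: each class score is the average of Gaussian kernels centred on that class's training vectors. The sketch below shows that textbook form only. The feature vectors, the speaker labels and the smoothing parameter `sigma` are illustrative assumptions, and the paper's MFCC front end is not reproduced.

```python
import math


def pnn_classify(x, training, sigma=1.0):
    """Return the label whose averaged Gaussian kernel response to x is largest.

    training maps a speaker label to a list of feature vectors.
    """
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        scores[label] = total / len(patterns)
    return max(scores, key=scores.get)


# Hypothetical two-dimensional features for two speakers.
training = {
    "spk_a": [[0.0, 0.0], [0.2, 0.1]],
    "spk_b": [[3.0, 3.0], [2.8, 3.1]],
}
predicted = pnn_classify([0.1, 0.0], training)
```

Because the network stores the training vectors directly, "training" is a single pass over the data; the main free parameter is `sigma`, which would normally be tuned on held-out speech.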

6.
S Bao  V T Chan  M M Merzenich 《Nature》2001,412(6842):79-83
Representations of sensory stimuli in the cerebral cortex can undergo progressive remodelling according to the behavioural importance of the stimuli. The cortex receives widespread projections from dopamine neurons in the ventral tegmental area (VTA), which are activated by new stimuli or unpredicted rewards, and are believed to provide a reinforcement signal for such learning-related cortical reorganization. In the primary auditory cortex (AI) dopamine release has been observed during auditory learning that remodels the sound-frequency representations. Furthermore, dopamine modulates long-term potentiation, a putative cellular mechanism underlying plasticity. Here we show that stimulating the VTA together with an auditory stimulus of a particular tone increases the cortical area and selectivity of the neural responses to that sound stimulus in AI. Conversely, the AI representations of nearby sound frequencies are selectively decreased. Strong, sharply tuned responses to the paired tones also emerge in a second cortical area, whereas the same stimuli evoke only poor or non-selective responses in this second cortical field in naive animals. In addition, we found that strong long-range coherence of neuronal discharge emerges between AI and this secondary auditory cortical area.

7.
Fractal dimension of voice-signal waveforms   Cited by: 2 (self-citations: 0, citations by others: 2)
The fractal dimension is an important parameter that characterizes waveforms. In this paper, we derive a new method for calculating the fractal dimension of digital voice-signal waveforms. We show that the fractal dimension is an efficient tool for speaker recognition and speech recognition: it can be used to identify different speakers or to distinguish speech. We apply our results to Chinese speaker recognition, and numerical experiments show that the fractal dimension is an efficient parameter for characterizing individual Chinese speakers. We have developed a semiautomatic voiceprint analysis system based on the theory of this paper and on previous research. Foundation item: Supported by the Special Funds for Major State Basic Research Projects. Biography: Xie Yu-qiong (1964-), female, Ph.D. candidate; research direction: fractal geometry.
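The abstract does not say how its new method computes the fractal dimension, so the sketch below substitutes a different, well-known estimator, the Katz fractal dimension, purely to show what a waveform fractal dimension looks like in code. It assumes unit spacing between samples; the function name is hypothetical.

```python
import math


def katz_fractal_dimension(samples):
    """Katz fractal dimension of a waveform sampled at unit time steps.

    This is a stand-in estimator, not the method derived in the paper.
    """
    points = list(enumerate(samples))
    n_steps = len(points) - 1
    if n_steps < 2:
        raise ValueError("need at least three samples")
    # Total length of the curve through successive (time, amplitude) points.
    length = sum(math.dist(points[i], points[i + 1]) for i in range(n_steps))
    # Largest distance from the first point to any later point.
    extent = max(math.dist(points[0], p) for p in points[1:])
    return math.log10(n_steps) / (math.log10(n_steps) + math.log10(extent / length))


flat = [0.0] * 10          # a straight line has dimension 1
zigzag = [0.0, 1.0] * 5    # a jagged waveform scores above 1
fd_flat = katz_fractal_dimension(flat)
fd_zigzag = katz_fractal_dimension(zigzag)
```

In a speaker recognition setting such a value would be computed per frame or per utterance and used as one feature among others; whether it separates speakers as well as the paper's own measure is not something this sketch can show.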

8.
Houweling AR  Brecht M 《Nature》2008,451(7174):65-68
Understanding how neural activity in sensory cortices relates to perception is a central theme of neuroscience. Action potentials of sensory cortical neurons can be strongly correlated to properties of sensory stimuli and reflect the subjective judgements of an individual about stimuli. Microstimulation experiments have established a direct link from sensory activity to behaviour, suggesting that small neuronal populations can influence sensory decisions. However, microstimulation does not allow identification and quantification of the stimulated cellular elements. The sensory impact of individual cortical neurons therefore remains unknown. Here we show that stimulation of single neurons in somatosensory cortex affects behavioural responses in a detection task. We trained rats to respond to microstimulation of barrel cortex at low current intensities. We then initiated short trains of action potentials in single neurons by juxtacellular stimulation. Animals responded significantly more often in single-cell stimulation trials than in catch trials without stimulation. Stimulation effects varied greatly between cells, and on average in 5% of trials a response was induced. Whereas stimulation of putative excitatory neurons led to weak biases towards responding, stimulation of putative inhibitory neurons led to more variable and stronger sensory effects. Reaction times for single-cell stimulation were long and variable. Our results demonstrate that single neuron activity can cause a change in the animal's detection behaviour, suggesting a much sparser cortical code for sensations than previously anticipated.

9.
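To solve the problem of fast adaptation to speaker accents in practical speech recognition systems, a dynamic speaker-selective training method is proposed. Building on speaker-selective training, training speakers are selected with a confidence measure computed from Gaussian mixture model likelihood scores, replacing selection by an absolute number of training speakers; this makes selection more effective and the selection criterion more generalizable. The acoustic models for dynamic speaker-selective training are synthesized by weighting each training speaker according to its likelihood with respect to the speaker being adapted to, which improves the adaptation training. Experiments show that the method raises the recognition rate from 80.16% to 84.12%, a relative error rate reduction of 19.96%, improving the baseline system's recognition performance in practical use.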
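The two figures reported in this abstract are consistent with each other, as a short calculation shows. The sketch assumes that the error rate is simply 100% minus the recognition rate; the function name is hypothetical.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative reduction in error rate (percent), given accuracies in percent."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before


# 80.16% -> 84.12% accuracy means the error falls from 19.84% to 15.88%.
reduction = relative_error_reduction(80.16, 84.12)
print(round(reduction, 2))  # 19.96
```

The absolute gain is 3.96 percentage points; dividing by the original 19.84% error gives the 19.96% relative reduction quoted above.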

10.
Somatosensory basis of speech production   Cited by: 1 (self-citations: 0, citations by others: 1)
Tremblay S  Shiller DM  Ostry DJ 《Nature》2003,423(6942):866-869
The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced.

11.
Limits on bilingualism   Cited by: 1 (self-citations: 0, citations by others: 1)
A Cutler  J Mehler  D Norris  J Segui 《Nature》1989,340(6230):229-230
Speech, in any language, is continuous; speakers provide few reliable cues to the boundaries of words, phrases, or other meaningful units. To understand speech, listeners must divide the continuous speech stream into portions that correspond to such units. This segmentation process is so basic to human language comprehension that psycholinguists long assumed that all speakers would do it in the same way. In previous research, however, we reported that segmentation routines can be language-specific: speakers of French process spoken words syllable by syllable, but speakers of English do not. French has relatively clear syllable boundaries and syllable-based timing patterns, whereas English has relatively unclear syllable boundaries and stress-based timing; thus syllabic segmentation would work more efficiently in the comprehension of French than in the comprehension of English. Our present study suggests that at this level of language processing, there are limits to bilingualism: a bilingual speaker has one and only one basic language.

12.
Wehr M  Zador AM 《Nature》2003,426(6965):442-446
Neurons in the primary auditory cortex are tuned to the intensity and specific frequencies of sounds, but the synaptic mechanisms underlying this tuning remain uncertain. Inhibition seems to have a functional role in the formation of cortical receptive fields, because stimuli often suppress similar or neighbouring responses, and pharmacological blockade of inhibition broadens tuning curves. Here we use whole-cell recordings in vivo to disentangle the roles of excitatory and inhibitory activity in the tone-evoked responses of single neurons in the auditory cortex. The excitatory and inhibitory receptive fields cover almost exactly the same areas, in contrast to the predictions of classical lateral inhibition models. Thus, although inhibition is typically as strong as excitation, it is not necessary to establish tuning, even in the receptive field surround. However, inhibition and excitation occurred in a precise and stereotyped temporal sequence: an initial barrage of excitatory input was rapidly quenched by inhibition, truncating the spiking response within a few (1-4) milliseconds. Balanced inhibition might thus serve to increase the temporal precision and thereby reduce the randomness of cortical operation, rather than to increase noise as has been proposed previously.

13.
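Current loudspeaker abnormal-sound detection relies mainly on manual listening and on thresholds set by engineers from experience; both are strongly affected by subjective factors and cannot classify the abnormal sounds. A new loudspeaker quality evaluation method is therefore proposed: abnormal-sound detection using a support vector machine based on a psychoacoustic model and particle swarm optimization. The loudspeaker's sound response signals are extracted, labeled, and fed into the psychoacoustic model; the resulting mean psychoacoustic energies are input to the support vector machine, whose parameters are tuned by particle swarm optimization to obtain the support vector machine with optimal parameters. Tests show a detection accuracy of 98%. Compared with the timbre-feature method, detection accuracy is considerably higher and abnormal sounds are classified.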

14.
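To address the limitations of single acoustic features and the k-means algorithm in speaker clustering, and to better express speaker-specific information and improve clustering accuracy, feature fusion and an AE-SOM neural network are applied to speaker clustering and an improved speaker clustering algorithm is proposed. Based on an analysis of speech signal features, the algorithm combines MFCC and LPCC feature parameters to give a fuller description of the speaker's individual characteristics. And in k-means...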

15.
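This experiment used an oddball paradigm with pseudo-simultaneous presentation in the visual and auditory channels. Chinese characters and simple geometric figures served as visual stimuli, and 1000 Hz and 800 Hz pure tones as auditory stimuli. A 2×2 factorial design of attended channel (attended vs. unattended) × stimulus probability (deviants 15%, standards 85% in both channels) was used to study the event-related potentials (ERPs) evoked by visual and auditory deviant stimuli under attended and unattended conditions. Visual and auditory stimuli were presented to subjects in random order (inter-stimulus interval, ISI, 700-1300 ms). Subjects were asked to attend to one channel, e.g. the visual channel, and to ignore the other, i.e. the auditory channel, and to respond by key presses with the left and right hands, e.g. left hand for visual deviants and right hand for visual standards. The results show that auditory deviants evoked a similar mismatch negativity (MMN) under both attended and unattended conditions, whereas visual deviants evoked no MMN or MMN-like component under either condition; this is attributed to the parallel processing of the visual system and the difficulty of forming memory traces of visual images. Under the attended condition, the auditory deviant response overlapped an N2b component and was followed by a P3a component; this overlap and sequence reflect the orienting response in selective attention. Attended auditory and visual deviants evoked larger-amplitude P300 components, reflecting the updating of representations in working memory. The results support Näätänen's observations on the MMN: the independence of the auditory MMN from attention reflects automatic processing of sensory stimulus features in the auditory channel.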
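The stimulus schedule described in this abstract can be sketched as a random trial generator. The 15% deviant probability and the 700-1300 ms ISI range come from the abstract; the equal split between visual and auditory trials, the uniform ISI distribution and all names are assumptions made for illustration.

```python
import random


def oddball_sequence(n_trials, p_deviant=0.15, isi_range_ms=(700, 1300), seed=None):
    """Generate (modality, kind, isi_ms) tuples for a two-channel oddball run."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        # Assumption: visual and auditory trials are equally likely.
        modality = rng.choice(["visual", "auditory"])
        kind = "deviant" if rng.random() < p_deviant else "standard"
        # Assumption: the ISI is drawn uniformly from the stated range.
        isi_ms = rng.uniform(*isi_range_ms)
        trials.append((modality, kind, isi_ms))
    return trials


trials = oddball_sequence(10000, seed=0)
deviant_fraction = sum(1 for t in trials if t[1] == "deviant") / len(trials)
```

Passing a fixed `seed` makes a run reproducible, which is convenient when the same sequence has to be presented under both the attended and the unattended condition.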

16.
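To improve the quality of emotional speech synthesis, a deep-neural-network-based emotional speech synthesis method is proposed that uses emotional training corpora from multiple speakers together with speaker adaptation. Text analysis yields the context-dependent labels of the text corresponding to the speech, and the WORLD vocoder extracts the acoustic features of the emotional speech. A speaker-independent deep neural network average voice model is trained from the context-dependent labels and the acoustic features; a speaker-dependent deep neural network model for the target emotion is then obtained from the target speaker's training speech in the target emotion by a speaker adaptation transform, and this model is used to synthesize speech in the target emotion. Subjective evaluation shows that the emotional speech synthesized by this method scores higher than that of the conventional hidden-Markov-model-based method. Objective experiments show that the spectrum of the synthesized emotional speech is closer to that of the original speech. The method can therefore improve the naturalness and emotional expressiveness of synthesized emotional speech.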

17.
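Extraction of speaker speech feature parameters based on the wavelet transform   Cited by: 1 (self-citations: 3, citations by others: 1)
In speaker recognition systems, extracting speech feature parameters that reflect the speaker's individuality is one of the key problems. Building on wavelet transform theory and borrowing the MFCC extraction procedure, this paper replaces the Fourier transform with the wavelet transform to extract a new feature parameter, DWTMFC, and compares several commonly used wavelet functions: coif3, db6, db4, sym4 and bior2.4. Experiments show that coif3 is the best wavelet function for extracting speech feature parameters and that DWTMFC outperforms MFCC.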

18.
Bendor D  Wang X 《Nature》2005,436(7054):1161-1165
Pitch perception is critical for identifying and segregating auditory objects, especially in the context of music and speech. The perception of pitch is not unique to humans and has been experimentally demonstrated in several animal species. Pitch is the subjective attribute of a sound's fundamental frequency (f(0)) that is determined by both the temporal regularity and average repetition rate of its acoustic waveform. Spectrally dissimilar sounds can have the same pitch if they share a common f(0). Even when the acoustic energy at f(0) is removed ('missing fundamental') the same pitch is still perceived. Despite its importance for hearing, how pitch is represented in the cerebral cortex is unknown. Here we show the existence of neurons in the auditory cortex of marmoset monkeys that respond to both pure tones and missing fundamental harmonic complex sounds with the same f(0), providing a neural correlate for pitch constancy. These pitch-selective neurons are located in a restricted low-frequency cortical region near the anterolateral border of the primary auditory cortex, a location consistent with that of a pitch-selective area identified in recent imaging studies in humans.

19.
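Zhang Xia (张霞) 《Journal of West Anhui University (皖西学院学报)》2006,22(4):109-112
Starting from the "perlocutionary act" in Austin's speech act theory and focusing on everyday language phenomena, this paper considers the positive and negative effects that three different types of hearer have on speech acts, analyzes in detail the first and most common type, and argues that language is a person's second image. Speakers should improve their communicative competence by taking the different hearers into account, conveying their thoughts, feelings, cultivation and learning through language, so as to refine their speech acts and achieve smooth communication between people.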

20.
Bitterman Y  Mukamel R  Malach R  Fried I  Nelken I 《Nature》2008,451(7175):197-201
Just-noticeable differences of physical parameters are often limited by the resolution of the peripheral sensory apparatus. Thus, two-point discrimination in vision is limited by the size of individual photoreceptors. Frequency selectivity is a basic property of neurons in the mammalian auditory pathway. However, just-noticeable differences of frequency are substantially smaller than the bandwidth of the peripheral sensors. Here we report that frequency tuning in single neurons recorded from human auditory cortex in response to random-chord stimuli is far narrower than that typically described in any other mammalian species (besides bats), and substantially exceeds that attributed to the human auditory periphery. Interestingly, simple spectral filter models failed to predict the neuronal responses to natural stimuli, including speech and music. Thus, natural sounds engage additional processing mechanisms beyond the exquisite frequency tuning probed by the random-chord stimuli.
