
1.

English Speech Recognition System on Chip

刘鸿, 钱彦旻, 刘加 《清华大学学报》2011,16(1):95-99

An English speech recognition system was implemented on a chip, called a speech system-on-chip (SoC). The SoC included an application-specific integrated circuit with a vector accelerator to improve performance. A sub-word model based on a continuous density hidden Markov model recognition algorithm ran on a very cheap speech chip. The algorithm was a two-stage fixed-width beam-search baseline system with a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy to significantly reduce the recognition time. Tests show that this method reduces the recognition time nearly 6-fold and the memory size nearly 2-fold compared to the original system, with less than 1% accuracy degradation for a 600-word recognition task and a recognition accuracy rate of about 98%.
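The pruning idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the variable beam-width rule, so the code below swaps in a common stand-in, a fixed score beam combined with a cap on the number of active hypotheses (often called histogram pruning). The function name `prune_hypotheses`, the `max_active` parameter, and the example scores are all hypothetical.

```python
def prune_hypotheses(scores, beam_width, max_active):
    """Prune one frame's hypotheses (illustrative, not the paper's exact rule).

    scores: dict mapping a hypothesis id to its log-likelihood score.
    beam_width: keep hypotheses scoring within this margin of the best one.
    max_active: hypothetical cap on survivors, which tightens the effective
                beam on frames where many hypotheses score close together.
    """
    if not scores:
        return {}
    best = max(scores.values())
    # Score-based beam: drop anything too far below the best hypothesis.
    survivors = {h: s for h, s in scores.items() if s >= best - beam_width}
    # Cap on active hypotheses: keep only the top-scoring max_active.
    if len(survivors) > max_active:
        ranked = sorted(survivors.items(), key=lambda kv: kv[1], reverse=True)
        survivors = dict(ranked[:max_active])
    return survivors


# Made-up scores for a single frame.
frame_scores = {"a": -10.0, "b": -12.5, "c": -30.0, "d": -11.0}
kept = prune_hypotheses(frame_scores, beam_width=5.0, max_active=2)
```

Here "c" falls outside the score beam, and the cap then removes "b", leaving "a" and "d". Applying the function once per frame corresponds to frame-synchronous pruning.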

2.

An Introduction to the Chinese Speech Recognition Front-End of the NICT/ATR Multi-Lingual Speech Translation System

张劲松; Takatoshi Jitsuhiro; Hirofumi Yamamoto; 胡新辉; Satoshi Nakamura 《清华大学学报》2008,13(4):545-552


3.

A new frequency scale of Chinese whispered speech in the application of speaker identification (total citations: 1; self-citations: 0; citations by others: 1)

LIN Wei YANG Lili XU Boling 《自然科学进展(英文版)》2006,16(10):1072-1078

In this paper, the frequency characteristics of Chinese whispered speech were investigated by a filter bank analysis. It was shown that the first and the third formants were more important than the other formants in the speaker identification of Chinese whispered speech. The experiment showed that the 800-1200 Hz and 2800-3200 Hz ranges were the most significant frequency ranges in discriminating the speaker. Based on this result, a new feature scale named the whisper sensitive scale (WSS) was proposed to replace the common Mel scale and to extract the cepstral coefficients from the whispered speech signal. Furthermore, a speaker identification system for whispered speech was presented, based on modified hidden Markov models that integrate the advantages of WSCC (the whisper sensitive cepstral coefficient) and LPCC. The new system performed better than the traditional method at speaker identification for Chinese whispered speech.

4.

Trainable unit selection speech synthesis under statistical framework

Wang RenHua Dai LiRong Ling ZhenHua Hu Yu 《科学通报(英文版)》2009,54(11):1963-1969

This paper proposes a trainable unit selection speech synthesis method based on a statistical modeling framework. At the training stage, acoustic features are extracted from the training database and statistical models are estimated for each feature. During synthesis, the optimal candidate unit sequence is searched out from the database following the maximum likelihood criterion derived from the trained models. Finally, the waveforms of the optimal candidate units are concatenated to produce synthetic speech. Experimental results show that this method can effectively improve the automation of system construction and the naturalness of synthetic speech compared with the conventional unit selection synthesis method. Furthermore, this paper presents a minimum unit selection error model training criterion according to the characteristics of unit selection speech synthesis and adopts discriminative training for model parameter estimation. This criterion can finally achieve the full automation of system construction and further improve the naturalness of synthetic speech.

5.

Improved MFCC-Based Feature for Robust Speaker Identification (total citations: 2; self-citations: 0; citations by others: 2)

吴尊敬, 曹志刚 《清华大学学报》2005,10(2):158-161

The Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speech and speaker recognition. However, MFCC is very sensitive to noise interference, which tends to drastically degrade the performance of recognition systems because of the mismatches between training and testing. In this paper, the logarithmic transformation in the standard MFCC analysis is replaced by a combined function to reduce the noise sensitivity. The proposed feature extraction process is also combined with speech enhancement methods, such as spectral subtraction and median filtering, to further suppress the noise. Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.
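Spectral subtraction, one of the enhancement methods named in this abstract, can be sketched in a few lines. This is the textbook power-spectrum form with a spectral floor, not necessarily the variant used in the paper; the function name `spectral_subtract`, the `floor` parameter, and the sample values are assumptions for illustration. The paper's combined function that replaces the logarithm is not given in the abstract and is not reproduced here.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Basic power spectral subtraction for one frame (illustrative).

    noisy_power: power spectrum of the noisy frame, one value per frequency bin.
    noise_power: estimated noise power per bin (e.g. averaged over silence).
    floor: hypothetical spectral floor, as a fraction of the noisy power,
           that keeps the result positive after subtraction.
    """
    cleaned = []
    for p, n in zip(noisy_power, noise_power):
        # Subtract the noise estimate, but never drop below the floor.
        cleaned.append(max(p - n, floor * p))
    return cleaned


# Made-up three-bin example: only the first bin is clearly above the noise.
enhanced = spectral_subtract([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

The floor matters because a plain subtraction can produce zero or negative power, which a later logarithm or similar compression step cannot handle.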

6.

Stream Weight Training Based on MCE for Audio-Visual LVCSR (total citations: 2; self-citations: 0; citations by others: 2)

刘鹏, 王作英 《清华大学学报》2005,10(2):141-144

In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present the lattice rescoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that in the case of clean audio, the system performance can be improved by 36.1% in relative word error rate reduction when using state-based stream weights trained by a Viterbi approach, compared to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides significant enhancement of robustness in noisy environments.

7.

Operational Gesture Segmentation and Recognition

马赓宇, 林学訚 《清华大学学报》2003,8(2)

Gesture analysis by computer is an important part of the human computer interface (HCI). A gesture analysis method was developed that uses a skin-color-based method to extract the area representing the hand in a single image, with a distribution feature measurement designed to describe the hand shape in the images. A hidden Markov model (HMM) based method was used to analyze the temporal variation and segmentation of continuous operational gestures. Furthermore, a transition HMM was used to represent the period between gestures, so the method could segment continuous gestures and eliminate non-standard gestures. The system can analyze 2 frames per second, which is sufficient for real-time analysis.

8.

The environmental index of the rare earth elements in conodonts: Evidence from the Ordovician conodonts of the Huanghuachang Section,Yichang area

XiaoHong Chen Lian Zhou Kai Wei Jin Wang ZhiHong Li 《科学通报(英文版)》2012,57(4):349-359

High-resolution microanalysis was performed on conodonts collected from the Huanghuachang section in the Yichang area using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). This region is regarded as a standard section for the division and correlation of the Ordovician system in southern China. The results show that the values of (La/Yb)N and (La/Sm)N decrease, while the values of δCe increase as seawater deepens and energy decreases. As the sedimentary environment changes from shallow-water carbonate platform to platform margin to open continental shelf to shelf basin, rare earth element distribution curves gradually transform from a right inclined pattern to a flat pattern to a left inclined pattern and a hat-shaped pattern. The present work proves that the values and distributive patterns of rare earth elements in conodonts correspond with the sedimentary environment, and therefore provide reliable evidence for the application of rare earth element concentrations of biogenic phosphates such as conodonts for palaeoenvironmental reconstructions.

9.

Identification of Chinese Materia Medicas in Microscopic Powder Images

Shen Yan Yaoli Li Yixu Song Li Lin Peifa Jia Shaoqing Cai 《清华大学学报》2012,(2):209-217

This paper describes an identification system for Chinese Materia Medicas (CMMs) in microscopic powder images. Processing these images is very complex because of the low contrast, blurry boundaries, overlapping objects, and messy background. Therefore, the object detection must segment the significant microscopic structures from the complex image. The objects are detected in these images using an adaptable interactive method. After identifying the significant microscopic structures, the system identifies 14 features belonging to three main characteristics. These features form a 14-dimensional vector that represents the microscopic structures. The multi-dimensional vector is then analyzed using a feature assignment algorithm that picks the most notable features to construct a decision tree with thresholds. The identification system consists of a coarse classifier based on the decision tree and a fine classifier using similarity measurements to rank the possible results. Tests on 528 images from 24 different kinds of microscopic structures show the system's effectiveness and applicability.

10.

A Rule Based System for Speech Language Context Understanding (total citations: 1; self-citations: 0; citations by others: 1)

Imran Sarwar Bajwa Muhammad Abbas Choudhary 《东华大学学报(英文版)》2006,23(6):39-42


11.

Adaptive Compensation Algorithm in Open Vocabulary Mandarin Speaker-Independent Speech Recognition

Fadhil H. T. Al-dulaimy, 王作英, 田野 《清华大学学报》2002,7(5)

Introduction: A speech signal is normally mixed with many kinds of noises, which can significantly decrease the performance of a speech recognizer. The high concentration of energy in the low frequency range observed for most speech spectra is considered a nuisance because it makes less relevant the energy of the signal at middle and high frequencies [1]. The performance of automatic continuous speech recognition (ACSR) systems dramatically decreases when they are trained and used in different environm…

12.

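High-Performance Mandarin Digit Speech Recognition Algorithm (total citations: 13; self-citations: 0; citations by others: 13)

李虎生, 刘加, 刘润生 《清华大学学报(自然科学版)》2000,40(1)

A high-performance Mandarin digit speech recognition (MDSR) system is proposed. The MDSR system uses Mel-frequency cepstral coefficients (MFCC) as the main speech feature parameters, and also extracts formant trajectories and nasal features to distinguish easily confused speech pairs. A real-time endpoint detection algorithm based on speech features is proposed to reduce system resource requirements and improve robustness to interference. A two-stage recognition framework is adopted to improve discrimination: the first stage determines the candidate recognition results, and the second stage distinguishes easily confused speech pairs. With these improvements, the MDSR system achieves a recognition rate of 98.8%.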

13.

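Resonance Strength Feature for Noise-Robust Speech Recognition

许超, 曹志刚 《清华大学学报(自然科学版)》2004,44(1):22-24

The recognition performance of speech recognition systems based on the traditional Mel cepstral coefficient (MFCC) family of features drops sharply in noisy environments. For automatic speech recognition in noisy environments, a feature that reflects the degree of resonance of the speech signal, the resonance strength, is proposed and used to replace the energy dimension (the zeroth cepstral coefficient C0, or the frame energy E) in the traditional MFCC features. Speech recognition experiments under exhibition-hall noise, crowd noise, and car noise show that the system based on the new feature achieves a higher average recognition rate and better noise robustness than the system based on traditional features.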

14.

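Time-Frequency Features for Speech Recognition Based on a Sinusoidal Model

邢艳玲, 杨吉斌, 张雄伟 《解放军理工大学学报(自然科学版)》2004,5(1):22-25

To improve the performance of speech recognition systems, time-frequency distribution parameters are used to describe speech features. Because these parameters take the inherent non-stationarity of the speech signal into account, they describe its time-frequency characteristics more accurately. Several time-frequency parameters based on the sinusoidal model (the energy spectrum and the amplitude-weighted instantaneous frequency spectrum) are compared, and simulation experiments are carried out in a connected-word speech recognition system based on hidden Markov models. The results show that using time-frequency distribution parameters alone as the ASR front-end features does not improve the recognition rate, whereas a joint front end that combines standard ASR features with energy-spectrum time-frequency features effectively improves recognition performance.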

15.

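EMD Combined with Teager Energy for Speech Emotion Recognition

张卫 《科学技术与工程》2013,13(24)

Extracting emotional features is especially important in a speech emotion recognition system. Building on earlier work on empirical mode decomposition (EMD), this paper combines EMD with the Teager energy operator for speech emotion recognition. EMD is first applied to obtain a set of intrinsic mode function (IMF) components, and the Teager energy of each IMF component is then extracted. By analyzing, on the Mel frequency scale, the Teager energy of speech with different emotions in different languages, a new emotional feature is proposed: the EMD-based Mel-frequency Teager energy spectral coefficient (ETMC). Finally, an SVM classifier is used to recognize different emotions in different languages. Experimental results show that the method gives good recognition results.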
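The Teager energy operator used in this abstract has a standard discrete form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1). A minimal sketch follows. It covers only the operator itself; the EMD step, the Mel-scale analysis, and the ETMC feature are not reproduced, and the function name `teager_energy` and the test signal are assumptions for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator applied to a sequence of samples.

    Returns psi[n] = x[n]^2 - x[n-1] * x[n+1] for every interior sample,
    so the output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure sinusoid A*cos(w*n) the operator returns the constant
# A^2 * sin(w)^2, which depends on both amplitude and frequency.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(20)]
energy = teager_energy(signal)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

In a pipeline like the one described, each IMF component would take the place of `signal`.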

16.

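Research on a Tibetan Isolated-Word Speech Recognition System (total citations: 3; self-citations: 0; citations by others: 3)

姚徐, 李永宏, 单广荣, 于洪志 《西北民族学院学报》2009,30(1):29-36

Research on Tibetan speech lags considerably behind. Combining speech recognition techniques with the characteristics of Tibetan, this article makes a first attempt at Tibetan isolated-word speech recognition. MFCC parameters are extracted as speech features to build a library of speech templates, and a speech recognition system is implemented with a DTW model. Because Tibetan isolated words are often polysyllabic, the traditional double-threshold detection method based on short-time energy and short-time zero-crossing rate is improved by adding a duration threshold for silent segments between syllables, which raises the accuracy of isolated-word signal detection and the recognition rate.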
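Template matching with DTW, as named in this abstract, can be sketched as follows. This is the classic dynamic time warping recurrence with Euclidean frame distance and no path constraints, which may differ from the authors' configuration. The names `dtw_distance` and `recognize`, the template labels, and the one-dimensional "feature vectors" are hypothetical; real templates would hold MFCC vectors per frame.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames.
            d = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], query))


# Made-up one-dimensional templates standing in for MFCC sequences.
templates = {
    "word_a": [[0.0], [1.0], [2.0]],
    "word_b": [[5.0], [5.0], [5.0]],
}
query = [[0.0], [1.0], [1.0], [2.0]]  # "word_a" spoken more slowly
best_label = recognize(query, templates)
```

Because DTW allows a frame to be matched more than once, the slower query aligns with "word_a" at zero cost even though the sequence lengths differ.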

17.

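Research on an Acoustic Model Based on a BPNN/HMM Neural Network (total citations: 1; self-citations: 0; citations by others: 1)

李凡, 吴军, 黄刚 《华中科技大学学报(自然科学版)》2004,32(9):9-11

A hybrid acoustic model based on a BP neural network and a hidden Markov model (HMM) is developed. The main function of the BP neural network is to convert distorted speech feature vectors into clean speech feature vectors, while the HMM classifies the converted clean feature vectors, improving the robustness of the speech recognition system through model-level compensation. A linear-prediction-based MKCC speech feature extraction method is also discussed. The method feeds the extracted distorted speech feature vectors to the neural network as input, thereby achieving denoising at the feature-parameter level.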

18.

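Application of a Time Normalization Algorithm in Neural Network Speech Recognition (total citations: 6; self-citations: 0; citations by others: 6)

史笑兴, 顾明亮, 王太君, 何振亚 《东南大学学报(自然科学版)》1999,29(5):47-51

A new network structure is proposed that handles the time normalization problem in neural-network speech recognition well. The network extracts a fixed number of feature vectors from the feature vector sequence of the input speech signal and then feeds them to a neural network classifier for recognition. Compared with other neural-network speech recognition methods, using this network for front-end processing shortens the training and recognition time of the back-end classifier, simplifies the classifier's network structure, and maintains a high recognition rate.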

19.

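A New Type of Speech Recognition System (total citations: 1; self-citations: 0; citations by others: 1)

刘筠, 卢超 《成都大学学报(自然科学版)》2008,27(3)

A new speech recognition system is proposed. It uses the product of frame energy and frame zero-crossing rate as the indicator for speech endpoint detection, takes MFCC as the speech feature vector, and performs recognition with an HMM-based model. A new noise-robust speech recognition method is also proposed: an improved iterative Wiener filter combined with the PUM model suppresses noise interference well and raises the recognition rate.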
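The endpoint indicator described in this abstract, frame energy multiplied by frame zero-crossing rate, can be sketched as below. The abstract does not give the framing or the decision rule, so the non-overlapping frames, the single fixed threshold, and the names `frame_indicator` and `detect_endpoints` are assumptions for illustration.

```python
def frame_indicator(frame):
    """Product of frame energy and zero-crossing rate for one frame."""
    energy = sum(s * s for s in frame)
    # Count sign changes between consecutive samples.
    crossings = sum(1 for p, q in zip(frame, frame[1:]) if (p >= 0) != (q >= 0))
    zcr = crossings / (len(frame) - 1)
    return energy * zcr


def detect_endpoints(signal, frame_len, threshold):
    """Return (start, end) sample indices of the active region, or None.

    Assumes non-overlapping frames of at least two samples and a single
    fixed threshold; both are simplifications.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    active = [k for k, f in enumerate(frames) if frame_indicator(f) > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a burst of alternating samples, silence.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
endpoints = detect_endpoints(signal, frame_len=4, threshold=0.5)
```

With these inputs the two frames covering samples 8 to 15 exceed the threshold, so `endpoints` is `(8, 16)`. Note that in this simplified form a high-energy frame with no zero crossings scores zero.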

20.

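Research on LDA-MFCC-Based Feature Extraction for Tibetan Speech

普次仁, 顿珠次仁 《西藏大学学报》2014,(2):44-47

Feature extraction is the most critical stage of a Tibetan speech recognition system. Based on an analysis of the pronunciation characteristics of Tibetan, the article builds a Mel cepstral coefficient (MFCC) feature extraction algorithm that models the human auditory system, and then compresses the extracted feature data with the LDA information compression algorithm, which reduces dimensionality while improving recognition rate and computational efficiency. The result is an LDA-MFCC feature extraction algorithm suited to the characteristics of Tibetan speech.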