Two-stage keyword spotting system based on syllable graphs
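Citation: LUO Jun, OU Zhijian, WANG Zuoying. Two-stage keyword spotting system based on syllable graphs[J]. Journal of Tsinghua University (Science and Technology), 2005, 45(10): 1356-1359.
Authors: LUO Jun (罗骏), OU Zhijian (欧智坚), WANG Zuoying (王作英)
Affiliation: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Funding: Supported by the National Program for Sustainable Development of Network and Information Security Assurance (Special Project 917)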
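Abstract: Current keyword spotting systems face a trade-off: single-stage systems search slowly, while two-stage systems based on large vocabulary continuous speech recognition (LVCSR) are not robust enough. To meet the need for fast and robust search, this paper proposes a new two-stage search system based on pinyin (syllable) graphs. The two stages are a preprocessing stage and a search stage. The preprocessing stage recognizes the speech data into pinyin graphs with high coverage. The search stage answers frequent user queries: it finds the pinyin strings in the graphs that match the keyword's pinyin, then filters the results using confidence scores computed by a forward-backward algorithm based on a pinyin N-gram model. Experiments show that the system achieves a recall of 72.19% and an accuracy of 72.68% for two-character words, and a recall of 73.51% and an accuracy of 82.98% for three-character words, both better than the LVCSR system. The search stage needs only 0.01 times real time, giving the system good practical value.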
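The search-stage lookup described above (finding pinyin strings in the graph that match the keyword's pinyin) can be sketched as a depth-first walk over graph arcs. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Arc` layout, the toneless pinyin labels, exact string matching, and the names `find_keyword` and `GRAPH` are all hypothetical, since the abstract does not describe the graph's data structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One syllable hypothesis in a syllable graph (hypothetical layout)."""
    start: int         # start node id
    end: int           # end node id
    syllable: str      # toneless pinyin label, e.g. "qing" (assumption)
    log_score: float   # recognizer log score (carried along, unused by the search)


def find_keyword(arcs, keyword):
    """Return every arc path whose syllable labels spell out `keyword` exactly.

    `keyword` is a list of pinyin syllables, e.g. ["qing", "hua"].
    """
    if not keyword:
        return []
    # Index outgoing arcs by (node, syllable): each extension step is one lookup.
    outgoing = defaultdict(list)
    for arc in arcs:
        outgoing[(arc.start, arc.syllable)].append(arc)

    matches = []

    def extend(path, i):
        # path holds the arcs matched so far; i is the next keyword position.
        if i == len(keyword):
            matches.append(list(path))
            return
        for nxt in outgoing.get((path[-1].end, keyword[i]), []):
            path.append(nxt)
            extend(path, i + 1)
            path.pop()

    # Every arc labelled with the first syllable is a candidate starting point.
    for arc in arcs:
        if arc.syllable == keyword[0]:
            extend([arc], 1)
    return matches


# Hypothetical toy graph: competing hypotheses over the same time spans.
GRAPH = [
    Arc(0, 1, "qing", -1.2),
    Arc(0, 1, "jing", -2.0),
    Arc(1, 2, "hua", -0.8),
    Arc(1, 2, "hu", -1.5),
    Arc(2, 3, "da", -0.5),
]

for path in find_keyword(GRAPH, ["qing", "hua"]):
    print([(a.start, a.end, a.syllable) for a in path])
```

Because the graph is built once during preprocessing, each query in this sketch only walks existing arcs and never re-runs recognition, which is consistent with the low search-stage cost reported above.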

Keywords: information retrieval; keyword spotting; pinyin graph; confidence measure
Article ID: 1000-0054(2005)10-1356-04
Revised: 2004-10-20

Two-stage keyword spotting system based on syllable graphs
LUO Jun, OU Zhijian, WANG Zuoying. Two-stage keyword spotting system based on syllable graphs[J]. Journal of Tsinghua University (Science and Technology), 2005, 45(10): 1356-1359.
Authors: LUO Jun, OU Zhijian, WANG Zuoying
Abstract: One-stage keyword spotting systems are time-consuming, while two-stage systems based on large vocabulary continuous speech recognition (LVCSR) are not robust. This paper introduces a two-stage keyword spotting system based on syllable graphs for fast and robust information retrieval from speech data. The system consists of a preprocessing stage and a search stage. In the preprocessing stage, the audio data is recognized into a syllable graph containing high-accuracy syllable candidates. In the search stage, which answers frequent user queries, only the graph is searched for syllable strings that match the keyword. A forward-backward algorithm based on a syllable N-gram model then calculates confidence measures to further filter the search results. Test results show that the system achieves a 72.19% recall rate and 72.68% accuracy for 2-syllable words, and a 73.51% recall rate and 82.98% accuracy for 3-syllable words, outperforming the LVCSR system. The search stage runs in only 1% of real time, as required for practical applications.
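The confidence-filtering step above can be sketched as a forward-backward pass over the syllable graph, where the confidence of a matched syllable string is its posterior probability: the score of all full paths through it divided by the score of all paths in the graph. This sketch assumes a syllable bigram (N = 2), runs the recursion over arcs so each transition knows its predecessor syllable, and uses a crude floor for unseen bigrams. `Arc`, `keyword_confidence`, `lm_scale`, the floor value, and the toy scores are hypothetical; the abstract does not state the N-gram order, smoothing, or score scaling the authors used.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    start: int        # start node id (node ids assumed topologically ordered)
    end: int          # end node id, always > start
    syllable: str     # toneless pinyin label (assumption)
    acoustic: float   # acoustic log-likelihood of this hypothesis


def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)); -inf plays the role of log(0)."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def keyword_confidence(arcs, path, bigram, start_node, final_node,
                       lm_scale=1.0, floor=math.log(1e-4)):
    """Posterior probability of the consecutive arcs `path` (hypothetical API)."""
    def lm(prev, cur):
        # Hypothetical bigram lookup with a floor for unseen syllable pairs.
        return lm_scale * bigram.get((prev, cur), floor)

    order = sorted(arcs, key=lambda a: (a.start, a.end))
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for a in arcs:
        incoming[a.end].append(a)
        outgoing[a.start].append(a)

    # Forward: alpha[a] = log score of all partial paths ending with arc a.
    alpha = {}
    for a in order:
        acc = lm("<s>", a.syllable) if a.start == start_node else -math.inf
        for p in incoming[a.start]:
            acc = logaddexp(acc, alpha[p] + lm(p.syllable, a.syllable))
        alpha[a] = acc + a.acoustic

    # Backward: beta[a] = log score of all completions after arc a.
    beta = {}
    for a in reversed(order):
        acc = 0.0 if a.end == final_node else -math.inf
        for n in outgoing[a.end]:
            acc = logaddexp(acc, lm(a.syllable, n.syllable) + n.acoustic + beta[n])
        beta[a] = acc

    total = -math.inf
    for a in incoming[final_node]:
        total = logaddexp(total, alpha[a])

    # Score of every full path that passes through the matched arcs.
    score = alpha[path[0]]
    for prev, cur in zip(path, path[1:]):
        score += lm(prev.syllable, cur.syllable) + cur.acoustic
    score += beta[path[-1]]
    return math.exp(score - total)


# Hypothetical toy graph and bigram probabilities.
ARCS = [
    Arc(0, 1, "qing", -1.0),
    Arc(0, 1, "jing", -1.0),
    Arc(1, 2, "hua", -0.5),
]
BIGRAM = {
    ("<s>", "qing"): math.log(0.6), ("<s>", "jing"): math.log(0.2),
    ("qing", "hua"): math.log(0.5), ("jing", "hua"): math.log(0.1),
}

print(keyword_confidence(ARCS, [ARCS[0], ARCS[2]], BIGRAM, 0, 2))
```

A detection whose posterior falls below a tuned threshold would then be discarded; the threshold itself is not given in the abstract.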
Keywords: information retrieval; keyword spotting; syllable graph; confidence measure
This article is indexed in CNKI, Wanfang Data, and other databases.