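Two-Stage Tongue Position Modeling Based on Small-Sample Data Statistics
Cite this article: XU Zhengli, XIAO Sufang, JIAN Min, YANG Minghao. Two-Stage Tongue Position Modeling Based on Small-Sample Data Statistics [J]. Guangxi Sciences, 2023, 30(4): 745-753.
Authors: XU Zhengli  XIAO Sufang  JIAN Min  YANG Minghao
Affiliations: Guilin University of Electronic Technology, Guilin, Guangxi 541004; Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: Supported by the National Natural Science Foundation of China (71463010, 22180155466), the Guangxi Science and Technology Program (2021GXNSFBA220048, 桂科AB21220038), and the Guilin Science and Technology Program (2023010123).
Abstract: The tongue is an important human articulator, and dimensionality-reduction analysis of its shape during articulation can effectively help linguists analyze human pronunciation patterns. Principal Component Analysis (PCA) is currently the most commonly used method for dimensionality reduction of tongue contours. In recent years, deep-learning-based autoencoders have been shown to outperform PCA in dimensionality reduction. However, the tongue is hidden inside the oral cavity and large amounts of relevant data are difficult to obtain, so conventional autoencoders cannot be applied directly to tongue contour modeling. To address this, this paper proposes a two-stage autoencoder dimensionality-reduction method for small-sample tongue motion contour data. First, the method uses an Active Shape Model (ASM) to generate a large amount of physiological deformation data for tongue contours and builds a general contour reconstruction model. Then, a dimensionality-reduction layer is added to the first-stage model to compress and analyze tongue contour data. The experiments use 240 vowel tongue shapes obtained from X-ray films of human speech, and the method is compared with the conventional PCA method. The results show that, on the two-dimensional plane, the vowel tongue position map obtained by the proposed method is more discriminative than that of conventional PCA, and that the method has better tongue-shape dimensionality reduction and reconstruction ability.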

Keywords: deep neural network  autoencoder  principal component analysis  tongue contour  hidden units
Received: 2023-02-15
Revised: 2023-04-25
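
The data-generation step described in the abstract (an Active Shape Model producing many plausible contours from a small sample) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy arc-shaped contours, the 20 points per contour, the 98% variance threshold, and the function names `fit_shape_model`, `sample_shapes`, and `pca_project` are all hypothetical. The sketch uses the standard ASM convention of limiting each mode weight to within three standard deviations.

```python
import numpy as np

def fit_shape_model(shapes, var_kept=0.98):
    """Fit a point-distribution model (mean shape + principal modes).

    shapes: array (n_samples, 2 * n_points) of flattened (x, y) contours.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data yields the PCA modes directly.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / (len(shapes) - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of modes whose cumulative variance reaches var_kept.
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(eigvals))
    return mean, vt[:k], eigvals[:k]

def sample_shapes(mean, modes, eigvals, n, rng, limit=3.0):
    """Draw n synthetic shapes with each mode weight in +/- limit * sqrt(eigenvalue)."""
    b = rng.uniform(-limit, limit, size=(n, len(eigvals))) * np.sqrt(eigvals)
    return mean + b @ modes

def pca_project(shapes, mean, modes, dims=2):
    """Baseline: project shapes onto the first `dims` principal modes."""
    return (shapes - mean) @ modes[:dims].T

# Toy stand-in for the 240 measured contours (assumption: 20 points each).
rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 20)
height = rng.uniform(0.5, 1.5, size=(240, 1))
shift = rng.uniform(-0.3, 0.3, size=(240, 1))
real = np.hstack([np.cos(t) + shift, height * np.sin(t)])
real = real + 0.01 * rng.normal(size=real.shape)

mean, modes, eigvals = fit_shape_model(real)
synthetic = sample_shapes(mean, modes, eigvals, n=5000, rng=rng)
baseline_2d = pca_project(real, mean, modes, dims=2)
```

Here `synthetic` plays the role of the large generated training set, while `baseline_2d` is the kind of two-dimensional PCA map the proposed method is compared against.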

Tongue Shapes Modeling from Small Data Using Two-Stage Autoencoder
XU Zhengli, XIAO Sufang, JIAN Min, YANG Minghao. Tongue Shapes Modeling from Small Data Using Two-Stage Autoencoder [J]. Guangxi Sciences, 2023, 30(4): 745-753.
Authors:XU Zhengli  XIAO Sufang  JIAN Min  YANG Minghao
Institution:Guilin University of Electronic Technology, Guilin, Guangxi, 541004, China; Institute of Automation of the Chinese Academy of Sciences, Beijing, 100190, China
Abstract: The tongue plays a crucial role in human speech production. Dimensionality-reduction analysis of tongue shape during articulation can effectively assist linguists in analyzing human pronunciation patterns. Traditional methods for tongue contour compression often rely on Principal Component Analysis (PCA) for dimensionality reduction. In recent years, deep-learning-based autoencoders have been widely used for data compression. However, they require a large number of samples and cannot be used directly and effectively for research on tongue motion patterns. Moreover, obtaining a substantial volume of tongue movement data is challenging because the tongue is located inside the oral cavity. To address these limitations, this paper introduces a two-stage autoencoder dimensionality-reduction method designed for small-sample tongue motion contour data. First, an Active Shape Model (ASM) is used to generate a large amount of physiological deformation data for tongue contours, and a general tongue contour reconstruction model is built from a conventional autoencoder. Second, an additional network layer is added to the first-stage autoencoder to compress and analyze the tongue position data. In the experiments, 240 vowel tongue shapes obtained from X-ray films of human speech were used, and the proposed tongue position model was compared with the traditional PCA method. The results show that the vowel tongue position map obtained by the proposed method is more discriminative on the two-dimensional plane and that the method has better tongue shape reconstruction performance.
Keywords: deep neural network  autoencoder  Principal Component Analysis (PCA)  tongue contour  hidden units
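
The two-stage idea in the abstract (train a reconstruction autoencoder on plentiful generated shapes, then add a low-dimensional layer and fit it on the small real set) might look roughly like the NumPy sketch below. Everything specific here is an assumption made for illustration: the layer sizes (40-16-40, then a 2-unit bottleneck), the tanh activations, the learning rate, freezing the stage-one weights in stage two, and the toy contours standing in for both the ASM-generated and the X-ray data. The abstract does not state these details, so this should not be read as the authors' architecture.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    """Return [weights, bias] with small random weights."""
    return [rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)]

def forward(layers, acts, x):
    """Run x through the network and return the output of every layer."""
    outs = [x]
    for (w, b), act in zip(layers, acts):
        z = outs[-1] @ w + b
        outs.append(np.tanh(z) if act == "tanh" else z)
    return outs

def train(layers, acts, x, trainable, epochs=1000, lr=0.1):
    """Full-batch gradient descent on mean squared reconstruction error.

    Only layers whose index is in `trainable` are updated; the rest stay frozen.
    Returns the loss measured at the start of every epoch.
    """
    losses = []
    for _ in range(epochs):
        outs = forward(layers, acts, x)
        err = outs[-1] - x
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / err.size
        for i in reversed(range(len(layers))):
            if acts[i] == "tanh":
                grad = grad * (1.0 - outs[i + 1] ** 2)  # tanh derivative
            w, b = layers[i]
            gw, gb = outs[i].T @ grad, grad.sum(axis=0)
            grad = grad @ w.T  # propagate using the pre-update weights
            if i in trainable:
                layers[i][0] = w - lr * gw
                layers[i][1] = b - lr * gb
    return losses

rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 20)

def toy_contours(n):
    """Hypothetical arc-shaped contours: 20 (x, y) points flattened to 40 values."""
    height = rng.uniform(0.5, 1.5, size=(n, 1))
    shift = rng.uniform(-0.3, 0.3, size=(n, 1))
    return np.hstack([np.cos(t) + shift, height * np.sin(t)])

real = toy_contours(240)                 # stand-in for the small measured set
mean = real.mean(axis=0)
synthetic = toy_contours(1000) - mean    # stand-in for ASM-generated shapes
real = real - mean

# Stage 1: general reconstruction model trained on the large synthetic set.
stage1 = [init_layer(40, 16, rng), init_layer(16, 40, rng)]
loss1 = train(stage1, ["tanh", "linear"], synthetic, trainable={0, 1})

# Stage 2: insert a 2-unit bottleneck and fit only the new layers on real data.
enc, dec = stage1
stage2 = [enc, init_layer(16, 2, rng), init_layer(2, 16, rng), dec]
acts2 = ["tanh", "linear", "tanh", "linear"]
loss2 = train(stage2, acts2, real, trainable={1, 2})

codes = forward(stage2, acts2, real)[2]  # 2-D code for each real contour
```

The `codes` array is the two-dimensional representation that would be plotted as a vowel tongue position map. Freezing the stage-one layers keeps the number of parameters fitted on the 240 real samples small, which is one plausible reading of why a two-stage scheme suits small data.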