
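An Adaptive Clustering Method on Medical Short Text
Cite this article: LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren. An Adaptive Clustering Method on Medical Short Text[J]. Journal of Northeastern University (Natural Science), 2015, 36(1): 19-23. DOI: 10.12068/j.issn.1005-3026.2015.01.005
Authors: LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren
Affiliations: 1. Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, Liaoning, China; 2. Zhengzhou Municipal Human Resources and Social Security Data Management Center, Zhengzhou 450000, Henan, China; 3. Neusoft Group Ltd., Shenyang 110179, Liaoning, China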
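Funding: National Natural Science Foundation of China (61172002); National Key Technology R&D Program of China (2014BAI17B01); National High Technology Research and Development Program of China (2012AA02A607).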
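Abstract: To address synonym recognition and naming standardization for disease diagnosis text in electronic medical records, an adaptive text clustering method is proposed. First, a new set-based text similarity measure is introduced. Next, a text clustering algorithm based on the similarity distribution, which determines the number of clusters automatically, is used to identify synonymous texts. Finally, a central concept extraction algorithm based on sequence patterns standardizes the disease names, while clusters are merged and optimized to further improve clustering accuracy. Test results indicate that the method achieves high accuracy and clustering efficiency, and that it is broadly useful for the preprocessing, classification, and analysis of medical record text.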

Keywords: clustering analysis; similarity measurement; frequent sequence pattern; electronic medical record; similarity distribution

An Adaptive Clustering Method on Medical Short Text
LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren. An Adaptive Clustering Method on Medical Short Text[J]. Journal of Northeastern University (Natural Science), 2015, 36(1): 19-23. DOI: 10.12068/j.issn.1005-3026.2015.01.005
Authors: LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren
Affiliation: 1. Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, China; 2. Zhengzhou Municipal Human Resources and Social Security Data Management Center, Zhengzhou 450000, China; 3. Neusoft Group Ltd., Shenyang 110179, China.
Abstract: An adaptive clustering method for short text is presented for synonymous-text recognition and disease-name standardization of diagnoses in electronic medical records. First, a new set-based text similarity measure is proposed. Then, a text clustering algorithm based on the similarity distribution, which automatically determines the number of clusters, is applied to recognize synonymous disease texts. Finally, disease names are standardized by a central concept extraction algorithm based on frequent sequence patterns, while clusters are merged and optimized to further improve clustering accuracy. The results show that the proposed approach achieves high accuracy and clustering efficiency, making it useful for medical applications such as medical text preprocessing, classification, and analysis.
Keywords: clustering analysis; similarity measurement; frequent sequence pattern; electronic medical record; similarity distribution
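
The abstract names a set-based similarity measure and a clustering step that fixes the number of clusters automatically, but it gives no formulas. As a rough illustration only, the sketch below substitutes Jaccard similarity over character sets and a single-pass threshold clustering. Neither is the paper's actual algorithm, and the function names (`char_set`, `jaccard`, `cluster`) and the 0.5 threshold are hypothetical choices made for this example.

```python
from typing import List, Set


def char_set(text: str) -> Set[str]:
    """Represent a short text as its set of characters, ignoring spaces.

    Assumption: character-level sets stand in for whatever set
    representation the paper actually uses.
    """
    return set(text.replace(" ", ""))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| (a stand-in measure, not the paper's)."""
    sa, sb = char_set(a), char_set(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def cluster(texts: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Single-pass clustering: each text joins the most similar existing
    cluster (compared against that cluster's first member) if the similarity
    reaches the threshold; otherwise it starts a new cluster. The number of
    clusters is therefore not fixed in advance.
    """
    clusters: List[List[str]] = []
    for text in texts:
        best, best_sim = None, threshold
        for members in clusters:
            sim = jaccard(text, members[0])
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([text])
        else:
            best.append(text)
    return clusters


if __name__ == "__main__":
    diagnoses = ["type 2 diabetes", "diabetes type 2", "acute bronchitis"]
    for group in cluster(diagnoses):
        print(group)
```

In this sketch the two diabetes strings share the same character set and fall into one cluster, while the bronchitis string starts its own. The third step described in the abstract, central concept extraction from frequent sequence patterns, is not covered here.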
This article is indexed in CNKI, Wanfang Data, and other databases.