
An Adaptive Clustering Method on Medical Short Text

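Cite this article:	LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren. An Adaptive Clustering Method on Medical Short Text[J]. Journal of Northeastern University (Natural Science), 2015, 36(1): 19-23. DOI: 10.12068/j.issn.1005-3026.2015.01.005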

Authors:	LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren

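Affiliations:	(1. Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang, Liaoning 110819; 2. Zhengzhou Municipal Human Resources and Social Security Data Management Center, Zhengzhou, Henan 450000; 3. Neusoft Group Ltd., Shenyang, Liaoning 110179)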

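Funding:	National Natural Science Foundation of China (61172002); National Key Technology R&D Program of China (2014BAI17B01); National High Technology Research and Development Program of China (2012AA02A607).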

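Abstract:	To address synonym recognition and name standardization for disease diagnosis texts in electronic medical records, an adaptive text clustering method is proposed. First, a new set-based text similarity measure is introduced. Next, a text clustering algorithm based on the similarity distribution identifies synonymous texts and determines the number of clusters automatically. Finally, a central concept extraction algorithm based on sequence patterns standardizes disease names while merging and optimizing the clusters, which further improves clustering accuracy. Test results show that the method achieves high accuracy and clustering efficiency and is broadly useful for preprocessing, classifying, and analyzing medical record text.
Keywords:	clustering analysis; similarity measurement; frequent sequence pattern; electronic medical record; similarity distribution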
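
The abstract names a set-based similarity measure and a clustering step that settles the number of clusters by itself, but it gives no formulas. The sketch below is only an illustration under stated assumptions, not the authors' algorithm: Jaccard similarity over character bigram sets stands in for the set-based measure, and the cut-off is taken as the midpoint of the widest gap in the observed similarity values, standing in for the similarity-distribution analysis. The names `char_bigrams`, `jaccard`, `choose_threshold`, and `cluster` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def char_bigrams(text: str) -> set[str]:
    """Represent a short text as the set of its character bigrams."""
    text = text.strip().lower()
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two bigram sets (assumed stand-in measure)."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def choose_threshold(sims: list[float]) -> float:
    """Midpoint of the widest gap between distinct similarity values.

    Assumption: synonymous and unrelated pairs form two groups of values
    separated by a visible gap. With fewer than two distinct values there
    is no gap to exploit, so 1.0 is returned and nothing gets merged.
    """
    values = sorted(set(sims))
    if len(values) < 2:
        return 1.0
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    _, i = max(gaps)
    return (values[i] + values[i + 1]) / 2


def cluster(texts: list[str]) -> list[list[str]]:
    """Link pairs above the threshold; connected groups become clusters."""
    sims = {
        (i, j): jaccard(texts[i], texts[j])
        for i, j in combinations(range(len(texts)), 2)
    }
    threshold = choose_threshold(list(sims.values()))
    parent = list(range(len(texts)))  # union-find forest over text indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s > threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for idx, text in enumerate(texts):
        groups.setdefault(find(idx), []).append(text)
    return list(groups.values())
```

The number of clusters is never passed in: it falls out of how many connected groups remain once the threshold has been read off the data. Character bigrams are used because they need no word segmentation, which makes the same sketch usable on Chinese diagnosis strings.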
An Adaptive Clustering Method on Medical Short Text

LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren. An Adaptive Clustering Method on Medical Short Text[J]. Journal of Northeastern University (Natural Science), 2015, 36(1): 19-23. DOI: 10.12068/j.issn.1005-3026.2015.01.005

Authors:	LI Wei, XU Hong-tao, ZHAO Da-zhe, LIU Ji-ren

Affiliation:	1. Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110819, China; 2. The Zhengzhou Municipal Human Resources and Social Security Data Management Center, Zhengzhou 450000, China; 3. Neusoft Group Ltd., Shenyang 110179, China.

Abstract:	An adaptive clustering method for short text is presented for synonymous text recognition and disease name standardization in the diagnoses recorded in electronic medical records. First, a new set-based text similarity measure is proposed. Then, a text clustering algorithm based on the similarity distribution, which automatically determines the number of clusters, is applied to recognize synonymous disease texts. Finally, disease names are standardized by a central concept extraction algorithm based on frequent sequence patterns, while clusters are merged and optimized to further improve clustering accuracy. The results show that the proposed approach achieves high accuracy and clustering efficiency, which is significant for medical applications such as medical text preprocessing, classification, and analysis.

Keywords:	clustering analysis; similarity measurement; frequent sequence pattern; electronic medical record; similarity distribution
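
The last step in the abstract standardizes disease names by extracting a central concept from each cluster using frequent sequence patterns. The details are not given here, so the following is a simplified sketch rather than the published algorithm: it assumes whitespace-tokenized text, considers only contiguous token sequences (a general sequence-pattern miner would also allow gaps), and returns the longest pattern that occurs in at least a `min_support` fraction of the cluster's texts. The function names and the `min_support` default are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def contiguous_patterns(tokens: list[str]) -> set[tuple[str, ...]]:
    """All distinct contiguous token subsequences of one text."""
    n = len(tokens)
    return {tuple(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def central_concept(cluster: list[str], min_support: float = 0.6) -> str:
    """Longest pattern shared by at least min_support of the texts.

    Returns an empty string when the cluster is empty or when no
    pattern is frequent enough.
    """
    if not cluster:
        return ""
    support: Counter = Counter()
    for text in cluster:
        # A set per text, so support counts texts rather than occurrences.
        support.update(contiguous_patterns(text.lower().split()))
    needed = min_support * len(cluster)
    frequent = [p for p, count in support.items() if count >= needed]
    if not frequent:
        return ""
    # Prefer longer patterns, then higher support, then alphabetical order.
    best = min(frequent, key=lambda p: (-len(p), -support[p], p))
    return " ".join(best)


if __name__ == "__main__":
    names = ["acute viral hepatitis", "viral hepatitis b", "chronic viral hepatitis"]
    print(central_concept(names))
```

For unsegmented Chinese diagnosis strings, one option is to pass character-level tokens (for example, `" ".join(text)`) or the output of a word segmenter instead of relying on whitespace.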
This article is indexed in CNKI, Wanfang Data, and other databases.