Classification of Hypertensive Medical Records Based on Machine Learning


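Cite as:	Hu Jing, Liu Wei, Ma Kai. Classification of Hypertensive Medical Records Based on Machine Learning[J]. Science Technology and Engineering, 2019, 19(33): 296-301.

Authors:	Hu Jing, Liu Wei, Ma Kai

Affiliation:	School of Medical Information, Xuzhou Medical University, Xuzhou 221004

Funding:	Supported by the National Natural Science Foundation of China (81471330), a Higher Education Research Project of the Jiangsu Provincial Department of Education (2015JSJG261), and the Jiangsu College Students' Innovation and Entrepreneurship Program (201810313047Y)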

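Abstract:	Preprocessed Chinese medical record text is high-dimensional and sparse, which leads to low classification accuracy and slow model convergence. To address this, a text classification algorithm (BOW+SVM) is proposed that combines a rough-set-based bag-of-words (BOW) model with a support vector machine (SVM). The algorithm first uses the BOW model to extract feature words and build a high-dimensional text vector space. It then applies a rough-set attribute reduction algorithm to the text features, removing vague and redundant attributes from the decision rules and lowering the dimensionality of the vector space. Finally, the refined features are combined with an SVM classifier for text classification. Six cross-combined algorithms were compared in simulation experiments in a Python + TensorFlow environment. The results show that the BOW+SVM model classifies hypertension medical records with an accuracy of up to 97%. The improved model can thus handle unevenly distributed samples and high-dimensional sparse feature spaces, and can effectively improve the medical record management workflow.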
Keywords:	text classification; natural language processing; rough set; bag-of-words model; support vector machine
Received:	2019/4/10
Revised:	2019/6/14
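
The reduction step described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it builds binary bag-of-words vectors from pre-tokenized records and then runs a greedy, dependency-based reduct search (often called QuickReduct). That is one common rough-set attribute reduction scheme and is assumed here because the abstract does not name the exact algorithm. The toy records, labels, and all function names are hypothetical.

```python
from collections import defaultdict

def bag_of_words(docs):
    """Map pre-tokenized documents to binary term-presence vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            row[index[tok]] = 1
        rows.append(row)
    return vocab, rows

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of samples in the positive region.

    Samples with identical values on `attrs` form one equivalence class; the
    class belongs to the positive region only if all its labels agree.
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].append(label)
    consistent = sum(len(ys) for ys in groups.values() if len(set(ys)) == 1)
    return consistent / len(rows)

def quick_reduct(rows, labels):
    """Greedily add attributes until the subset is as informative as all of them."""
    n_attrs = len(rows[0])
    target = dependency(rows, labels, list(range(n_attrs)))
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in range(n_attrs) if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct

# Hypothetical toy data: tokens stand in for segmented Chinese record text.
docs = [
    ["patient", "headache", "blood_pressure", "elevated"],
    ["patient", "dizziness", "blood_pressure", "elevated"],
    ["patient", "cough", "fever"],
    ["patient", "fever", "headache"],
]
labels = [1, 1, 0, 0]  # 1 = hypertension record, 0 = other (assumed labels)

vocab, rows = bag_of_words(docs)
reduct = quick_reduct(rows, labels)
selected = [vocab[a] for a in reduct]
print(len(vocab), "features reduced to", selected)
```

On real records the greedy search returns a small attribute subset rather than a guaranteed minimal one, and a word such as "patient" that appears in every record is never selected because it cannot separate any two classes.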
Classification of Hypertensive Medical Records based on Machine Learning

Hu Jing, Liu Wei, Ma Kai. Classification of Hypertensive Medical Records based on Machine Learning[J]. Science Technology and Engineering, 2019, 19(33): 296-301.

Authors:	Hu Jing, Liu Wei, Ma Kai

Institution:	School of Medical Information, Xuzhou Medical University, Xuzhou 221004

Abstract:	Chinese medical record text becomes high-dimensional and sparse after preprocessing, which lowers text classification accuracy and slows model convergence. To address this, a text classification algorithm (BOW+SVM) that combines a rough-set-based BOW model with an SVM is proposed. First, the BOW model extracts feature words and constructs a high-dimensional text space vector. Then, a rough-set attribute reduction algorithm processes the text features, removing fuzzy and redundant attributes from the decision rules to reduce the dimension of the space vector. Finally, the refined features are combined with the SVM classifier for text classification. Six cross-combined algorithms were compared in simulation experiments in a Python + TensorFlow environment. The results show that the BOW+SVM classification model for hypertension medical records reaches an accuracy of up to 97%. The improved model can therefore cope with uneven sample distribution and high-dimensional sparse feature spaces, and can effectively improve the workflow of medical record management.

Keywords:	text categorization; natural language processing; rough set; bag-of-words model; support vector machine
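
For the final classification step, a linear SVM can be trained on bag-of-words vectors. The sketch below is an illustration under stated assumptions rather than the paper's model: it minimizes the L2-regularized hinge loss by plain batch subgradient descent in pure Python, whereas the paper reports a Python + TensorFlow setup and the abstract does not state the kernel or solver. The feature list, toy records, labels, and hyperparameters are all hypothetical.

```python
def vectorize(tokens, features):
    """Count how often each selected feature word occurs in one record."""
    return [tokens.count(f) for f in features]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical refined feature words and toy records (not from the paper).
features = ["blood_pressure", "elevated", "fever", "headache"]
docs = [
    ["patient", "headache", "blood_pressure", "elevated"],
    ["patient", "dizziness", "blood_pressure", "elevated"],
    ["patient", "cough", "fever"],
    ["patient", "fever", "headache"],
]
y = [1, 1, -1, -1]  # +1 = hypertension record, -1 = other (assumed labels)

X = [vectorize(doc, features) for doc in docs]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

A fixed feature list is used so that unseen records map onto the same vector positions as the training data; words outside the list are simply ignored.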
This article is indexed in CNKI, Wanfang Data, and other databases.