
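Classification of Hypertensive Medical Records Based on Machine Learning
Cite this article: Hu Jing, Liu Wei, Ma Kai. Classification of Hypertensive Medical Records Based on Machine Learning[J]. Science Technology and Engineering, 2019, 19(33): 296-301.
Authors: Hu Jing (胡婧), Liu Wei (刘伟), Ma Kai (马凯)
Affiliation: School of Medical Information, Xuzhou Medical University, Xuzhou 221004
Funding: Supported by the National Natural Science Foundation of China (81471330), a Higher Education Research Project of the Jiangsu Provincial Department of Education (2015JSJG261), and the Jiangsu Provincial College Student Innovation and Entrepreneurship Program (201810313047Y)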
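Abstract: Preprocessed Chinese medical record text is high-dimensional and sparse, which leads to low classification accuracy and slow model convergence. To address these problems, a text classification algorithm (BOW+SVM) is proposed that combines a rough-set-based bag-of-words (BOW) model with a support vector machine (SVM). The algorithm first uses the BOW model to extract feature words and build high-dimensional text vectors. It then applies rough-set attribute reduction to the text features, removing vague and redundant attributes from the decision rules and lowering the vector dimensionality. Finally, the refined features are combined with the SVM classifier for text classification. Six cross-combined algorithms were compared in simulation experiments in a Python + TensorFlow environment. The results show that the BOW+SVM classification model for hypertension medical records reaches an accuracy of up to 97%. The improved model can therefore handle unevenly distributed samples, overcome the high-dimensional sparse feature space, and effectively improve the medical record management workflow.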

Keywords: text classification; natural language processing; rough set; bag-of-words model; support vector machine
Received: 2019/4/10
Revised: 2019/6/14

Classification of Hypertensive Medical Records based on Machine Learning
Hu Jing, Liu Wei, Ma Kai. Classification of Hypertensive Medical Records based on Machine Learning[J]. Science Technology and Engineering, 2019, 19(33): 296-301.
Authors:Hu Jing  Liu Wei  Ma Kai
Institution:Xuzhou Medical University School of Medical Information, Xuzhou 221004
Abstract:Preprocessed Chinese medical record text is high-dimensional and sparse, which results in low text classification accuracy and slow model convergence. To address this, a text classification algorithm (BOW+SVM) is proposed that combines a rough-set-based bag-of-words (BOW) model with a support vector machine (SVM). First, the BOW model extracts feature words and constructs a high-dimensional text vector space. Then, the rough-set attribute reduction algorithm processes the text features: vague and redundant attributes are removed from the decision rules, which reduces the vector dimension. Finally, the refined features are cross-combined with the SVM classifier for text classification. Simulation experiments comparing six cross-combined algorithms were designed in a Python + TensorFlow environment. The results show that the BOW+SVM text classification model for hypertension medical records can reach an accuracy of 97%. The improved model can thus handle uneven sample distribution, overcome the high-dimensional sparse feature space, and effectively improve the workflow of medical record management.
Keywords:text categorization    natural language processing    rough set    bag of words model    support vector machine
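The first two steps described in the abstract (BOW vectorization followed by rough-set attribute reduction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the reduction algorithm, so a greedy QuickReduct-style heuristic based on the rough-set dependency degree is assumed here, the toy tokenized records and labels are hypothetical, and only the Python standard library is used. The SVM step is omitted; in practice the reduced feature columns would be passed to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`.

```python
def bag_of_words(docs):
    """Build a binary bag-of-words matrix from pre-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        present = set(doc)
        rows.append([1 if term in present else 0 for term in vocab])
    return vocab, rows


def dependency(rows, labels, attrs):
    """Rough-set dependency degree of the labels on the attribute subset.

    Objects are grouped into equivalence classes by their values on `attrs`;
    a class belongs to the positive region if all its objects share one label.
    """
    blocks = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(label)
    consistent = sum(len(ys) for ys in blocks.values() if len(set(ys)) == 1)
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedily add attributes until the full-set dependency is reached.

    Assumed heuristic (QuickReduct-style); it yields a small attribute
    subset but not necessarily a minimal reduct.
    """
    n_attrs = len(rows[0])
    target = dependency(rows, labels, list(range(n_attrs)))
    reduct = []
    gamma = dependency(rows, labels, reduct)
    while gamma < target:
        candidates = [a for a in range(n_attrs) if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
        gamma = dependency(rows, labels, reduct)
    return reduct


# Hypothetical toy records, already segmented into tokens.
docs = [
    ["headache", "dizziness", "elevated", "pressure"],
    ["dizziness", "elevated", "pressure"],
    ["cough", "fever", "headache"],
    ["cough", "fever"],
]
labels = ["hypertension", "hypertension", "other", "other"]

vocab, rows = bag_of_words(docs)
reduct = quick_reduct(rows, labels)
reduced_vocab = [vocab[a] for a in reduct]
reduced_rows = [[row[a] for a in reduct] for row in rows]

print("full vocabulary size:", len(vocab))
print("retained feature words:", reduced_vocab)
```

On this toy data the reduced matrix keeps fewer columns than the full vocabulary while still separating the two labels, which is the dimensionality-reduction effect the abstract attributes to the rough-set step.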
This article is indexed in CNKI, Wanfang Data, and other databases.