Research on automatic grading model of HSK reading texts based on supervised learning
Cite this article: REN Meng, WANG Fangwei. Research on automatic grading model of HSK reading texts based on supervised learning[J]. Journal of Hebei University of Science and Technology, 2024, 45(2): 150-158
Authors: REN Meng  WANG Fangwei
Affiliation: College of Chinese and Literature, Hebei Normal University, Shijiazhuang; College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang
Funding: National Natural Science Foundation of China (61572170); Hebei Normal University 2023 Humanities and Social Sciences Intramural Research Fund (S23AI001)
Abstract: Few effective reference standards or analysis tools exist for judging the difficulty of Hanyu Shuiping Kaoshi (HSK, the Chinese Proficiency Test) reading materials and mapping them to HSK levels. To address this problem, reading texts from past HSK examination papers were taken as the study object and text readability features were extracted. Nine supervised learning algorithms, including support vector machine, random forest, and extreme gradient boosting, were used to build models that automatically assign a self-selected text to the corresponding HSK level. Multiple indicators, such as accuracy and AUC, were used to evaluate the grading performance of each model, and the best model was deployed as an online tool. The results show that supervised learning performs well in analyzing and grading HSK reading materials. Among the nine models, extreme gradient boosting grades best, with an accuracy of 0.913 and an AUC of 0.994. The grading model and online tool can grade self-selected HSK texts with high accuracy, helping users select texts in a targeted way and improve learning efficiency.

Keywords: natural language processing   supervised learning   HSK reading text   readability feature   grading model
Received: 2023-12-19
Revised: 2024-03-06
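
The abstract names two steps that a short sketch can make concrete: extracting readability features from a text, and scoring a grader by accuracy and AUC. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The feature set (character count, sentence count, average sentence length, character type-token ratio) and all function names are hypothetical, and the macro-averaged one-vs-rest AUC is only one common way to extend AUC to several HSK levels; the abstract does not say which features or which multiclass AUC variant were used.

```python
import re
from collections import Counter


def readability_features(text: str) -> dict:
    """Compute a few surface-level readability features (hypothetical feature set)."""
    # Split on Chinese and Western sentence-ending punctuation.
    sentences = [s for s in re.split(r"[。！？!?]+", text) if s.strip()]
    # Count only CJK ideographs as characters.
    chars = re.findall(r"[\u4e00-\u9fff]", text)
    n_chars = len(chars)
    n_types = len(Counter(chars))
    return {
        "n_chars": n_chars,
        "n_sentences": len(sentences),
        "avg_sentence_len": n_chars / len(sentences) if sentences else 0.0,
        "char_type_token_ratio": n_types / n_chars if n_chars else 0.0,
    }


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted level equals the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Macro-averaged one-vs-rest AUC (assumed variant).

    proba[i][k] is the model's score that text i belongs to classes[k].
    """
    aucs = []
    for k, c in enumerate(classes):
        labels = [1 if y == c else 0 for y in y_true]
        aucs.append(binary_auc(labels, [row[k] for row in proba]))
    return sum(aucs) / len(aucs)


# Usage: features for one short text, then metrics on toy predictions.
feats = readability_features("我喜欢学习汉语。你呢？")
toy_acc = accuracy([1, 2, 3, 3], [1, 2, 3, 2])
toy_auc = macro_ovr_auc(
    [1, 2, 3],
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    classes=[1, 2, 3],
)
```

In a full pipeline, a classifier such as extreme gradient boosting would be trained on the feature vectors and would supply the per-level scores passed to `macro_ovr_auc`; that training step is omitted here to keep the sketch free of third-party dependencies.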