An algorithm of online product feature extraction based on boundary average entropy
Cite this article: LIU Tong, ZHANG Cong, WU Mingyuan. An algorithm of online product feature extraction based on boundary average entropy[J]. Systems Engineering - Theory & Practice, 2016, 36(9): 2416-2423.
Authors: LIU Tong  ZHANG Cong  WU Mingyuan
Institution: 1. Antai College of Economics & Management, Shanghai Jiao Tong University, Shanghai 200030, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
Abstract: With the rapid development of e-commerce, text mining of online reviews has become a popular research topic. When making a purchasing decision, a user wants to know not only the overall evaluation of a product but also the sentiment orientation toward each of its features. This paper therefore addresses the automatic extraction of product features from online reviews. We select N-Grams that match the BNP (base noun phrase) pattern as candidate features, then filter the candidates using the boundary average entropy of each N-Gram and the substring dependency relationships among the candidates to obtain the final product features. Compared with the baseline, which directly designates every BNP as a product feature, the proposed filtering conditions effectively improve the accuracy of product feature extraction. The method does not rely on an external domain corpus for training and requires no manual intervention. Its output is a tree-like hierarchy that reflects the substring dependencies among features, and it can serve as a useful reference data structure for constructing domain knowledge, such as a domain ontology.

Keywords: online reviews  product feature  boundary average entropy
Received: 2015-06-23
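
The abstract does not give the formula for boundary average entropy or the exact substring dependency rule, so the following is only a minimal sketch under stated assumptions: the score of an N-Gram is taken to be the mean of the Shannon entropies of its left-neighbor and right-neighbor token distributions, and the hierarchy simply links each retained feature to the longer retained features that contain it. The names `boundary_average_entropy`, `substring_hierarchy`, the `<s>` boundary marker, the toy corpus, and the threshold of 1.0 are all hypothetical, and the BNP (part-of-speech) filtering step is omitted.

```python
import math
from collections import Counter, defaultdict

BOUNDARY = "<s>"  # hypothetical marker for the start/end of a sentence


def entropy(counter):
    """Shannon entropy (in bits) of a frequency table; 0.0 for an empty table."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def boundary_average_entropy(sentences, max_n=3):
    """Score every N-Gram (n <= max_n) by the mean of its left and right
    neighbor entropies. Assumed definition, not taken from the paper."""
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for tokens in sentences:
        padded = [BOUNDARY] + list(tokens) + [BOUNDARY]
        for n in range(1, max_n + 1):
            # i ranges over start positions of N-Grams made of real tokens only
            for i in range(1, len(padded) - n):
                gram = tuple(padded[i:i + n])
                left[gram][padded[i - 1]] += 1
                right[gram][padded[i + n]] += 1
    return {g: (entropy(left[g]) + entropy(right[g])) / 2 for g in left}


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def substring_hierarchy(features):
    """Map each feature to the longer features that contain it (assumed rule)."""
    return {
        f: [g for g in features if len(g) > len(f) and contains(g, f)]
        for f in features
    }


# Toy corpus of already-tokenized reviews (illustrative data only).
reviews = [
    ["battery", "life", "is", "great"],
    ["the", "battery", "life", "lasts", "long"],
    ["poor", "battery", "life", "overall"],
    ["battery", "cover", "feels", "cheap"],
]

scores = boundary_average_entropy(reviews, max_n=2)
kept = [g for g, s in scores.items() if s >= 1.0]  # hypothetical threshold
hierarchy = substring_hierarchy(kept)
```

In this toy corpus "life" is always preceded by "battery", so its left entropy is zero and its score falls below that of "battery life", which appears in varied contexts on both sides. That is the intuition behind using boundary entropy to separate complete phrases from fragments; a real pipeline would first restrict the candidates to BNP-tagged N-Grams and tune the threshold on data.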
This article is indexed in CNKI and other databases.