
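K-VQA: A Knowledge-Graph-Assisted Visual Question Answering Method
Cite this article: GAO Hongbin, MAO Jinying, WANG Huiyong. K-VQA: A knowledge-graph-assisted visual question answering method[J]. Journal of Hebei University of Science and Technology, 2020, 41(4): 315-326. DOI: 10.7535/hbkd.2020yx04004
Authors: GAO Hongbin, MAO Jinying, WANG Huiyong
Affiliation: School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China (all three authors)
Funding: Natural Science Foundation of Hebei Province (F2018208116)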
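Abstract: Visual question answering (VQA) over images and text handles questions of roughly two types: those whose answers can be read directly from the image, and those that require external knowledge. Current VQA methods achieve high accuracy on only one of these types, and techniques for answering the other are still immature. To broaden the range of answerable questions, a knowledge-graph-assisted VQA method, K-VQA, is designed. Building on deep-learning-based VQA, it distinguishes the question type by querying the knowledge graph and answers each type with the most suitable method. For questions that require external knowledge, it uses information from the image and the question to determine the entities and attributes needed, then extracts triples from the knowledge graph to obtain the answer. The results show that different VQA techniques suit different question types, and that K-VQA can answer both simple questions and reasoning questions, with an accuracy of 56.67%. As a knowledge-graph-assisted VQA method, K-VQA can therefore answer more types of questions while achieving relatively high accuracy, and it offers a useful reference for further research on VQA and VQA methods.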

Keywords: knowledge engineering; visual question answering; external knowledge; knowledge graph; triple
Received: 2020-06-16
Revised: 2020-07-20

K-VQA: A visual question answering method
GAO Hongbin,MAO Jinying,WANG Huiyong. K-VQA: A visual question answering method[J]. Journal of Hebei University of Science and Technology, 2020, 41(4): 315-326. DOI: 10.7535/hbkd.2020yx04004
Authors: GAO Hongbin, MAO Jinying, WANG Huiyong
Abstract: Questions handled by visual question answering (VQA) over images and text fall roughly into two types. The first type can be answered directly from the image; the second type requires external knowledge to answer. Current VQA methods achieve high accuracy on only one type of question, and the technology for answering the other type is not yet mature. To expand the types of questions that can be answered, K-VQA, a VQA method assisted by a knowledge graph, was designed. On the basis of deep-learning VQA, the question type is distinguished by querying the knowledge graph, so that each type of question is answered with the most appropriate method. For questions that need external knowledge, information in the image and the question is used to determine the entities and attributes required to answer, and triples are extracted from the knowledge graph to obtain the answer. The results show that different VQA techniques are suitable for different types of questions. The K-VQA method can answer both simple questions and reasoning questions, with an accuracy of 56.67%. Therefore, as a VQA method assisted by a knowledge graph, K-VQA can answer more types of questions and obtain relatively high accuracy, which has important reference value for further study of VQA and VQA methods.
Keywords: knowledge engineering; visual question answering; external knowledge; knowledge graph; triple
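
The routing idea in the abstract (query the knowledge graph to decide the question type, then either look up a triple or fall back to an image-based model) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy triples, the word-matching rule, and the names `find_triple`, `answer`, and `stub_visual_model` are all assumptions made for illustration, and a real system would use an object detector and a trained VQA model in place of the stubs.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)

# Toy knowledge graph; these triples are made-up illustration data.
TOY_KG: List[Triple] = [
    ("cat", "category", "mammal"),
    ("cat", "eats", "fish"),
    ("banana", "origin", "tropics"),
]


def find_triple(kg: List[Triple], image_entities: Set[str],
                question: str) -> Optional[Triple]:
    """Return the first triple whose entity was detected in the image
    and whose attribute name appears as a word in the question."""
    words = question.lower().replace("?", "").split()
    for entity, attribute, value in kg:
        if entity in image_entities and attribute in words:
            return (entity, attribute, value)
    return None


def answer(question: str, image_entities: Set[str],
           visual_model: Callable[[str], str],
           kg: List[Triple] = TOY_KG) -> Tuple[str, str]:
    """Route the question: if the knowledge graph holds a matching triple,
    treat it as a knowledge question; otherwise use the visual model."""
    triple = find_triple(kg, image_entities, question)
    if triple is not None:
        return triple[2], "knowledge"
    return visual_model(question), "visual"


def stub_visual_model(question: str) -> str:
    """Hypothetical stand-in for a deep-learning VQA model."""
    return "two"


if __name__ == "__main__":
    detected = {"cat"}  # assumed output of an object detector
    print(answer("What is the category of this animal?", detected, stub_visual_model))
    print(answer("How many cats are there?", detected, stub_visual_model))
```

Here the first question mentions an attribute stored for a detected entity, so it is answered from the triple; the second has no matching triple and is passed to the visual model stub.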
This article is indexed in CNKI, Wanfang Data, and other databases.