Asynchronous dynamic pricing algorithms based on reinforcement learning

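Cite this article:	WANG Jin-tian, TANG Hao, CHENG Wen-juan, BI Xiang. Asynchronous dynamic pricing algorithms based on reinforcement learning[J]. Journal of Systems Engineering (系统工程学报), 2011, 26(5)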

Authors:	WANG Jin-tian (王金田), TANG Hao (唐昊), CHENG Wen-juan (程文娟), BI Xiang (毕翔)

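Affiliations:	1. School of Computer and Information, Hefei University of Technology, Hefei 230009, Anhui; Anhui Provincial Audit Office, Hefei 230001, Anhui 2. School of Computer and Information, Hefei University of Technology, Hefei 230009, Anhui; Engineering Research Center of Safety Critical Industrial Measurement and Control Technology, Ministry of Education, Hefei 230009, Anhui 3. School of Computer and Information, Hefei University of Technology, Hefei 230009, Anhui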

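Funding:	Scientific Research Foundation for Returned Overseas Scholars, Ministry of Education; Natural Science Foundation of Anhui Province; Key Project of Provincial Natural Science Research of Anhui Higher Education Institutions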

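Abstract:	This paper studies the asynchronous dynamic pricing problem of two sellers in an electronic retail market who exchange no information with each other. Based on performance potential theory, a Q-learning algorithm and a WoLF-PHC algorithm are established for asynchronous pricing policies, both applicable under the average and the discounted optimization criteria. A numerical example is used to compare the learning and optimization performance of these algorithms. The simulation results show that both Q-learning and WoLF-PHC solve the asynchronous dynamic pricing problem well, but because WoLF-PHC uses mixed strategies and a variable learning rate, it adapts better to changes in the environment and achieves better learning and optimization performance.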
Keywords:	asynchronous dynamic pricing; multi-agent; performance potential; WoLF-PHC algorithm
Asynchronous dynamic pricing algorithms based on reinforcement learning

WANG Jin-tian, TANG Hao, CHENG Wen-juan, BI Xiang. Asynchronous dynamic pricing algorithms based on reinforcement learning[J]. Journal of Systems Engineering, 2011, 26(5)

Authors:	WANG Jin-tian, TANG Hao, CHENG Wen-juan, BI Xiang

Abstract:	This paper studies the asynchronous dynamic pricing problems of two sellers in an electronic retail market without information exchange between them. Based on the concept of performance potential, a Q-learning algorithm and a WoLF-PHC algorithm are proposed to yield the asynchronous pricing policies that are suitable for either average- or discounted-reward criteria. A numerical example is used to compare the learning performance of different algorithms. The simulation results show that both the proposed algorith...

Keywords:	asynchronous dynamic pricing; multi-agent; performance potential; WoLF-PHC algorithm
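
The abstract attributes the advantage of WoLF-PHC to its mixed strategy and variable learning rate. As a rough illustration of those two ingredients, the sketch below implements the generic discounted-reward WoLF-PHC update for a single learner. It is not the paper's performance-potential formulation, which is not reproduced in this record. The class name `WoLFPHCAgent`, the integer encoding of states and price actions, and all hyperparameter values are assumptions made for the example.

```python
import random


class WoLFPHCAgent:
    """Generic discounted WoLF-PHC learner (illustrative; needs n_actions >= 2)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        # Variable learning rate: small policy step when winning, larger when losing.
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.rng = random.Random(seed)
        uniform = 1.0 / n_actions
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.pi = [[uniform] * n_actions for _ in range(n_states)]      # mixed policy
        self.pi_avg = [[uniform] * n_actions for _ in range(n_states)]  # average policy
        self.visits = [0] * n_states

    def act(self, state):
        """Sample an action (e.g. a price index) from the current mixed policy."""
        return self.rng.choices(range(self.n_actions), weights=self.pi[state])[0]

    def update(self, state, action, reward, next_state):
        q, pi, pi_avg = self.q[state], self.pi[state], self.pi_avg[state]

        # 1. Ordinary Q-learning step on the observed transition.
        target = reward + self.gamma * max(self.q[next_state])
        q[action] += self.alpha * (target - q[action])

        # 2. Move the running-average policy toward the current policy.
        self.visits[state] += 1
        for a in range(self.n_actions):
            pi_avg[a] += (pi[a] - pi_avg[a]) / self.visits[state]

        # 3. "Win or Learn Fast": winning if the current policy is worth more
        #    than the average policy under the current Q-values.
        value_pi = sum(p * v for p, v in zip(pi, q))
        value_avg = sum(p * v for p, v in zip(pi_avg, q))
        delta = self.delta_win if value_pi > value_avg else self.delta_lose

        # 4. Policy hill-climbing: shift probability mass to the greedy action.
        best = max(range(self.n_actions), key=lambda a: q[a])
        moved = 0.0
        for a in range(self.n_actions):
            if a != best:
                step = min(pi[a], delta / (self.n_actions - 1))
                pi[a] -= step
                moved += step
        pi[best] += moved
```

In a two-seller simulation, each seller would hold its own agent, call `act` when it is its turn to set a price, and call `update` once its reward and next state are observed. Setting `delta_win` equal to `delta_lose` reduces the learner to plain policy hill-climbing, which isolates the effect of the variable learning rate.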
This article is indexed in CNKI, Wanfang Data, and other databases.
