首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
针对一种大地图和稀疏奖励的兵棋推演对抗环境下,单纯的深度强化学习算法会导致训练无法快速收敛以及智能体对抗特定规则智能体胜率较低的问题,提出了一种基于监督学习和深度强化学习相结合以及设置额外奖励的方法,旨在提升智能博弈的训练效果。使用监督学习训练智能体;研究基于近端策略优化(Proximal policy optimization,PPO)的对抗算法;改进强化学习训练过程的额外奖励设置。以某在研兵棋推演环境为例的实验结果表明,该博弈对抗算法能使智能体在对抗其他智能体时的胜率稳步提升并在较短时间内达到收敛。  相似文献   

2.
针对主动配电网电压优化控制中模型不确定性和通信代价大的问题,提出了一种基于灵敏度矩阵安全的多智能体深度强化学习(SMS-MADRL)算法。该算法利用安全深度强化学习,应对主动配电网的固有不确定性,并采用多智能体结构实现通信代价较小的分布式控制。首先,将电压优化控制问题描述为受约束的马尔可夫博弈(CMG);然后,对无功功率进行适当修改,通过分析节点电压的变化得到灵敏度矩阵,进而与主动配电网环境进行交互,训练出若干可以独立给出最优无功功率指令的智能体。与现有多智能体深度强化学习算法相比,该算法的优点在于给智能体的动作网络增添了基于灵敏度矩阵的安全层,在智能体的训练和执行阶段保证了主动配电网的电压安全性。在IEEE 33节点系统上的仿真结果表明:所提出的算法不仅能够满足电压约束,而且相较于多智能体深度确定性策略梯度(MADDPG)算法,网络损耗减少了4.18%,控制代价减少了70.5%。该研究可为主动配电网的电压优化控制提供理论基础。  相似文献   

3.
用于高维函数优化的多智能体量子进化算法   总被引:1,自引:0,他引:1  
基于智能体的竞争和学习能力、量子计算理论及生物进化策略,提出了一种新的优化方法——多智能体量子进化算法.一个智能体代表优化问题的一个可能解,所有的智能体都以量子染色体表示.该算法将智能体分布于多智能体网络环境中,智能体之间通过量子进化来实现竞争及学习,以提高个体的竞争能力.理论证明该算法具有全局收敛性.实验结果表明,该算法具有强的全局寻优能力及快速搜索能力。  相似文献   

4.
传统的深度强化学习算法在解决任务时与环境交互量大且样本复杂度高,导致智能体的训练时间长,算法难以收敛,故在实际问题中的应用受限.针对该问题,在智能体采用梯度下降方法更新模型参数的过程中融入元学习思想,提出一种改进的深度强化学习算法,使得智能体利用在训练任务中学习到的先验知识快速地适应新任务.仿真结果表明:改进的深度强化学习算法可实现智能体在新任务上的快速适应,其收敛速度和稳定性等均优于传统算法.  相似文献   

5.
多目标控制参数联合优化整定是自动化系统保持高效、稳定运行的关键问题,强化学习常用于建立自动化调参智能体,代替人工完成参数整定. 针对现有方法使用固定权重将多个优化目标线性组合为单目标,训练具有固定调参知识的单智能体模型,导致实际目标关系受环境影响与先验不符时,智能体无法感知并做出适应性决策调整,限制参数整定效果的问题,提出一种面向多目标参数整定的协同深度强化学习方法. 该方法利用离线仿真学习目标整定知识建立多个Double-DQN智能体,在线建立整定效果反馈,感知目标实际关系并调整智能体协同策略,实现有效的多目标参数整定. 列车自动驾驶参数整定实验结果表明,方法对停车误差、舒适度两个目标整定效果良好,能自适应不同车轨性能且可持续优化,实用价值大.   相似文献   

6.
基于智能体 (Agent)系统强化学习原理和基于动态规划的Q -学习算法的基础上 ,提出了一种新的Agent强化学习算法 .该算法在Agent学习过程中不断调整Agent知识库的加权值 ,在强化学习的每个阶段 ,通过选取合适的信度分配函数来修正Agent强化学习动作的选取策略 .与标准的Q -学习方法相比 ,具有更加合理的物理结构 ,并且能保证算法收敛 .仿真实验说明该方法加快了标准Q -学习算法的收敛速度 ,具有较好的学习性能  相似文献   

7.
以机器人足球比赛(RoboCup)为背景,基于主智能体和辅助智能体概念,提出了基于主智能体群体强化学习算法(GLBMA),该算法通过主智能体和辅智能体的角色切换来实现整个团队的学习,改进了传统的群体强化学习算法。RoboCup仿真比赛试验表明,传统群体强化学习算法中的行为学习状态空间过大,连续状态空间的行为选择及多智能体合作求解等问题得到了解决.  相似文献   

8.
针对不确定环境的规划问题,提出了基于预测状态表示的Q学习算法.将预测状态表示方法与Q学习算法结合,用预测状态表示的预测向量作为Q学习算法的状态表示,使得到的状态具有马尔可夫特性,满足强化学习任务的要求,进而用Q学习算法学习智能体的最优策略,可解决不确定环境下的规划问题.仿真结果表明,在发现智能体的最优近似策略时,算法需要的学习周期数与假定环境状态已知情况下需要的学习周期数大致相同.  相似文献   

9.
汽车自动变道需要在保证不发生碰撞的情况下,以尽可能快的速度行驶,规则性地控制不仅对意外情况不具有鲁棒性,而且不能对间隔车道的情况做出反应.针对这些问题,提出了一种基于双决斗深度Q网络(dueling double deep Q-network, D3QN)强化学习模型的自动换道决策模型,该算法对车联网反馈的环境车信息处理之后,通过策略得到动作,执行动作后根据奖励函数对神经网络进行训练,最后通过训练的网络以及强化学习来实现自动换道策略.利用Python搭建的三车道环境以及车辆仿真软件CarMaker进行仿真实验,得到了很好的控制效果,结果验证了本文算法的可行性和有效性.  相似文献   

10.
综合分析了影响城市公共交通系统运行的多种因素,提出了一种新型的基于强化学习算法的城市公交信号优先控制策略.该策略利用强化学习算法的试错-改进机制,根据不同交通环境下信号控制策略实施后反馈的结果,迭代优化路口的公交信号优先控制策略,从而使其具备了自学习的能力.基于Paramics的仿真实验表明,该算法能够在保障路口正常交通秩序的同时,显著提高公交车运行效率.  相似文献   

11.
The discovery of the prolific Ordovician Red River reservoirs in 1995 in southeastern Saskatchewan was the catalyst for extensive exploration activity which resulted in the discovery of more than 15 new Red River pools. The best yields of Red River production to date have been from dolomite reservoirs. Understanding the processes of dolomitization is, therefore, crucial for the prediction of the connectivity, spatial distribution and heterogeneity of dolomite reservoirs.The Red River reservoirs in the Midale area consist of 3~4 thin dolomitized zones, with a total thickness of about 20 m, which occur at the top of the Yeoman Formation. Two types of replacement dolomite were recognized in the Red River reservoir: dolomitized burrow infills and dolomitized host matrix. The spatial distribution of dolomite suggests that burrowing organisms played an important role in facilitating the fluid flow in the backfilled sediments. This resulted in penecontemporaneous dolomitization of burrow infills by normal seawater. The dolomite in the host matrix is interpreted as having occurred at shallow burial by evaporitic seawater during precipitation of Lake Almar anhydrite that immediately overlies the Yeoman Formation. However, the low δ18O values of dolomited burrow infills (-5.9‰~ -7.8‰, PDB) and matrix dolomites (-6.6‰~ -8.1‰, avg. -7.4‰ PDB) compared to the estimated values for the late Ordovician marine dolomite could be attributed to modification and alteration of dolomite at higher temperatures during deeper burial, which could also be responsible for its 87Sr/86Sr ratios (0.7084~0.7088) that are higher than suggested for the late Ordovician seawaters (0.7078~0.7080). The trace amounts of saddle dolomite cement in the Red River carbonates are probably related to "cannibalization" of earlier replacement dolomite during the chemical compaction.  相似文献   

12.
There are numerous geometric objects stored in the spatial databases. An importance function in a spatial database is that users can browse the geometric objects as a map efficiently. Thus the spatial database should display the geometric objects users concern about swiftly onto the display window. This process includes two operations:retrieve data from database and then draw them onto screen. Accordingly, to improve the efficiency, we should try to reduce time of both retrieving object and displaying them. The former can be achieved with the aid of spatial index such as R-tree, the latter require to simplify the objects. Simplification means that objects are shown with sufficient but not with unnecessary detail which depend on the scale of browse. So the major problem is how to retrieve data at different detail level efficiently. This paper introduces the implementation of a multi-scale index in the spatial database SISP (Spatial Information Shared Platform) which is generalized from R-tree. The difference between the generalization and the R-tree lies on two facets: One is that every node and geometric object in the generalization is assigned with a importance value which denote the importance of them, and every vertex in the objects are assigned with a importance value,too. The importance value can be use to decide which data should be retrieve from disk in a query. The other difference is that geometric objects in the generalization are divided into one or more sub-blocks, and vertexes are total ordered by their importance value. With the help of the generalized R-tree, one can easily retrieve data at different detail levels.Some experiments are performed on real-life data to evaluate the performance of solutions that separately use normal spatial index and multi-scale spatial index. The results show that the solution using multi-scale index in SISP is satisfying.  相似文献   

13.
AcomputergeneratorforrandomlylayeredstructuresYUJia shun1,2,HEZhen hua2(1.TheInstituteofGeologicalandNuclearSciences,NewZealand;2.StateKeyLaboratoryofOilandGasReservoirGeologyandExploitation,ChengduUniversityofTechnology,China)Abstract:Analgorithmisintrod…  相似文献   

14.
本文叙述了对海南岛及其毗邻大陆边缘白垩纪到第四纪地层岩石进行古地磁研究的全部工作过程。通过分析岩石中剩余磁矢量的磁偏角及磁倾角的变化,提出海南岛白垩纪以来经历的构造演化模式如下:早期伴随顺时针旋转而向南迁移,后期伴随逆时针转动并向北运移。联系该地区及邻区的地质、地球物理资料,对海南岛上述的构造地体运动提出以下认识:北部湾内早期有一拉张作用,主要是该作用使湾内地壳显著伸长减薄,形成北部湾盆地。从而导致了海南岛的早期构造运动,而海南岛后期的构造运动则主要是受南海海底扩张的影响。海南地体运动规律的阐明对于了解北部湾油气盆地的形成演化有重要的理论和实际意义。  相似文献   

15.
Various applications relevant to the exciton dynamics,such as the organic solar cell,the large-area organic light-emitting diodes and the thermoelectricity,are operating under temperature gradient.The potential abnormal behavior of the exicton dynamics driven by the temperature difference may affect the efficiency and performance of the corresponding devices.In the above situations,the exciton dynamics under temperature difference is mixed with  相似文献   

16.
The elongation method,originally proposed by Imamura was further developed for many years in our group.As a method towards O(N)with high efficiency and high accuracy for any dimensional systems.This treatment designed for one-dimensional(ID)polymers is now available for three-dimensional(3D)systems,but geometry optimization is now possible only for 1D-systems.As an approach toward post-Hartree-Fock,it was also extended to  相似文献   

17.
18.
The explosive growth of the Internet and database applications has driven database to be more scalable and available, and able to support on-line scaling without interrupting service. To support more client's queries without downtime and degrading the response time, more nodes have to be scaled up while the database is running. This paper presents the overview of scalable and available database that satisfies the above characteristics. And we propose a novel on-line scaling method. Our method improves the existing on-line scaling method for fast response time and higher throughputs. Our proposed method reduces unnecessary network use, i.e. , we decrease the number of data copy by reusing the backup data. Also, our on-line scaling operation can be processed parallel by selecting adequate nodes as new node. Our performance study shows that our method results in significant reduction in data copy time.  相似文献   

19.
R-Tree is a good structure for spatial searching. But in this indexing structure,either the sequence of nodes in the same level or sequence of traveling these nodes when queries are made is random. Since the possibility that the object appears in different MBR which have the same parents node is different, if we make the subnode who has the most possibility be traveled first, the time cost will be decreased in most of the cases. In some case, the possibility of a point belong to a rectangle will shows direct proportion with the size of the rectangle. But this conclusion is based on an assumption that the objects are symmetrically distributing in the area and this assumption is not always coming into existence. Now we found a more direct parameter to scale the possibility and made a little change on the structure of R-tree, to increase the possibility of founding the satisfying answer in the front sub trees. We names this structure probability based arranged R-tree (PBAR-tree).  相似文献   

20.
The geographic information service is enabled by the advancements in general Web service technology and the focused efforts of the OGC in defining XML-based Web GIS service. Based on these models, this paper addresses the issue of services chaining,the process of combining or pipelining results from several interoperable GIS Web Services to create a customized solution. This paper presents a mediated chaining architecture in which a specific service takes responsibility for performing the process that describes a service chain. We designed the Spatial Information Process Language (SIPL) for dynamic modeling and describing the service chain, also a prototype of the Spatial Information Process Execution Engine (SIPEE) is implemented for executing processes written in SIPL. Discussion of measures to improve the functionality and performance of such system will be included.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号