20 similar documents found; search took 15 ms
1.
2.
1. INTRODUCTION. Because an agent's reward is a function of all agents' joint action, some fundamental changes should be made when applying RL [1] to multi-agent domains. By adapting single-agent Q-learning [2] to Markov games, several algorithms have been proposed, such as Littman's minimax Q-learning (minimax-Q) [3], Hu et al.'s Nash Q-learning (Nash-Q) [4, 5], Claus et al.'s cooperative multi-agent Q-learning [6], Bowling et al.'s multi-agent Q-learning using a variable learning rate [7-9], …
3.
A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning
In the evolutionary game of the same task for groups, the changes in game rules, personal interests, the crowd size, and external supervision cause uncertain effect...
4.
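Reinforcement-learning-based control of the full swing-up and balancing process of an inverted pendulum (total citations: 4; self-citations: 0; citations by others: 4)
Inverted pendulum control is a typical nonlinear control problem. The goal of this paper is to use a reinforcement learning controller to control the entire swing-up and balancing process of an inverted pendulum, assuming that no model of the pendulum is known. To improve learning efficiency, a task decomposition approach is used: the overall control task is split into two subtasks, swing-up and balancing, and a different reinforcement learning algorithm is applied to each subtask according to its characteristics. Simulation experiments in Matlab/Simulink show that the method can learn a successful control strategy within a reasonable time.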
5.
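Mobile security agents scan client hosts for vulnerabilities and collect audit logs that record abnormal activity, providing security assurance both before and after an incident; however, the security of the mobile agents' own communication and migration is equally important. First, hardware-characteristic attribute keys are combined with user information to implement an agent-based multi-factor authentication system. On top of this authentication, asymmetric encryption and key management are used to secure agent communication and migration. Because agents are software and are vulnerable to external tampering, detection agents are employed: through agent cooperation, the Address Resolution Protocol is used to scan nodes within the network, turning a wide-area-network scanning mechanism into simple and practical intranet scanning, thereby ensuring reliable deployment of the authentication agents on client hosts. Experimental results show that the system is efficient, scalable, and general.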
6.
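Research on modeling computer generated forces based on multi-agent systems (total citations: 1; self-citations: 0; citations by others: 1)
In research on computer generated forces (CGF), multi-agent systems (MAS) theory is introduced and, on the basis of object-oriented Petri nets (OPN), a general formal MAS model suited to CGF, ArmyMAS, is established. ArmyMAS describes three units: combat-entity agents, management agents, and configuration. It vividly characterizes the structure and behavior of CGF, and the model can be analyzed and verified with Petri-net analysis methods and tools. Finally, ArmyMAS is used to model and analyze a CGF system for ballistic missile attack-defense confrontation, verifying the effectiveness of the model.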
7.
This paper proposes a multi-layer multi-agent model for the performance evaluation of power systems, which is different from
the existing multi-agent ones. To describe the impact of the structure of the networked power system, the proposed model consists
of three kinds of agents that form three layers: control agents such as the generators and associated controllers, information
agents to exchange the information based on the wide area measurement system (WAMS) or transmit control signals to the power
system stabilizers (PSSs), and network-node agents such as the generation nodes and load nodes connected with transmission
lines. An optimal index is presented to evaluate the performance of damping controllers to the system's inter-area oscillation
with respect to the information-layer topology. Then, the authors show that the inter-area information exchange is more powerful
than the exchange within a given area to control the inter-area low frequency oscillation based on simulation analysis.
This work was supported in part by the National Natural Science Foundation of China under Grants Nos. 50707035, 50595411,
60425307, 60221301, and 50607005, in part by the 111 project (B08013) and Program for Changjiang Scholars and Innovative Research
Team in University (IRT0515) and in part by the Program for New Century Excellent Talents in University (NCET-05-0216).
8.
ZHAN Guang; ZHANG Kun; LI Ke; PIAO Haiyin. 《系统工程与电子技术(英文版)》 (Journal of Systems Engineering and Electronics), 2024, (3): 644-665
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders on the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues in enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs), and we present a reward shaping method for the two guidance tasks using a potential-based function and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy's output during the later stage of training. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and by extensive experimental results from testing the trained policy.
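The potential-based reward shaping mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard potential-based form r' = r + γΦ(s') − Φ(s) and is not the paper's actual shaping function: the one-dimensional `potential` (negative distance to a target), the function names, and all numeric values are hypothetical.

```python
def potential(position, target):
    """Hypothetical potential: higher (less negative) when closer to the target."""
    return -abs(target - position)


def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Standard potential-based shaping: r' = r + gamma * phi(s') - phi(s)."""
    return reward + gamma * phi_next - phi_s


# Illustrative use: an agent at position 5 with a target at position 10.
target = 10
toward = shaped_reward(0.0, potential(5, target), potential(6, target))
away = shaped_reward(0.0, potential(5, target), potential(4, target))
print("step toward target:", toward)
print("step away from target:", away)
```

With this choice of potential, a step toward the target receives a positive shaping bonus and a step away receives a penalty, which is the kind of guidance signal that shaping is meant to add on top of a sparse environment reward.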
9.
Fang Min. 《系统工程与电子技术(英文版)》 (Journal of Systems Engineering and Electronics), 2008, 19(2): 377-380
Most ensemble learning algorithms use a centralized model in which the training instances must be gathered on a single station, yet it is often difficult to centralize the training data in this way. A distributed ensemble learning algorithm is proposed in which each instance has two kinds of weight genes, denoting the global distribution and the local distribution. Instead of the repeated sampling method of standard ensemble learning, non-balanced sampling from each station is used to train that station's set of base classifiers. The concept of the effective nearby region of a local integration classifier is proposed and is used for the dynamic integration of multiple classifiers in a distributed environment. The experiments show that the proposed distributed ensemble learning algorithm can effectively reduce the time needed to train the base classifiers while ensuring that classification performance is the same as that of the centralized learning method.
10.
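A multi-agent learning algorithm is proposed. Influence diagrams are used to represent agents; given an initial model of an agent and its behavior history, a new model is constructed on the basis of capability, belief, and preference learning. The learning method treats the behavior history of other agents as the training set and uses a neural network, together with decision knowledge and expert knowledge, to revise the connections between nodes of the influence diagram. Cases that are inconsistent with the agent's behavior history are treated as a random deviation in the utility function, which is simulated with Markov chain Monte Carlo techniques in order to adjust the utility function. Finally, cooperative air combat by a multi-aircraft formation is used as an example to illustrate the practicality of the algorithm.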
11.
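Fuzzy Markov game control and simulation research for partner selection (total citations: 1; self-citations: 0; citations by others: 1)
To address partner selection decisions under uncertainty, adaptive fuzzy control system theory and neural network theory are introduced into Markov games, and a multi-agent-based fuzzy control model for partner selection is proposed. The model introduces a fuzzy neural network (FNN) based on ANFIS and neural networks, realizing a new gradient-descent Q-learning algorithm with value function approximation. The model is then applied to the partner selection problem: FNN learning is performed over multiple influencing factors, and its output is used as the input of a standard Markov game model to obtain the resulting strategy. Finally, an application example is studied, in which concrete historical data are used to validate and analyze the modeling method and the model.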
12.
A new incremental support vector machine (SVM) algorithm based on multiple kernel learning is proposed. By introducing multiple kernel learning into SVM incremental learning, the learning problem for large-scale data sets can be solved effectively. Furthermore, different penalties are applied to the training subset and to the acquired support vectors, which may help to improve the performance of the SVM. Simulation results indicate that the proposed algorithm not only solves the model selection problem in SVM incremental learning but also improves classification or prediction precision.
13.
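Iterative learning control is considered for solving the terminal control problem of a class of linear time-varying continuous systems. Using the expansion technique of shifted Legendre orthogonal polynomials, together with their orthogonality and the boundary conditions, the differential equations of the linear time-varying system are transformed into algebraic equations, which avoids solving for the state transition matrix of the linear time-varying system when checking the error convergence condition. A high-order learning law is used to obtain the shifted Legendre coefficient vector of the control input, and a simulation example verifies the effectiveness of the method.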
14.
In order to detect and estimate faults in discrete linear time-varying uncertain systems, a discrete iterative learning strategy is applied to fault diagnosis, and a novel fault detection and estimation algorithm that adopts a threshold-limiting technique is proposed. Within the chosen optimal time region, the algorithm uses residual signals to correct the introduced virtual faults with iterative learning rules, bringing the virtual faults close to those occurring in the practical system. The same method is repeated in the remaining optimal time regions, thereby achieving fault diagnosis. The proposed algorithm not only completes fault detection and estimation for discrete linear time-varying uncertain systems, but also improves the reliability of fault detection and decreases the false alarm rate. The final simulation results verify the validity of the proposed algorithm.
15.
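PID controller design based on iterative learning control (total citations: 4; self-citations: 0; citations by others: 4)
As an alternative to traditional empirical PID tuning methods, a new PID parameter tuning algorithm is proposed. The algorithm first uses PD-type iterative learning control to track the desired trajectory; then, from the input and output data sequences of the iterative learning control, parameter identification is performed with a strong tracking filter, yielding optimized PID control parameters corresponding to the desired trajectory. The convergence condition of the iterative learning control is given, along with how the strong tracking filter is used for parameter identification. Simulation and experimental results show that with a PID controller designed by this algorithm, the controlled system achieves better dynamic performance and stronger robustness.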
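The first step described in the abstract above, PD-type iterative learning control, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: the first-order plant `x(t+1) = a*x(t) + b*u(t)`, the gains `kp` and `kd`, the sinusoidal reference, and all function names are hypothetical, and the strong-tracking-filter identification step is not shown.

```python
import math


def simulate(u, a=0.5, b=1.0):
    """Run the assumed plant x(t+1) = a*x(t) + b*u(t) from x(0) = 0; return y(0..N)."""
    x = 0.0
    y = [x]
    for u_t in u:
        x = a * x + b * u_t
        y.append(x)
    return y


def pd_ilc(reference, trials=20, kp=0.5, kd=0.3):
    """Discrete PD-type ILC: u_next(t) = u(t) + kp*e(t+1) + kd*(e(t+1) - e(t))."""
    n = len(reference) - 1
    u = [0.0] * n  # first trial starts with zero input
    max_errors = []
    for _ in range(trials):
        y = simulate(u)
        e = [r - y_t for r, y_t in zip(reference, y)]
        max_errors.append(max(abs(v) for v in e))  # error of the current trial
        # Learn from the whole trial: proportional term plus error difference.
        u = [u[t] + kp * e[t + 1] + kd * (e[t + 1] - e[t]) for t in range(n)]
    return u, max_errors


# Hypothetical desired trajectory: one period of a sine wave over 50 steps.
reference = [math.sin(2 * math.pi * t / 50) for t in range(51)]
u, max_errors = pd_ilc(reference)
print("max tracking error, first trial:", max_errors[0])
print("max tracking error, last trial:", max_errors[-1])
```

For this particular plant and these gains, the maximum tracking error shrinks from trial to trial; other plants or gains may need a different choice to satisfy the convergence condition.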
16.
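Based on an analysis of the Kohonen self-organizing feature map (SOFM) network and the learning vector quantization (LVQ) algorithm, a hybrid learning vector quantization (HLVQ) method built on an improved SOFM algorithm and the LVQ2 algorithm is proposed, and a general model for unsupervised and supervised classification of remote sensing imagery based on HLVQ is established. Compared with traditional statistical classification methods and the LVQ2 network classifier, the HLVQ classifier has better overall classification performance and a higher recognition rate.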
17.
Immune multi-agent model using vaccine for cooperative air-defense system of systems for surface warship formation based on danger theory
To address the problem of cooperative air-defense for surface warship formations, this paper maps the cooperative air-defense system of systems (SoS) for surface warship formation (CASoSSWF) to the biological immune system (BIS), based on the similarity of their defense mechanisms and characteristics. It then designs the component models and architecture for a monitoring agent, a regulating agent, a killer agent, a pre-warning agent, and a communicating agent, drawing on the theories and methods of the artificial immune system, the multi-agent system (MAS), the vaccine, and danger theory (DT). Moreover, a new immune multi-agent model using vaccine based on DT (IMMUVBDT) for the cooperative air-defense SoS is proposed. The immune response and immune mechanism of the CASoSSWF are analyzed. The model has the capabilities of memory, evolution, good adaptability to dynamic environments, and self-learning, and it adequately embodies the cooperative air-defense mechanism of the CASoSSWF. It therefore offers a novel idea for the CASoSSWF that can provide conceptual models for a surface warship formation operation simulation system.
18.
19.
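Multidimensional function approximation theory and learning algorithm based on functional networks (total citations: 7; self-citations: 1; citations by others: 7)
A functional-network method for approximating multidimensional functions is proposed. A class of separable functional networks for function approximation is designed, and a functional-network-based learning algorithm for function approximation is given. The parameters of the functional network are obtained by solving a system of equations, and they can approximate a given function to a predetermined accuracy. Simulation results show that the approximation method is simple and feasible, with fast convergence and good approximation performance.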